
Working with GPS data
In this recipe, you will work with GPS data. This kind of data is typically saved in a .gpx
file. You will import a bunch of .gpx
files to PostGIS from RunKeeper, a popular social network for runners.
If you have an account on RunKeeper, you can export your .gpx
files and process them by following the instructions in this recipe. Otherwise, you can use the RunKeeper .gpx
files included in the runkeeper-gpx.zip
file, located in the chp03
directory included in the code bundle available with this book.
You will first create a bash
script for importing the .gpx
files to a PostGIS table, using ogr2ogr
. After the import is completed, you will try to write a couple of SQL queries and test some very useful functions, such as ST_MakeLine
to generate polylines from point geometries, ST_Length
to compute distance, and ST_Intersects
to perform a spatial join operation.
Getting ready
Extract the data/chp03/runkeeper-gpx.zip
file to working/chp03/runkeeper_gpx
. In case you haven't been through Chapter 1, Moving Data In and Out of PostGIS, be sure to have the countries
dataset in the PostGIS database.
How to do it...
First, be sure of the format of the .gpx
files that you need to import to PostGIS. Open one of them and check the file structure—each file must be in the XML format composed of just one <trk>
element that contains just one <trkseg>
element that contains many <trkpt>
elements (the points stored from the runner's GPS device). Import these points to a PostGIS Point
table.
- Create a new schema named
chp03
to store the data for all of the recipes in this chapter using the following command:postgis_cookbook=# create schema chp03;
- Create the
chp03.rk_track_points
table in PostgreSQL by executing the following command lines:postgis_cookbook=# CREATE TABLE chp03.rk_track_points ( fid serial NOT NULL, the_geom geometry(Point,4326), ele double precision, "time" timestamp with time zone, CONSTRAINT activities_pk PRIMARY KEY (fid) );
- Create the following script to import all of the
.gpx
files in thechp03.rk_track_points
table using the GDALogr2ogr
command.The following is the Linux version (name it
working/chp03/import_gpx.sh
):#!/bin/bash for f in `find runkeeper_gpx -name \*.gpx -printf "%f\n"` do echo "Importing gpx file $f to chp03.rk_track_points PostGIS table..." #, ${f%.*}" ogr2ogr -append -update -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" runkeeper_gpx/$f -nln chp03.rk_track_points -sql "SELECT ele, time FROM track_points" done
The following is the Windows version (name it
working/chp03/import_gpx.bat
):@echo off for %%I in (runkeeper_gpx\*.gpx*) do ( echo Importing gpx file %%~nxI to chp03.rk_track_points PostGIS table... ogr2ogr -append -update -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" runkeeper_gpx/%%~nxI -nln chp03.rk_track_points -sql "SELECT ele, time FROM track_points" )
- In Linux, don't forget to assign execution permission to it before running it. Run the following script:
$ chmod 775 import_gpx.sh $ ./import_gpx.sh Importing gpx file 2012-02-26-0930.gpx to chp03.rk_track_points PostGIS table... Importing gpx file 2012-02-29-1235.gpx to chp03.rk_track_points PostGIS table... ... Importing gpx file 2011-04-15-1906.gpx to chp03.rk_track_points PostGIS table...
In Windows, just double-click on the
.bat
file or run it from the command prompt using the following command:> import_gpx.bat
- Now, create a polyline table containing a single runner's track details, using the
ST_MakeLine
function. Assume that on each distinct day the runner had just one training. In this table, you should include the start and end times of the track details as follows:postgis_cookbook=# SELECT ST_MakeLine(the_geom) AS the_geom, run_date::date, MIN(run_time) as start_time, MAX(run_time) as end_time INTO chp03.tracks FROM ( SELECT the_geom, "time"::date as run_date, "time" as run_time FROM chp03.rk_track_points ORDER BY run_time ) AS foo GROUP BY run_date;
- Before querying the created tables, don't forget to add spatial indexes to both of the tables to improve their performance, as follows:
postgis_cookbook=# CREATE INDEX rk_track_points_geom_idx ON chp03.rk_track_points USING gist(the_geom); postgis_cookbook=# CREATE INDEX tracks_geom_idx ON chp03.tracks USING gist(the_geom);
- If you try to open both the spatial tables on a Desktop GIS on any given day, you should see that the points from the
rk_track_points
table compose a single polyline geometry record in thetracks
table, as shown in the following screenshot: - Now, query the
tracks
table to get a report of the total distance run (in km) by the runner for each month. For this purpose, use theST_Length
function, as shown in the following query:postgis_cookbook=# SELECT EXTRACT(year FROM run_date) AS run_year, EXTRACT(MONTH FROM run_date) as run_month, SUM(ST_Length(geography(the_geom)))/1000 AS distanceFROM chp03.tracks GROUP BY run_year, run_month; run_year | run_month | distance ------------+-------------+------------------ 2010 | 5 | 67.9277530981487 ... 2012 | 7 | 38.8795349962323 2012 | 8 | 72.0557697278750 (28 rows)
- Using a spatial join between the
tracks
andcountries
tables and again using theST_Length
function as follows, you will get a report of the distance run (in km) by the runner, per country:postgis_cookbook=# SELECT c.name, SUM(ST_Length(geography(t.the_geom)))/1000 AS run_distance FROM chp03.tracks AS t JOIN chp01.countries AS c ON ST_Intersects(t.the_geom, c.the_geom) GROUP BY c.name ORDER BY run_distance DESC; country_name | run_distance --------------------+------------------ Italy | 2628.78393844143 ... Greece | 18.1060004468414 (4 rows)
How it works...
The .gpx
files store all of the points' details in the WGS 84 spatial reference system; therefore, we created the rk_track_points
table with SRID (4326).
After creating the rk_track_points
table, we imported all of the .gpx
files in the runkeeper_gpx
directory using a bash script. The bash script iterates all of the files with the extension *.gpx
in the runkeeper_gpx
directory. For each of these files, the script runs the ogr2ogr
command, importing the .gpx
files to PostGIS using the GPX GDAL driver (for more details go to http://www.gdal.org/ogr/drv_gpx.html).
In the GDAL's abstraction, a .gpx
file is an OGR data source composed of several layers, as follows:
$ ogrinfo -so 2012-08-29-1930.gpx Had to open data source read-only INFO: Open of '2012-08-29-1930.gpx' using driver `GPX' successful. 1: waypoints (Point) 2: routes (Line String) 3: tracks (Multi Line String) 4: route_points (Point) 5: track_points (Point)
In the .gpx
files (OGR data sources), you have just the tracks
and track_points
layers. As a shortcut, you could have imported just the tracks
layer using ogr2ogr
, but you would need to start using some PostGIS functions from the track_points
layer in order to generate the tracks
layer itself. This is why in the ogr2ogr
section in the bash script, we import the point geometries from the track_
points layer, plus a couple of useful attributes, such as elevation and timestamp, to the rk_track_points
PostGIS table.
Once the records were imported, we fed a new polylines table named tracks
using a subquery and select all of the point geometries and their dates and times from the rk_track_points
table, grouped by date, and with the geometries aggregated using the ST_MakeLine
function. This function was able to create linestrings from point geometries (for more details, go to http://www.postgis.org/docs/ST_MakeLine.html).
You should not forget to sort the points in the subquery by datetime
; otherwise, you will obtain an irregular linestring, jumping from one point to the other and not following the correct order.
After loading the tracks
table, we tested the two spatial queries.
At first, you got a month-by-month report of the total distance run by the runner. For this purpose, you selected all of the track records grouped by date (year and month), with the total distance obtained by summing up the lengths of the single tracks (obtained with the ST_Length
function). To get the year and the month from the run_date
function, you used the PostgreSQL EXTRACT
function; be aware that if you measure the distance using geometries in the WGS 84 system, you will obtain it in degree units. For this reason, you have to project the geometries to a planar metric system designed for the specific region from where the data will be projected.
For large-scale areas, such as in our case where we have points that span all around Europe, as shown in the last query results, a good option is to use the geography
data type introduced with PostGIS 1.5. The calculations may be slower, but they are much more accurate than in other systems. This is the reason why you casted the geometries to the geography
data type before making measures.
The last spatial query used a spatial join with the ST_Intersects
function to get the name of the country where each track was run by the runner (with the assumption that the runner didn't run cross-border tracks). To get the total distance run per country is just a matter of aggregating the selection on the country_name
field and aggregating the track distances with the PostgreSQL SUM
operator.