PostGIS Cookbook
上QQ阅读APP看书,第一时间看更新

Working with GPS data

In this recipe, you will work with GPS data. This kind of data is typically saved in a .gpx file. You will import a bunch of .gpx files to PostGIS from RunKeeper, a popular social network for runners.

If you have an account on RunKeeper, you can export your .gpx files and process them by following the instructions in this recipe. Otherwise, you can use the RunKeeper .gpx files included in the runkeeper-gpx.zip file, located in the chp03 directory included in the code bundle available with this book.

You will first create a bash script for importing the .gpx files to a PostGIS table, using ogr2ogr. After the import is completed, you will try to write a couple of SQL queries and test some very useful functions, such as ST_MakeLine to generate polylines from point geometries, ST_Length to compute distance, and ST_Intersects to perform a spatial join operation.

Getting ready

Extract the data/chp03/runkeeper-gpx.zip file to working/chp03/runkeeper_gpx. In case you haven't been through Chapter 1, Moving Data In and Out of PostGIS, be sure to have the countries dataset in the PostGIS database.

How to do it...

First, be sure of the format of the .gpx files that you need to import to PostGIS. Open one of them and check the file structure—each file must be in the XML format composed of just one <trk> element that contains just one <trkseg> element that contains many <trkpt> elements (the points stored from the runner's GPS device). Import these points to a PostGIS Point table.

  1. Create a new schema named chp03 to store the data for all of the recipes in this chapter using the following command:
    postgis_cookbook=# create schema chp03;
    
  2. Create the chp03.rk_track_points table in PostgreSQL by executing the following command lines:
    postgis_cookbook=# CREATE TABLE chp03.rk_track_points
    (
     fid serial NOT NULL,
     the_geom geometry(Point,4326),
     ele double precision,
     "time" timestamp with time zone,
     CONSTRAINT activities_pk PRIMARY KEY (fid)
    );
    
  3. Create the following script to import all of the .gpx files in the chp03.rk_track_points table using the GDAL ogr2ogr command.

    The following is the Linux version (name it working/chp03/import_gpx.sh):

    #!/bin/bash
    for f in `find runkeeper_gpx -name \*.gpx -printf "%f\n"`
    do
        echo "Importing gpx file $f to chp03.rk_track_points PostGIS table..." #, ${f%.*}"
        ogr2ogr -append -update  -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" runkeeper_gpx/$f -nln chp03.rk_track_points -sql "SELECT ele, time FROM track_points"
    done

    The following is the Windows version (name it working/chp03/import_gpx.bat):

    @echo off
    for %%I in (runkeeper_gpx\*.gpx*) do (
        echo Importing gpx file %%~nxI to chp03.rk_track_points PostGIS table...
        ogr2ogr -append -update -f PostgreSQL PG:"dbname='postgis_cookbook' user='me' password='mypassword'" runkeeper_gpx/%%~nxI -nln chp03.rk_track_points -sql "SELECT ele, time FROM track_points"
    )
  4. In Linux, don't forget to assign execution permission to it before running it. Run the following script:
    $ chmod 775 import_gpx.sh
    $ ./import_gpx.sh
    Importing gpx file 2012-02-26-0930.gpx to chp03.rk_track_points PostGIS table...
    Importing gpx file 2012-02-29-1235.gpx to chp03.rk_track_points PostGIS table...
    ...
    Importing gpx file 2011-04-15-1906.gpx to chp03.rk_track_points PostGIS table...
    

    In Windows, just double-click on the .bat file or run it from the command prompt using the following command:

    > import_gpx.bat
    
  5. Now, create a polyline table containing a single runner's track details, using the ST_MakeLine function. Assume that on each distinct day the runner had just one training. In this table, you should include the start and end times of the track details as follows:
    postgis_cookbook=# SELECT
    ST_MakeLine(the_geom) AS the_geom,
     run_date::date,
     MIN(run_time) as start_time,
     MAX(run_time) as end_time
     INTO chp03.tracks
     FROM (
     SELECT the_geom,
     "time"::date as run_date,
     "time" as run_time
     FROM chp03.rk_track_points
     ORDER BY run_time
     ) AS foo GROUP BY run_date;
    
  6. Before querying the created tables, don't forget to add spatial indexes to both of the tables to improve their performance, as follows:
    postgis_cookbook=# CREATE INDEX rk_track_points_geom_idx ON chp03.rk_track_points USING gist(the_geom);
    postgis_cookbook=# CREATE INDEX tracks_geom_idx ON chp03.tracks USING gist(the_geom);
    
  7. If you try to open both the spatial tables on a Desktop GIS on any given day, you should see that the points from the rk_track_points table compose a single polyline geometry record in the tracks table, as shown in the following screenshot:
    How to do it...
  8. Now, query the tracks table to get a report of the total distance run (in km) by the runner for each month. For this purpose, use the ST_Length function, as shown in the following query:
    postgis_cookbook=# SELECT
     EXTRACT(year FROM run_date) AS run_year,
     EXTRACT(MONTH FROM run_date) as run_month,
     SUM(ST_Length(geography(the_geom)))/1000 AS distanceFROM chp03.tracks
    GROUP BY run_year, run_month;
     run_year | run_month | distance 
    ------------+-------------+------------------
     2010 | 5 | 67.9277530981487 
     ...
     2012 | 7 | 38.8795349962323 
     2012 | 8 | 72.0557697278750 
    
    (28 rows)
    
  9. Using a spatial join between the tracks and countries tables and again using the ST_Length function as follows, you will get a report of the distance run (in km) by the runner, per country:
    postgis_cookbook=# SELECT
     c.name,
     SUM(ST_Length(geography(t.the_geom)))/1000 AS run_distance
    FROM chp03.tracks AS t
    JOIN chp01.countries AS c
    ON ST_Intersects(t.the_geom, c.the_geom)
    GROUP BY c.name
    ORDER BY run_distance DESC;
     country_name | run_distance 
    --------------------+------------------
     Italy | 2628.78393844143 
     ...
     Greece | 18.1060004468414 
    
    (4 rows)
    

How it works...

The .gpx files store all of the points' details in the WGS 84 spatial reference system; therefore, we created the rk_track_points table with SRID (4326).

After creating the rk_track_points table, we imported all of the .gpx files in the runkeeper_gpx directory using a bash script. The bash script iterates all of the files with the extension *.gpx in the runkeeper_gpx directory. For each of these files, the script runs the ogr2ogr command, importing the .gpx files to PostGIS using the GPX GDAL driver (for more details go to http://www.gdal.org/ogr/drv_gpx.html).

In the GDAL's abstraction, a .gpx file is an OGR data source composed of several layers, as follows:

$ ogrinfo -so 2012-08-29-1930.gpx
Had to open data source read-only
INFO: Open of '2012-08-29-1930.gpx'
 using driver `GPX' successful.
1: waypoints (Point)
2: routes (Line String)
3: tracks (Multi Line String)
4: route_points (Point)
5: track_points (Point)

In the .gpx files (OGR data sources), you have just the tracks and track_points layers. As a shortcut, you could have imported just the tracks layer using ogr2ogr, but you would need to start using some PostGIS functions from the track_points layer in order to generate the tracks layer itself. This is why in the ogr2ogr section in the bash script, we import the point geometries from the track_ points layer, plus a couple of useful attributes, such as elevation and timestamp, to the rk_track_points PostGIS table.

Once the records were imported, we fed a new polylines table named tracks using a subquery and select all of the point geometries and their dates and times from the rk_track_points table, grouped by date, and with the geometries aggregated using the ST_MakeLine function. This function was able to create linestrings from point geometries (for more details, go to http://www.postgis.org/docs/ST_MakeLine.html).

You should not forget to sort the points in the subquery by datetime; otherwise, you will obtain an irregular linestring, jumping from one point to the other and not following the correct order.

After loading the tracks table, we tested the two spatial queries.

At first, you got a month-by-month report of the total distance run by the runner. For this purpose, you selected all of the track records grouped by date (year and month), with the total distance obtained by summing up the lengths of the single tracks (obtained with the ST_Length function). To get the year and the month from the run_date function, you used the PostgreSQL EXTRACT function; be aware that if you measure the distance using geometries in the WGS 84 system, you will obtain it in degree units. For this reason, you have to project the geometries to a planar metric system designed for the specific region from where the data will be projected.

For large-scale areas, such as in our case where we have points that span all around Europe, as shown in the last query results, a good option is to use the geography data type introduced with PostGIS 1.5. The calculations may be slower, but they are much more accurate than in other systems. This is the reason why you casted the geometries to the geography data type before making measures.

The last spatial query used a spatial join with the ST_Intersects function to get the name of the country where each track was run by the runner (with the assumption that the runner didn't run cross-border tracks). To get the total distance run per country is just a matter of aggregating the selection on the country_name field and aggregating the track distances with the PostgreSQL SUM operator.