PostGIS Cookbook
上QQ阅读APP看书,第一时间看更新

Using polygon overlays for proportional census estimates

PostgreSQL functions abound for the aggregation of tabular data, including sum, count, min, max, and so on. PostGIS as a framework does not explicitly have spatial equivalents of these, but this does not prevent us from building functions using the aggregates in concert with PostGIS's spatial functionality.

In this recipe, we will explore spatial summarization with the United States Census data. US Census data, by nature, is aggregated data. This is done intentionally to protect the privacy of citizens. But when it comes to doing analyses with this data, the aggregate nature of the data can become problematic. There are some tricks to disaggregate data. Amongst the simplest of these is the use of a proportional sum based on area, which we will do in this exercise.

Getting ready

The problem at hand is that a proposed trail has been drawn in order to provide services for the public. This example could apply to road construction or even finding sites for commercial properties for the purpose of provisioning services.

First, perform a quick data load using the following commands:

shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom census chp02.trail_census | psql -U me -d postgis_cookbook
shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom trail_alignment_proposed_buffer chp02.trail_buffer | psql -U me -d postgis_cookb
ook
shp2pgsql -s 3734 -d -i -I -W LATIN1 -g the_geom trail_alignment_proposed chp02.trail_alignment_prop | psql -U me -d postgis_cookbook

The preceding commands will produce the following output:

Getting ready

In our case, we want to know the population within 1 mile of the trail, assuming that people living within 1 mile of the trail are the ones most likely to use it and, thus, most likely to be served by it.

To find the population near this proposed trail, we overlay census block group population density information. Illustrated in the next screenshot is a 1 mile buffer around the proposed trail overlayed on census information:

Getting ready

One of the things we might note about this census data is the wide range of census densities and census block group size. An approach to calculate the population would be to simply select all census clocks that intersect our area, as shown in the following screenshot:

Getting ready

This is a simple procedure that gives us an estimate of 130,288 people living within 1 mile of the trail. But, looking at the shape of the selection, we can see that we are over-estimating the population by taking the entirety of the block groups in our estimate.

Similarly, if we just used the block groups whose centroids lay within 1 mile of our proposed trail alignment, we would underestimate the population.

Instead, we will make some useful assumptions. Block groups are designed to be moderately homogenous within the block group. Assuming that this holds true for our data, we can assume that, for a given block group, if 50 percent of the block group is within our target area, we can attribute half of the population of that block group to our estimate. Apply this to all our block groups, sum them, and we have a refined estimate that is likely to be better than pure intersects or centroid queries. Thus, we employ a proportional sum.

How to do it...

As the problem of a proportional sum is a generic problem, we will write the underlying proportioning as a function. A function takes inputs and returns a value. In our case, we want our proportioning function to take two geometries, that is, the geometry of our buffered trail and block groups as well as the value we want proportioned, and we want it to return the proportioned value.

CREATE OR REPLACE FUNCTION chp02.proportional_sum(geometry, geometry, numeric)
 RETURNS numeric AS
$BODY$
-- SQL here
$BODY$
 LANGUAGE sql VOLATILE;

Now for the purpose of our calculation, for any given intersection of buffered area and block group, we want to find the proportion that the intersection is over the overall block group. Then this value should be multiplied by the value we want to scale.

In SQL, the function looks like the following query:

SELECT $3 * areacalc FROM
 (
 SELECT (ST_Area(ST_Intersection($1, $2)) / ST_Area($2))::numeric AS areacalc
 ) AS areac
;

The preceding query in its full form looks as follows:

CREATE OR REPLACE FUNCTION chp02.proportional_sum(geometry, geometry, numeric)
 RETURNS numeric AS
$BODY$
 SELECT $3 * areacalc FROM
 (
 SELECT (ST_Area(ST_Intersection($1, $2))/ST_Area($2))::numeric AS areacalc
 ) AS areac
;
$BODY$
 LANGUAGE sql VOLATILE;

How it works...

Since we have written the query as a function, the query uses the SELECT statement to loop through all available records and give us a proportioned population. An astute reader will note that we have not yet done any work on summarization; we have only worked on the proportionality portion of the problem. We can do the summarization upon calling the function using PostgreSQL's built-in aggregate functions. What is neat about this approach is that we need not just apply a sum, but could also calculate other aggregates such as min or max. In the following example, we will just apply a sum:

SELECT ROUND(SUM(chp02.proportional_sum(a.the_geom, b.the_geom, b.pop))) FROM
 chp02.trail_buffer AS a, chp02.trail_census as b
 WHERE ST_Intersects(a.the_geom, b.the_geom)
 GROUP BY a.gid;

The value returned is quite different (population of 96,081), which is more likely to be accurate.