Use of global positioning system (GPS) technology has become a standard method of data collection in geographical epidemiology and public health studies.
However, in dense urban environments, GPS measurements are highly prone to error as signals bounce off large buildings and other structures.
Formally defined as multipath error, these reflections increase the distance that the GPS signal must travel to reach the receiver—causing miscalculations in position.
Multipath problems can result in locations that seemingly wander or jump as the signal is dropped and regained.
Although the error associated with these split second delays may seem negligible, the multipath phenomena can substantially impact positional accuracy and tracking measurements.
In order to evaluate the associations between urban density and GPS positional accuracy we used data from the New York City (NYC) Taxi and Limousine Commission containing information on the approximately 175 million NYC yellowcab taxi trips that occurred in 2013.
We hypothesized that higher building density would be positively associated with errors in accuracy.
My research group, the Built Environment and Health Project has been using and seeing GPS and accelerometry data emerging in many health-related research studies.
We've noticed a lot of error in areas with high building density. So we've been sending students out on structured walks to better assess how the built environment influences GPS signal.
In two slides, we'll see two walks down two different streets (one with low and one with high built density) in NYC (Bronx and Midtown Manhattan).
Two students walked together each carrying two DG-100 GPS devices and the RunKeeper app on their smartphones.
If researchers were assessing walking or traveled distance by using the raw un-processed GPS data they could potentially overestimate distance traveled in areas of high multipath error.
We've sent students on about 40 different walks stratified by the upper and lower quartiles of distributed building height (explained in later slides) and are planning on putting out some documentation providing recommendations on using GPS data for public health research and how folks should plan on using some GPS cleaning/snapping methods.
The Taxi pickup and dropoff GPS data appeared to have the same error.
So it seemed worthwhile and exciting to investigate it and potentially our REDS group's analysis to guide further research.
A map of all dropoff and pickup GPS points in 2013 (starts of trips in blue and the ends in orange on this map:) New York City taxi trips - Eric Fisher, Mapbox. Note points in water and over where buildings are.
iPython notebook to intersect the Census Blocks and Taxi Pickups and Dropoffs, shapefile to csv (GDAL/OGR), combine tables , and concat master table.
We hypothesized that higher building density (Distributed Building Height) would be positively associated with errors in accuracy.
convert -delay 200 -loop 0 img/png/*.png img/gif/dist_bldg_hght_ani.gif #imagemagick-png to gif code
Calculating Distributed Building Height (referred to as building bulk density in code).
Calculating Distance (arcpy.GenerateNearTable_analysis
) from Pickup/Dropoffs to Nearest Roadbed feature, OGR-GDAL and Pandas IPython Notebooks for Data Munging.
Summary Statistics for Census Blocks
nycb2010_taxi_2013_stats_bldg_cnt_pctcbbldg.csv (4 MB)
Distance to Roadbed data for all 2013 Taxi Pickups/Dropoffs
taxi_2013.csv (18.56 GB - OSX-journaled)
To understand the relationship further, we need to map both variables together.