Open Access Research

Accuracy of commercial geocoding: assessment and implications

Eric A Whitsel¹, P Miguel Quibrera², Richard L Smith³, Diane J Catellier⁴, Duanping Liao⁵, Amanda C Henley⁶ and Gerardo Heiss²

¹ Departments of Epidemiology and Medicine, University of North Carolina, Cardiovascular Disease Program, Bank of America Center Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514, USA

² Department of Epidemiology, University of North Carolina, Cardiovascular Disease Program, Bank of America Center Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514, USA

³ Department of Statistics and Operations Research, University of North Carolina, 201 Smith Building 128, Chapel Hill, NC 27599, USA

⁴ Department of Biostatistics, University of North Carolina, Collaborative Studies Coordinating Center, 137 East Franklin Street, Chapel Hill, NC 27514, USA

⁵ Department of Health Evaluation Sciences, Pennsylvania State University College of Medicine, 600 Centerview Drive Suite 2200, A210, Hershey, PA 17033, USA

⁶ Walter Royal Davis Library, University of North Carolina, Reference Department, Geographic Information Services, Chapel Hill, NC 27599, USA


Epidemiologic Perspectives & Innovations 2006, 3:8 doi:10.1186/1742-5573-3-8

Published: 20 July 2006

Abstract

Background

Published studies of geocoding accuracy often focus on a single geographic area, address source or vendor, do not adjust accuracy measures for address characteristics, and do not examine effects of inaccuracy on exposure measures. We addressed these issues in a Women's Health Initiative ancillary study, the Environmental Epidemiology of Arrhythmogenesis in WHI.

Results

Addresses in 49 U.S. states (n = 3,615) with established coordinates were geocoded by four vendors (A-D). There were important differences among vendors in address match rate (98%; 82%; 81%; 30%), concordance between established and vendor-assigned census tracts (85%; 88%; 87%; 98%) and distance between established and vendor-assigned coordinates (mean ρ [meters]: 1809; 748; 704; 228). Mean ρ was lowest among street-matched, complete, ZIP-coded, unedited and urban addresses, and addresses with North American Datum of 1983 or World Geodetic System of 1984 coordinates. In mixed models restricted to vendors with minimally acceptable match rates (A-C) and adjusted for address characteristics, within-address correlation, and among-vendor heteroscedasticity of ρ, differences in mean ρ were small for street-type matches (280; 268; 275), i.e. likely to bias results relying on them about equally for most applications. In contrast, differences between centroid-type matches were substantial in some vendor contrasts, but not others (5497; 4303; 4210; p for interaction < 10⁻⁴), i.e. more likely to bias results differently in many applications. The adjusted odds of an address match were higher for vendor A versus C (odds ratio [OR] = 66, 95% confidence interval [CI]: 47, 93), but not B versus C (OR = 1.1, 95% CI: 0.9, 1.3). The adjusted odds of census tract concordance were no higher for vendor A versus C (OR = 1.0, 95% CI: 0.9, 1.2) or B versus C (OR = 1.1, 95% CI: 0.9, 1.3). Misclassification of a related exposure measure, distance to the nearest highway, increased with mean ρ and, in the absence of confounding, non-differential misclassification of this distance biased its hypothetical association with coronary heart disease mortality toward the null.
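The three unadjusted accuracy measures reported above (match rate, census tract concordance and mean ρ) can be illustrated with a minimal sketch. The abstract does not say how ρ was computed, so the haversine great-circle formula on a spherical Earth is an assumption here, as is the choice to compute concordance and mean ρ among matched addresses only. The record layout, field names and coordinates are hypothetical and are not study data.

```python
import math

# Mean Earth radius in meters; a spherical approximation (assumption, not from the paper).
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_vendor(records):
    """Return match rate, tract concordance and mean rho for one vendor.

    Each record is a dict with hypothetical keys:
      established_xy  (lat, lon) of the established location
      established_tract  established census tract identifier
      vendor_xy  (lat, lon) returned by the vendor, or None if unmatched
      vendor_tract  tract returned by the vendor, or None if unmatched
    Concordance and mean rho are computed among matched addresses only
    (an assumption about the denominator).
    """
    if not records:
        raise ValueError("records must not be empty")
    matched = [r for r in records if r["vendor_xy"] is not None]
    match_rate = len(matched) / len(records)
    if not matched:
        return {"match_rate": match_rate, "tract_concordance": None, "mean_rho_m": None}
    concordant = sum(r["vendor_tract"] == r["established_tract"] for r in matched)
    rhos = [haversine_m(*r["established_xy"], *r["vendor_xy"]) for r in matched]
    return {
        "match_rate": match_rate,
        "tract_concordance": concordant / len(matched),
        "mean_rho_m": sum(rhos) / len(rhos),
    }


# Four made-up addresses for one hypothetical vendor; the last one is unmatched.
records = [
    {"established_xy": (35.9132, -79.0558), "established_tract": "T1",
     "vendor_xy": (35.9132, -79.0558), "vendor_tract": "T1"},
    {"established_xy": (35.9100, -79.0500), "established_tract": "T2",
     "vendor_xy": (35.9110, -79.0490), "vendor_tract": "T2"},
    {"established_xy": (40.2640, -76.6740), "established_tract": "T3",
     "vendor_xy": (40.3000, -76.6500), "vendor_tract": "T4"},
    {"established_xy": (35.9000, -79.0400), "established_tract": "T5",
     "vendor_xy": None, "vendor_tract": None},
]
summary = summarize_vendor(records)
print(summary)
```

The adjusted contrasts in the abstract come from mixed models that account for address characteristics, within-address correlation and heteroscedasticity among vendors; this sketch covers only the crude summaries.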

Conclusion

Geocoding error depends on measures used to evaluate it, address characteristics and vendor. Vendor selection presents a trade-off between potential for missing data and error in estimating spatially defined attributes. Informed selection is needed to control the trade-off and adjust analyses for its effects.
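To see why analyses may need adjusting for geocoding error, the following sketch shows how non-differential misclassification of a binary exposure (for example, living within some distance of a highway) pulls an odds ratio toward the null. It works from expected cell counts rather than a simulation. All counts, the sensitivity and the specificity are hypothetical values chosen for illustration, and the odds ratio stands in for whichever association measure an analysis actually uses.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_noncases, unexposed_noncases):
    """Odds ratio for a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_noncases) / (unexposed_cases * exposed_noncases)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts classified as exposed and unexposed after imperfect measurement."""
    observed_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    observed_unexposed = (1 - sensitivity) * exposed + specificity * unexposed
    return observed_exposed, observed_unexposed


# Hypothetical true counts (not study data).
cases = (200, 800)        # exposed, unexposed among cases
noncases = (1000, 9000)   # exposed, unexposed among non-cases

# Non-differential error: the same sensitivity and specificity apply to
# cases and non-cases. Both values are assumed for illustration.
sensitivity, specificity = 0.8, 0.9

true_or = odds_ratio(cases[0], cases[1], noncases[0], noncases[1])
obs_cases = misclassify(cases[0], cases[1], sensitivity, specificity)
obs_noncases = misclassify(noncases[0], noncases[1], sensitivity, specificity)
observed_or = odds_ratio(obs_cases[0], obs_cases[1], obs_noncases[0], obs_noncases[1])

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these inputs the true odds ratio is 2.25 and the observed one is about 1.54, closer to 1. In this framing, larger positional error corresponds to lower sensitivity and specificity, and so to stronger attenuation.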

