| Research
Accuracy of commercial geocoding: assessment and implications
Eric A Whitsel1, P Miguel Quibrera2, Richard L Smith3, Diane J Catellier4, Duanping Liao5, Amanda C Henley6 and Gerardo Heiss2
1 Departments of Epidemiology and Medicine, University of North Carolina, Cardiovascular Disease Program, Bank of America Center Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514, USA
2 Department of Epidemiology, University of North Carolina, Cardiovascular Disease Program, Bank of America Center Suite 306, 137 East Franklin Street, Chapel Hill, NC 27514, USA
3 Department of Statistics and Operations Research, University of North Carolina, 201 Smith Building 128, Chapel Hill, NC 27599, USA
4 Department of Biostatistics, University of North Carolina, Collaborative Studies Coordinating Center, 137 East Franklin Street, Chapel Hill, NC 27514, USA
5 Department of Health Evaluation Sciences, Pennsylvania State University College of Medicine, 600 Centerview Drive Suite 2200, A210, Hershey, PA 17033, USA
6 Walter Royal Davis Library, University of North Carolina, Reference Department, Geographic Information Services, Chapel Hill, NC 27599, USA
Epidemiologic Perspectives & Innovations 2006, 3:8 doi:10.1186/1742-5573-3-8
Abstract
Background
Published studies of geocoding accuracy often focus on a single geographic area, address source or vendor, do not adjust accuracy measures for address characteristics, and do not examine effects of inaccuracy on exposure measures. We addressed these issues in a Women's Health Initiative ancillary study, the Environmental Epidemiology of Arrhythmogenesis in WHI.
Results
Addresses in 49 U.S. states (n = 3,615) with established coordinates were geocoded by four vendors (A-D). Vendors differed substantially in address match rate (98%; 82%; 81%; 30%), concordance between established and vendor-assigned census tracts (85%; 88%; 87%; 98%) and distance between established and vendor-assigned coordinates (mean ρ [meters]: 1809; 748; 704; 228). Mean ρ was lowest among street-matched, complete, zip-coded, unedited and urban addresses, and addresses with North American Datum of 1983 or World Geodetic System of 1984 coordinates. In mixed models restricted to vendors with minimally acceptable match rates (A-C) and adjusted for address characteristics, within-address correlation, and among-vendor heteroscedasticity of ρ, differences in mean ρ were small for street-type matches (280; 268; 275), i.e. likely to bias results relying on them about equally for most applications. In contrast, differences between centroid-type matches were substantial in some vendor contrasts, but not others (5497; 4303; 4210; p for interaction < 10⁻⁴), i.e. more likely to bias results differently in many applications. The adjusted odds of an address match were higher for vendor A versus C (odds ratio [OR] = 66, 95% confidence interval [CI]: 47, 93), but not for B versus C (OR = 1.1, 95% CI: 0.9, 1.3). The adjusted odds of census tract concordance were no higher for vendor A versus C (OR = 1.0, 95% CI: 0.9, 1.2) or B versus C (OR = 1.1, 95% CI: 0.9, 1.3). Misclassification of a related exposure measure, distance to the nearest highway, increased with mean ρ. In the absence of confounding, non-differential misclassification of this distance biased its hypothetical association with coronary heart disease mortality toward the null.
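As an illustration of two of the accuracy measures reported above, the following minimal sketch computes a vendor's address match rate and the mean distance ρ between established and vendor-assigned coordinates. It is not the authors' method: the abstract does not state how ρ was calculated, so the great-circle (haversine) formula on a spherical Earth is an assumption, and the names `haversine_m` and `summarize_vendor` and the input layout are hypothetical.

```python
import math

# Assumption: spherical Earth with the mean radius in meters.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    # Clamp to 1.0 to guard against floating-point overshoot before asin.
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def summarize_vendor(reference, vendor_results):
    """Return (match_rate, mean_rho_m) for one vendor.

    reference:      dict of address_id -> (lat, lon), the established coordinates
    vendor_results: dict of address_id -> (lat, lon), or None when the vendor
                    returned no match (hypothetical layout)
    """
    distances = [
        haversine_m(*reference[address_id], *coords)
        for address_id, coords in vendor_results.items()
        if coords is not None
    ]
    match_rate = len(distances) / len(reference)
    # Mean rho is taken over matched addresses only; undefined if nothing matched.
    mean_rho = sum(distances) / len(distances) if distances else float("nan")
    return match_rate, mean_rho
```

Because mean ρ is computed over matched addresses only, a vendor with a low match rate can show a small mean ρ simply by declining to geocode difficult addresses, which is the trade-off between missing data and positional error that the abstract describes.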
Conclusion
Geocoding error depends on measures used to evaluate it, address characteristics and vendor. Vendor selection presents a trade-off between potential for missing data and error in estimating spatially defined attributes. Informed selection is needed to control the trade-off and adjust analyses for its effects. |