Bryan E Shepherd's Website

Data Errors Research

Methods to Address Measurement Error

This page lists many of our papers on methods for addressing data quality, grouped by area. This work involves many fabulous students and colleagues. My co-PI on much of this work, Pam Shaw, also has a website that may be helpful.

Analysis Methods

Shepherd BE, Yu C. Accounting for data errors discovered from an audit in multiple linear regression. Biometrics 2011; 67: 1083-1091. link

Shepherd BE, Shaw PA, Dodd LE. Using audit information to adjust parameter estimates for data errors in clinical trials. Clinical Trials 2012; 9: 721-729. link

Oh EJ, Shepherd BE, Lumley T, Shaw PA. Considerations for analysis of time-to-event outcomes measured with error: bias and correction with SIMEX. Statistics in Medicine 2018; 37: 1276-1289. link

Giganti MJ, Shaw PA, Chen G, Bebawy SS, Turner MM, Sterling TR, Shepherd BE. Accounting for dependent errors in predictors and time-to-event outcomes using electronic health records, validation samples, and multiple imputation. Annals of Applied Statistics 2020; 14: 1045-1061. link

Shaw PA, He J, Shepherd BE. Regression calibration to correct correlated errors in outcome and exposure. Statistics in Medicine 2021; 40: 271-286. link

Tao R, Lotspeich SC, Amorim G, Shaw PA, Shepherd BE. Efficient semiparametric inference for two-phase studies with outcome and covariate measurement errors. Statistics in Medicine 2021; 40: 725-738. link

Oh EJ, Shepherd BE, Lumley T, Shaw PA. Raking and regression calibration: methods to address bias from correlated covariate and time-to-event error. Statistics in Medicine 2021; 40: 631-649. link

Han K, Lumley T, Shepherd BE, Shaw PA. Two-phase analysis and study design for survival models with error-prone exposures. Statistical Methods in Medical Research 2021; 30: 857-874. link

Oh EJ, Shepherd BE, Lumley T, Shaw PA. Improved generalized raking estimators to address dependent covariate and failure-time outcome error. Biometrical Journal 2021; 63: 1006-1027. link

Lotspeich SC, Shepherd BE, Amorim G, Shaw PA, Tao R. Efficient odds ratio estimation under two-phase sampling using error-prone data from a multi-national HIV research cohort. Biometrics 2022; 78: 1674-1685. link

Amorim G, Tao R, Lotspeich SC, Shaw PA, Lumley T, Patel RC, Shepherd BE. Three-phase generalized raking and multiple imputation estimators to address error-prone data. Statistics in Medicine 2024; 43: 379-394. link

Amorim G, Tao R, Lumley T, Shaw PA, Shepherd BE. Ascertainment conditional maximum likelihood for continuous outcome under two-phase response-selective design. Statistics in Medicine (in press).

Designs

Amorim G, Tao R, Lotspeich S, Shaw PA, Lumley T, Shepherd BE. Two-phase sampling designs for data validation in settings with covariate measurement error and continuous outcome. Journal of the Royal Statistical Society, Series A 2021; 184: 1368-1389. link

Lotspeich SC, Amorim G, Shaw PA, Tao R, Shepherd BE. Optimal multi-wave validation of secondary use data with outcome and exposure misclassification. The Canadian Journal of Statistics 2024; 52: 532-554. link

Applications

Shepherd BE, Han K, Chen T, Bian A, Pugh SK, Duda SN, Lumley T, Heerman WJ, Shaw PA. Multi-wave validation sampling for error-prone electronic health records. Biometrics 2023; 79: 2649-2663. link

Audit Findings

Duda SN, Shepherd BE, Gadd CS, Masys DR, McGowan CC. Measuring the quality of observational study data in an international HIV research network. PLoS ONE 2012; 7: e33908. link

Giganti MJ, Shepherd BE, Caro-Vega Y, Luz PM, Rebeiro PF, Maia M, Julmiste G, Cortes C, McGowan CC, Duda SN. The impact of data quality and source data verification on epidemiologic inference: a practical application using HIV observational data. BMC Public Health 2019; 19: 1748. link

Lotspeich S, Giganti M, Maia M, Vieira R, Machado D, de Menezes Succi R, Ribeiro S, Pereira M, Rodriguez M, Julmiste G, Luque M, Caro-Vega Y, Mejia F, Shepherd BE, McGowan CC, Duda S. Self-audits as alternatives to travel-audits for improving data quality in the Caribbean, Central and South America network for HIV epidemiology. Journal of Clinical and Translational Science 2019; 4: 125-132. link

Reviews and Software

Shepherd BE, Shaw PA. Errors in multiple variables in HIV cohort and electronic health record data: statistical challenges and opportunities. Statistical Communications in Infectious Diseases 2020; 12: 20190015. link

Shepherd BE, Shaw PA. New methods to improve data accuracy in studies using electronic health record data. Washington (DC): Patient-Centered Outcomes Research Institute (PCORI); 2022 August. PCORI Final Research Reports. link

Yang JB, Shepherd BE, Lumley T, Shaw PA. Optimum allocation for adaptive multi-wave sampling in R: the R package ‘optimall’. Journal of Statistical Software (in press). link software