6–9 Jul 2026
Europe/Warsaw timezone

Evaluating imputation strategies for longitudinal cohort studies

7 Jul 2026, 16:00
5m
Lightning Talk (5 minutes) Epidemiology Lightning Talks

Speaker

Dr Sinead Moylett (University of Limerick)

Description

Missing data is a structural feature of longitudinal cohort studies and can bias inference when attrition is systematic. We present an R-based evaluation of imputation strategies for ordinal outcomes across five waves of the Irish Longitudinal Study on Ageing (TILDA). All data processing, simulation, imputation, and evaluation were implemented in R, giving a fully reproducible workflow for longitudinal missing-data experiments. From a cohort of 8,504 respondents, we construct wave-specific ground-truth datasets using stratified samples of 1,000 respondents per wave (5,000 records in total). Realistic Missing at Random (MAR) patterns are introduced via weighted amputation, with predictors of non-response estimated using penalised regression. We compare three approaches: a deterministic mode baseline; Random Forest imputation using missForest-style iterative procedures under three mtry configurations (bagged, optimised, and naive); and Multiple Imputation by Chained Equations (MICE) with proportional-odds models. Mode imputation achieves the lowest RMSE (0.994) but introduces bias (0.079) and distributional distortion (KL divergence 0.0293). MICE yields the smallest absolute bias (-0.036) and the strongest distributional preservation (KL divergence 0.0009), at the cost of a higher RMSE (1.121). Among the Random Forest methods, bagged RF is the most stable (RMSE 1.009), while the optimised and naive configurations show higher RMSE (1.106 and 1.213) and can fail in some waves (RMSE 1.392 versus 0.894 for bagged RF in one instance). These results highlight the trade-off between point accuracy and population-level validity when imputing ordinal longitudinal health data, and they demonstrate the utility of R for building reproducible simulation pipelines that integrate multiple imputation and machine-learning methods in longitudinal settings.
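
The three evaluation metrics named in the description (RMSE, bias, and KL divergence) can be illustrated with a minimal base-R sketch. This is not the authors' pipeline. The data are simulated rather than drawn from TILDA, and only the mode baseline is shown. The variable names (age, truth, p_miss), the 1-5 outcome scale, and the logistic missingness model are assumptions made for illustration. Two further choices are also assumptions, since the abstract does not specify them: RMSE and bias are computed on the amputed cells only, and the KL divergence is taken from the true to the completed category distribution.

```r
# Toy illustration only: simulated data, not TILDA; all names are hypothetical.
set.seed(42)

n <- 1000
age <- round(runif(n, 50, 90))

# Ordinal outcome on an assumed 1-5 scale, stored as integer codes
truth <- sample(1:5, n, replace = TRUE,
                prob = c(0.10, 0.20, 0.40, 0.20, 0.10))

# MAR amputation (assumed model): missingness depends only on observed age
p_miss <- plogis(-4 + 0.04 * age)
is_missing <- runif(n) < p_miss
observed <- truth
observed[is_missing] <- NA

# Deterministic mode baseline: fill with the most frequent observed category
mode_value <- as.integer(names(which.max(table(observed))))
imputed <- observed
imputed[is_missing] <- mode_value

# Point-accuracy metrics, treating the ordinal codes as numeric scores
rmse <- function(est, ref) sqrt(mean((est - ref)^2))
bias <- function(est, ref) mean(est - ref)

# KL divergence from the reference to the estimated category distribution;
# eps guards against log(0) when a category is empty
kl_divergence <- function(ref, est, levels, eps = 1e-9) {
  p <- as.numeric(table(factor(ref, levels = levels))) / length(ref)
  q <- as.numeric(table(factor(est, levels = levels))) / length(est)
  p <- (p + eps) / sum(p + eps)
  q <- (q + eps) / sum(q + eps)
  sum(p * log(p / q))
}

results <- c(
  rmse = rmse(imputed[is_missing], truth[is_missing]),
  bias = bias(imputed[is_missing], truth[is_missing]),
  kl   = kl_divergence(truth, imputed, levels = 1:5)
)
print(round(results, 4))
```

Replacing the mode step with another imputer while keeping truth, is_missing, and the three metric functions fixed would give a like-for-like comparison on the same amputed cells.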

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

ChatGPT for editing and review of style

Keywords: missing data, multiple imputation, random forest, longitudinal
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. Confirm

Author

Dr Sinead Moylett (University of Limerick)

Co-authors

Prof. Blair Robertson (University of Canterbury)
George Smith-Kolff (University of Canterbury)
