Speaker
Description
Handling missing values in heterogeneous datasets remains a common challenge in applied data analysis. We introduce vimpute(), a new function in the R package VIM that provides a unified and model-adaptive framework for multivariate imputation. The function implements an iterative, variable-wise imputation procedure that adapts the modelling strategy to the type and characteristics of each variable, while allowing users to flexibly specify the imputation models.
vimpute() is designed as a flexible and extensible interface built on the mlr3 ecosystem, enabling the use of a wide range of modern learning algorithms, including Random Forests, XGBoost, regularized models, and robust regression methods. Numerical variables can optionally be imputed using predictive mean matching with configurable neighborhood sizes and matching strategies. This design allows users to combine model-based imputation with machine-learning methods within a single workflow.
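To make the predictive mean matching step concrete, here is a minimal base-R sketch of the general technique. It is not the vimpute() implementation: the function `pmm_impute()`, its argument `k` (the neighborhood size), and the use of `lm()` as the predictive model are assumptions chosen for illustration, and any mlr3 learner could stand in for the linear model.

```r
# Illustrative predictive mean matching (PMM) for one numeric variable.
# Hypothetical helper, not part of VIM: lm() stands in for the learner.
pmm_impute <- function(y, X, k = 5) {
  obs <- !is.na(y)                          # rows where y is observed
  d <- data.frame(y = y, X)
  fit <- lm(y ~ ., data = d[obs, , drop = FALSE])
  pred <- predict(fit, newdata = d)         # predictions for all rows
  y_obs <- y[obs]
  pred_obs <- pred[obs]
  for (i in which(!obs)) {
    # the k observed rows whose predictions are closest to row i's prediction
    n_donors <- min(k, length(y_obs))
    donors <- order(abs(pred_obs - pred[i]))[seq_len(n_donors)]
    # draw one donor at random and copy its observed value
    y[i] <- y_obs[donors[sample.int(n_donors, 1)]]
  }
  y
}

set.seed(42)
pmm_X <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
pmm_y <- 1 + 2 * pmm_X$x1 - pmm_X$x2 + rnorm(100, sd = 0.3)
pmm_y_mis <- pmm_y
pmm_y_mis[sample(100, 20)] <- NA
pmm_y_imp <- pmm_impute(pmm_y_mis, pmm_X, k = 5)
```

Because every imputed value is copied from an observed donor, the imputations always stay within the range of the observed data, which is the main reason to prefer matching over inserting raw model predictions.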
The imputation procedure follows a sequential updating scheme with convergence diagnostics to assess stability. Optional hyperparameter tuning can be performed via mlr3, allowing automated model configuration while maintaining transparent and reproducible workflows.
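The sequential updating scheme can be sketched in a few lines of base R. This is an illustration of the general idea under stated assumptions, not the code behind vimpute(): the function `sequential_impute()`, its arguments `max_iter` and `eps`, the mean initialisation, and the choice of largest absolute change as the convergence diagnostic are all hypothetical, and `lm()` again stands in for whichever learner is configured. The sketch handles numeric variables only.

```r
# Illustrative sequential (variable-wise) imputation with a convergence check.
# Hypothetical helper, not part of VIM; numeric columns only.
sequential_impute <- function(data, max_iter = 10, eps = 1e-4) {
  miss <- is.na(data)                       # remember which cells were missing
  vars <- names(data)[colSums(miss) > 0]    # variables that need imputation
  # initialise every missing cell with the observed column mean
  for (v in vars) data[miss[, v], v] <- mean(data[[v]], na.rm = TRUE)
  history <- numeric(0)
  for (iter in seq_len(max_iter)) {
    previous <- data
    for (v in vars) {
      # fit on rows where v was observed, using all other variables
      fit <- lm(reformulate(setdiff(names(data), v), response = v),
                data = data[!miss[, v], , drop = FALSE])
      # overwrite the current imputations of v with fresh predictions
      data[miss[, v], v] <- predict(fit, newdata = data[miss[, v], , drop = FALSE])
    }
    # diagnostic: largest absolute change among imputed cells in this pass
    change <- max(abs(as.matrix(data)[miss] - as.matrix(previous)[miss]))
    history <- c(history, change)
    if (change < eps) break
  }
  list(data = data, iterations = iter, change = history)
}

set.seed(1)
n <- 200
x1 <- rnorm(n)
x2 <- 0.8 * x1 + rnorm(n, sd = 0.5)
x3 <- x1 - x2 + rnorm(n, sd = 0.5)
toy <- data.frame(x1 = x1, x2 = x2, x3 = x3)
toy$x2[sample(n, 30)] <- NA
toy$x3[sample(n, 30)] <- NA
res <- sequential_impute(toy)
```

Inspecting `res$change` shows how much the imputed values moved in each pass; a sequence that shrinks towards zero suggests the imputations have stabilised.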
Simulation experiments and applications to real-world datasets demonstrate that vimpute() provides robust and statistically valid imputations across diverse data settings. By combining methodological flexibility with tight integration into the modern R machine-learning ecosystem, vimpute() supports reproducible and scalable missing-data workflows for applied research, official statistics, and data science applications.
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
ChatGPT and Copilot: Assisted with language editing, stylistic improvements, and abstract optimization for clarity, conciseness, and adherence to academic standards.
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | missing data, imputation, machine learning, sequential |
|---|---|
| Virtual Option | This submission is for onsite presentation primarily, but I would also like it to be considered for pre-recorded virtual presentation if I don't get an onsite slot |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |