Speaker
Description
Household survey microdata is a primary input for social science research and public policy evaluation, yet the processing pipelines that turn raw microdata into publishable estimates are rarely documented, shared, or reproduced. Each research team writes ad hoc scripts to recode variables, construct indicators, and compute weighted statistics, duplicating effort and introducing silent inconsistencies across studies.
We present metasurvey, an open-source R package that provides a metaprogramming layer on top of the survey package for reproducible survey data processing. The package introduces three abstractions built on R6 classes: (1) Steps: lazily evaluated transformations (compute, recode, filter, rename, remove, join, validate) that record their intent before execution; (2) Recipes: reusable, shareable collections of steps bundled with metadata and provenance tracking; and (3) Workflows: estimation routines (svymean, svytotal, svyby, plus convey inequality measures) that produce publication-ready tables with confidence intervals and quality indicators.
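To make the idea of steps that "record their intent before execution" concrete, here is a minimal base R sketch of deferred transformations. It is not metasurvey's API: the helpers `record_compute()`, `record_filter()`, and `apply_recipe()`, the toy data, and the column names are all hypothetical and exist only for illustration. The sketch assumes that a recipe can be modelled as a plain list of captured expressions.

```r
# Hypothetical helpers for illustration only; these are NOT metasurvey functions.

# Record a "compute" step: capture the expression without evaluating it.
record_compute <- function(recipe, name, expr) {
  step <- list(type = "compute", name = name, expr = substitute(expr))
  c(recipe, list(step))
}

# Record a "filter" step in the same deferred way.
record_filter <- function(recipe, expr) {
  step <- list(type = "filter", expr = substitute(expr))
  c(recipe, list(step))
}

# Execute every recorded step, in order, against a data frame.
apply_recipe <- function(recipe, data) {
  for (step in recipe) {
    # Evaluate the stored expression using the data frame's columns.
    value <- eval(step$expr, envir = data, enclos = parent.frame())
    if (step$type == "compute") {
      data[[step$name]] <- value
    } else if (step$type == "filter") {
      data <- data[value, , drop = FALSE]
    }
  }
  data
}

# Made-up microdata: four respondents with a sampling weight.
toy <- data.frame(
  age    = c(15, 34, 52, 70),
  income = c(0, 1200, 2500, 800),
  weight = c(110, 95, 120, 80)
)

# Build the recipe; nothing is computed at this point.
recipe <- list()
recipe <- record_compute(recipe, "working_age", age >= 18 & age <= 64)
recipe <- record_filter(recipe, working_age)

# Run the recorded steps, then compute a weighted mean on the result.
processed <- apply_recipe(recipe, toy)
weighted_mean_income <- with(processed, sum(income * weight) / sum(weight))
```

Because the recipe is only a list of unevaluated expressions, it can be inspected, stored, or applied to another survey edition with the same column names before any data are touched, which is the property the Steps and Recipes abstractions rely on.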
metasurvey handles complex sampling designs transparently, including rotating panels with bootstrap replicate weights and pooled multi-edition surveys. A built-in recipe registry with a REST API enables researchers to publish, discover, and reuse each other's processing pipelines. The package also includes a Stata .do file transpiler that converts legacy processing scripts into native metasurvey pipelines, lowering adoption barriers for teams transitioning from Stata.
The package is available on GitHub (github.com/metasurveyr/metasurvey) with over 2,700 tests, 90% code coverage, and comprehensive bilingual documentation. It has been validated against Uruguay's national household survey (ECH) across 40+ editions.
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
Claude Code (Anthropic) was used during package development for code auditing, test generation, and bug detection. The abstract was reviewed with AI assistance to improve writing in a non-native language.
Additional Material or Paper
https://github.com/metasurveyr/metasurvey
| Field | Response |
|---|---|
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | survey data, reproducibility, metaprogramming, data pipeline, complex sampling designs |
| Virtual Option | This submission is for onsite presentation primarily, but I would also like it to be considered for pre-recorded virtual presentation if I don't get an onsite slot |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |