useR! 2026

Name: useR! 2026
Start: 2026-07-06T08:00:00+02:00
End: 2026-07-09T19:00:00+02:00

6–9 Jul 2026

Europe/Warsaw timezone

Evaluating Disclosure Risk in Synthetic Data with the R Package riskutility

7 Jul 2026, 11:30

20m

Aula VI (SGH Warsaw School of Economics)

180

Talks (15-20 minutes)

Oscar Thees (FHNW / SwissAnon)

The growing movement toward open data, open science, and open government has increased the demand for sharing detailed microdata while protecting individual privacy. Data anonymization methods, including classical statistical disclosure control techniques and synthetic data generation, enable data sharing, but evaluating the resulting privacy risks and analytical usefulness remains challenging.

We present riskutility, an R package that provides a unified framework for assessing both disclosure risk and data utility in anonymized datasets.

The package implements a broad collection of evaluation metrics, including attribution-based disclosure risk measures (e.g., CAP, TCAP, DCAP), distance-based memorization checks such as Distance to Closest Record (DCR) and Nearest Neighbor Distance Ratio (NNDR), information-theoretic metrics, and model-based and distribution-based utility measures. In addition to these core methods, the package includes many further diagnostics for comparing distributions, multivariate structure, predictive performance, and other aspects of analytical validity. Together, these tools allow analysts to systematically evaluate privacy risks alongside analytical usefulness within a single workflow.
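To make the two distance-based checks concrete, here is a minimal sketch of how DCR and NNDR are commonly computed for numeric records. It is written in plain Python as a language-neutral illustration and does not show the riskutility API; the function name `dcr_nndr`, the use of Euclidean distance, the toy data, and the convention of returning 1.0 when the two nearest distances are both zero are all assumptions made for this example.

```python
import math

def dcr_nndr(synthetic, original):
    """For each synthetic record, return (DCR, NNDR) against the original records.

    DCR  = distance to the closest original record.
    NNDR = closest distance divided by second-closest distance.
    Assumes at least two original records and numeric, comparably scaled columns.
    """
    results = []
    for s in synthetic:
        # Euclidean distances to every original record, sorted ascending
        dists = sorted(math.dist(s, o) for o in original)
        nearest, second = dists[0], dists[1]
        # Assumed convention: ratio is 1.0 when the two nearest records are both exact matches
        ratio = nearest / second if second > 0 else 1.0
        results.append((nearest, ratio))
    return results

# Hypothetical toy data: the first synthetic record copies an original record exactly
original = [(0.0, 0.0), (3.0, 4.0), (10.0, 10.0)]
synthetic = [(0.0, 0.0), (6.0, 8.0)]

scores = dcr_nndr(synthetic, original)
for record, (dcr, nndr) in zip(synthetic, scores):
    print(record, round(dcr, 3), round(nndr, 3))
```

A DCR of zero, as for the first toy record, indicates that a synthetic record reproduces an original one. An NNDR close to zero suggests that a synthetic record sits unusually close to one specific original record compared with its next-nearest neighbour.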

A central methodological contribution implemented in the package is RAPID (Risk of Attribute Prediction–Induced Disclosure), a novel inferential disclosure risk measure. RAPID models a realistic attacker who trains predictive models on released data and uses quasi-identifiers to infer sensitive attributes of real individuals. The method quantifies per-record vulnerability for both continuous and categorical sensitive variables.
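The attacker scenario described above can be illustrated with a deliberately simple stand-in. The sketch below is not the RAPID definition (see the preprint for that) and does not use the riskutility API. It swaps in a frequency-table attacker for categorical data: the attacker tabulates sensitive values per quasi-identifier combination in the released data, then scores each real record by how confidently the true sensitive value would be predicted. The function names, the column names, the toy data, and the 0.7 threshold are all hypothetical.

```python
from collections import Counter, defaultdict

def fit_attacker(released, qi_keys, sensitive):
    """Tabulate sensitive values per quasi-identifier combination in the released data."""
    table = defaultdict(Counter)
    for row in released:
        table[tuple(row[k] for k in qi_keys)][row[sensitive]] += 1
    return table

def record_risk(table, original, qi_keys, sensitive, threshold=0.7):
    """Per original record: attacker confidence in the true value, plus an at-risk flag.

    The threshold is an illustrative assumption, not a recommended value.
    """
    out = []
    for row in original:
        counts = table.get(tuple(row[k] for k in qi_keys), Counter())
        total = sum(counts.values())
        # Confidence is 0.0 when the quasi-identifier combination never occurs in the release
        conf = counts[row[sensitive]] / total if total else 0.0
        out.append((conf, conf >= threshold))
    return out

QI = ("age", "region")  # hypothetical quasi-identifiers
released = [
    {"age": "30-39", "region": "N", "diagnosis": "A"},
    {"age": "30-39", "region": "N", "diagnosis": "A"},
    {"age": "30-39", "region": "N", "diagnosis": "B"},
    {"age": "40-49", "region": "S", "diagnosis": "B"},
]
original = [
    {"age": "30-39", "region": "N", "diagnosis": "A"},
    {"age": "40-49", "region": "S", "diagnosis": "B"},
    {"age": "50-59", "region": "N", "diagnosis": "A"},
]

risks = record_risk(fit_attacker(released, QI, "diagnosis"), original, QI, "diagnosis")
for row, (conf, flagged) in zip(original, risks):
    print(row["age"], row["region"], round(conf, 3), flagged)
```

In this toy setting only the second real record is flagged, because every released record with its quasi-identifier combination carries the same sensitive value. A continuous sensitive variable would need a different closeness criterion, for example a tolerance around the true value.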

Through live coding examples with a real-world dataset, this talk will demonstrate practical workflows for disclosure risk analysis, including identifying high-risk records, selecting attacker models via cross-validation, analysing which quasi-identifier combinations drive vulnerability, and exploring privacy–utility trade-offs using PCA-based visualisations.

The RAPID methodology is described in our recent work:
https://arxiv.org/abs/2602.09235

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

DeepL Write was used for minor language editing and stylistic improvements of the abstract. All scientific content and ideas are the authors’ own.

Additional Material or Paper

We previously contributed a workshop on data anonymisation and disclosure risk at useR! 2024 in Salzburg:
https://userconf2024.sched.com/event/1c8zq/tutorial-data-anonymisation-for-open-science-jiri-novak-oscar-thees-uzh-fhnw-marko-miletic-bern-university-of-applied-sciences-alzbeta-beranova-czech-statistical-office

The RAPID preprint is available at https://arxiv.org/pdf/2602.09235.

Keywords: synthetic data, statistical disclosure control, disclosure risk, privacy–utility trade-offs, R package
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirm
Interested in serving as reviewer? oscar.thees@fhnw.ch

Author: Oscar Thees (FHNW / SwissAnon)

Co-authors: Prof. Matthias Templ (FHNW / SwissAnon), Mr Roman Müller (FHNW / SwissAnon)

Presentation materials: rapid_useR2026.pdf
