6–9 Jul 2026
Europe/Warsaw timezone

automatedRecLin: Record Linkage Based on an Entropy-Maximizing Classifier

8 Jul 2026, 11:55
5m
Lightning Talk (5 minutes) Lightning Talks

Speaker

Adam Struzik (Adam Mickiewicz University in Poznań)

Description

We present automatedRecLin, an R package (available on CRAN) for record linkage based on an entropy-maximizing classifier. We first briefly introduce the maximum entropy classification algorithm for record linkage, originally proposed by Lee et al. (2022, Survey Methodology), and describe an extension that allows the use of continuous comparison functions. A simulation study shows that the package's methods yield low error rates and satisfactory estimates of the number of matches. We then show how to use the package in both supervised and unsupervised settings, focusing on switching between estimation methods, incorporating custom comparison functions, and selecting an approach for creating the set of predicted matches. We conclude with a case study demonstrating a complete record linkage workflow with automatedRecLin.
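
To illustrate what a continuous comparison function adds over a binary one, here is a minimal Python sketch. It does not use the automatedRecLin package or reproduce its API. The function names (`binary_agreement`, `continuous_similarity`, `compare_records`), the example records, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration only.

```python
from difflib import SequenceMatcher


def binary_agreement(a: str, b: str) -> int:
    """Return 1 if the two field values are identical, otherwise 0."""
    return int(a == b)


def continuous_similarity(a: str, b: str) -> float:
    """Return a similarity score in [0, 1].

    difflib's ratio is a stand-in chosen for illustration; it is not
    claimed to be the measure used by the package or the paper.
    """
    return SequenceMatcher(None, a, b).ratio()


def compare_records(rec_a: dict, rec_b: dict, fields: list) -> tuple:
    """Build binary and continuous comparison vectors for one record pair."""
    binary = [binary_agreement(rec_a[f], rec_b[f]) for f in fields]
    continuous = [continuous_similarity(rec_a[f], rec_b[f]) for f in fields]
    return binary, continuous


# Hypothetical records that differ by a single character in the name field.
rec_a = {"name": "Kowalski", "city": "Poznan"}
rec_b = {"name": "Kowalsky", "city": "Poznan"}

binary, continuous = compare_records(rec_a, rec_b, ["name", "city"])
print("binary:    ", binary)
print("continuous:", continuous)
```

The binary vector treats the near-identical names as a plain disagreement, while the continuous vector records that they are almost the same. A classifier working on comparison vectors can use that extra information when small discrepancies such as typos are common.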

Additional Material or Paper

Adam Struzik received an award in the Student Paper Competition organized by the American Statistical Association’s Statistical Computing and Graphics Sections for the paper "Capturing Small Discrepancies in Record Linkage: A Maximum Entropy Framework with Continuous Similarity Measures". Preprint: https://drive.google.com/file/d/1ZTHVB_p0slF7C-b2eAdIbtBMP2G-jPAY/view?usp=sharing

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

No AI tools/services were used.

Keywords: Record linkage, Entity resolution, Maximum entropy classification, Continuous comparison functions, Density ratio
Virtual Option: This submission is for onsite presentation primarily, but I would also like it to be considered for pre-recorded virtual presentation if I don't get an onsite slot
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirm

Author

Adam Struzik (Adam Mickiewicz University in Poznań)

Co-author

Maciej Beręsewicz (Poznań University of Economics and Business)
