6–9 Jul 2026
Europe/Warsaw timezone

Imputation methods and their benchmarking for fuzzy datasets with R

8 Jul 2026, 14:20
5m
Lightning Talk (5 minutes) Lightning Talks

Speaker

Maciej Romaniuk (Systems Research Institute, PAS)

Description

Imputation methods are widely used to replace missing values in datasets, thereby improving the overall quality of samples and enabling further statistical procedures. Various measures and tools have been proposed to compare the effectiveness and results of imputation algorithms. However, they are designed only for “crisp” (i.e., real-valued) datasets. Meanwhile, fuzzy sets are widely used to model imprecision in data (e.g., when the results of an experiment cannot be precisely described, qualified, or measured) in fields such as biology, engineering, and reliability. The R package FuzzyImputationTest is devoted to imputing and benchmarking the special case of datasets consisting of fuzzy numbers. This library uniquely combines classical tools with new measures that address specific features of fuzzy sets (such as the existence of a membership function, inequalities related to the positions of the core and support, etc.). Apart from various measures of “similarity” between the actual and imputed values, statistical tools, including special epistemic bootstrap tests, are also applied. With the help of FuzzyImputationTest, five imputation methods (the widely known missForest, miceRanger, knn, and pmm algorithms, together with dimp, a completely new method designed explicitly for fuzzy data) are numerically compared across various synthetic and real-life, single- and multivariate datasets. The obtained conclusions shed new light on the existing, yet still overlooked, problem of imputing missing fuzzy data.
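To make the fuzzy-specific features mentioned above more concrete, here is a minimal Python sketch (it is not the API of the R package FuzzyImputationTest, and not the dimp method). It assumes trapezoidal fuzzy numbers given by four parameters (a1, a2, a3, a4), with support [a1, a4] and core [a2, a3]. It shows the membership function, the core/support inequality a1 ≤ a2 ≤ a3 ≤ a4 that a coordinate-wise "crisp" imputation can break, a hypothetical `repair` step that simply sorts the imputed coordinates, and one illustrative similarity measure: an L2-type distance between alpha-cut endpoints. All class and function names here are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TrapezoidalFN:
    """Hypothetical trapezoidal fuzzy number: support [a1, a4], core [a2, a3]."""
    a1: float
    a2: float
    a3: float
    a4: float

    def is_valid(self) -> bool:
        # The core must lie inside the support: a1 <= a2 <= a3 <= a4.
        return self.a1 <= self.a2 <= self.a3 <= self.a4

    def membership(self, x: float) -> float:
        """Membership function value at x (assumes a valid number)."""
        if x < self.a1 or x > self.a4:
            return 0.0
        if self.a2 <= x <= self.a3:
            return 1.0
        if x < self.a2:  # rising left edge
            return (x - self.a1) / (self.a2 - self.a1)
        return (self.a4 - x) / (self.a4 - self.a3)  # falling right edge

    def alpha_cut(self, alpha: float) -> tuple[float, float]:
        """Interval of points with membership >= alpha, for 0 < alpha <= 1."""
        return (self.a1 + alpha * (self.a2 - self.a1),
                self.a4 - alpha * (self.a4 - self.a3))


def repair(values) -> TrapezoidalFN:
    """Hypothetical fix-up: sort independently imputed coordinates so that
    the core/support inequalities hold again."""
    return TrapezoidalFN(*sorted(values))


def alpha_cut_distance(x: TrapezoidalFN, y: TrapezoidalFN, n: int = 1000) -> float:
    """Illustrative L2-type distance: squared differences of the alpha-cut
    endpoints, integrated over alpha in [0, 1] with the midpoint rule."""
    total = 0.0
    for i in range(n):
        alpha = (i + 0.5) / n
        xl, xu = x.alpha_cut(alpha)
        yl, yu = y.alpha_cut(alpha)
        total += (xl - yl) ** 2 + (xu - yu) ** 2
    return math.sqrt(total / n)


actual = TrapezoidalFN(1.0, 2.0, 3.0, 4.0)
imputed_raw = (1.5, 1.2, 3.1, 4.2)  # per-coordinate imputation broke a1 <= a2
print(TrapezoidalFN(*imputed_raw).is_valid())  # False
imputed = repair(imputed_raw)
print(imputed)  # TrapezoidalFN(a1=1.2, a2=1.5, a3=3.1, a4=4.2)
print(imputed.is_valid())  # True
print(alpha_cut_distance(actual, imputed))
```

Sorting is only the simplest way to restore validity; a method designed for fuzzy data would instead impute the parameters so that the inequalities hold by construction. Likewise, the alpha-cut distance is just one possible "similarity" measure, and benchmarking tools may use several complementary ones.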

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

No

Additional Material or Paper

https://github.com/mroman-ibs/FuzzyImputationTest

Keywords: missing data, fuzzy sets, random forests, knn method, numerical comparisons
Virtual Option: This submission is for onsite presentation primarily, but I would also like it to be considered for pre-recorded virtual presentation if I don't get an onsite slot
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirmed

Author

Maciej Romaniuk (Systems Research Institute, PAS)
