6–9 Jul 2026
Europe/Warsaw timezone

On Balancing Numeric and Categorical Variables for Clustering

9 Jul 2026, 11:10
20m
Talks (15-20 minutes)

Speaker

Gero Szepannek (Stralsund University of Applied Sciences)

Description

The package clustMixType [3] is one of the most popular R packages for clustering mixed-type data. Nonetheless, an open issue, not only for mixed-type data but for clustering in general, is how to weight the variables appropriately. Huang’s original paper [2] and the clustMixType package offer only heuristics for this purpose. The presentation discusses how the concept of variable importance can be applied to cluster analysis [1] and how it can in turn be used to find an appropriate weighting of the variables. An R implementation will be demonstrated.
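
To make the weighting problem concrete, here is a minimal Python sketch of a Huang-style k-prototypes dissimilarity, in which a single parameter lambda trades off the numeric part (squared Euclidean distance) against the categorical part (simple matching). This is not the clustMixType code, which is written in R. The function names `mixed_distance` and `heuristic_lambda` are hypothetical, and the variance-based choice of lambda is only one possible heuristic, assumed here for illustration.

```python
from collections import Counter
from statistics import variance


def mixed_distance(x_num, x_cat, p_num, p_cat, lam):
    """Dissimilarity between an observation and a prototype.

    Numeric part: squared Euclidean distance.
    Categorical part: number of mismatching categories, weighted by lam.
    """
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, p_num))
    categorical_part = sum(a != b for a, b in zip(x_cat, p_cat))
    return numeric_part + lam * categorical_part


def heuristic_lambda(num_columns, cat_columns):
    """Illustrative heuristic (an assumption, not the package's method):
    mean sample variance of the numeric columns divided by the mean
    heterogeneity (1 - sum of squared category shares) of the categorical ones.
    """
    mean_var = sum(variance(col) for col in num_columns) / len(num_columns)

    def heterogeneity(col):
        n = len(col)
        return 1.0 - sum((count / n) ** 2 for count in Counter(col).values())

    mean_het = sum(heterogeneity(col) for col in cat_columns) / len(cat_columns)
    return mean_var / mean_het


# Example: one numeric and one categorical variable (made-up toy values).
lam = heuristic_lambda([[1.0, 2.0, 3.0, 4.0]], [["a", "a", "b", "b"]])
d = mixed_distance([1.0], ["a"], [0.0], ["b"], lam)
```

A larger lambda lets the categorical variables dominate the partition and a smaller one favours the numeric variables, which is why a principled choice of this weight matters.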

[1] Hennig, C. and Murphy, K. (2023). Quantifying Variable Importance in Cluster Analysis. Proc. CLADAG 2023, pp. 515–518, ISBN: 9788891935632.
[2] Huang, Z. (1998). Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery 2, 283–304. https://doi.org/10.1023/A:1009769707641
[3] Szepannek, G. (2018). clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal. https://doi.org/10.32614/RJ-2018-048

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

No AI tools/services were used.

Keywords: cluster analysis, mixed-type data
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirmed

Author

Gero Szepannek (Stralsund University of Applied Sciences)
