useR! 2026

Name: useR! 2026
Start: 2026-07-06T08:00:00+02:00
End: 2026-07-09T19:00:00+02:00

6–9 Jul 2026

Europe/Warsaw timezone

On Balancing Numeric and Categorical Variables for Clustering

9 Jul 2026, 11:30

20m

Aula V (SGH Warsaw School of Economics)

Talks (15-20 minutes)

Gero Szepannek (Stralsund University of Applied Sciences)

The package clustMixType [3] is one of the most popular packages for clustering mixed-type data. Nonetheless, an open issue, not only for mixed-type data but for clustering in general, is how to weight the variables appropriately. Huang's original paper [2] and the clustMixType package offer only heuristics for this purpose. This presentation discusses how the concept of variable importance can be applied to cluster analysis [1] and how it can in turn be used to find an appropriate weighting of the variables. An R implementation will be demonstrated.
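
As a rough illustration of why the weighting matters, the following Python sketch computes the k-prototypes dissimilarity described by Huang [2]: squared Euclidean distance on the numeric variables plus a weight lambda times the number of categorical mismatches. It is not the clustMixType implementation and not the method of the talk; the function names, the observation, the two prototypes, and the lambda values are all hypothetical and chosen only to show that the nearest prototype can change with lambda.

```python
def kproto_distance(x_num, x_cat, proto_num, proto_cat, lam):
    """Mixed-type dissimilarity in the style of Huang (1998):
    squared Euclidean part + lam * simple-matching part."""
    numeric_part = sum((a - b) ** 2 for a, b in zip(x_num, proto_num))
    categorical_part = sum(a != b for a, b in zip(x_cat, proto_cat))
    return numeric_part + lam * categorical_part


# Hypothetical observation with two numeric and two categorical variables.
x_num, x_cat = [1.0, 2.0], ["a", "x"]

# Hypothetical prototypes:
# A is close on the numeric variables but differs in both categories,
# B is far on the numeric variables but matches both categories.
proto_a = ([1.5, 2.5], ["b", "y"])
proto_b = ([3.0, 4.0], ["a", "x"])


def nearest(lam):
    """Return the label of the prototype closest to the observation."""
    d_a = kproto_distance(x_num, x_cat, proto_a[0], proto_a[1], lam)
    d_b = kproto_distance(x_num, x_cat, proto_b[0], proto_b[1], lam)
    return "A" if d_a <= d_b else "B"


for lam in (1.0, 5.0):
    print(f"lambda={lam}: nearest prototype is {nearest(lam)}")
```

With a small lambda the numeric variables dominate and the observation is assigned to prototype A; with a large lambda the categorical variables dominate and it moves to prototype B. Choosing lambda therefore decides how much each variable type contributes to the clustering.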

[1] Hennig, C. and Murphy, K. (2023). Quantifying Variable Importance in Cluster Analysis. Proc. CLADAG 2023, pp. 515-518. ISBN: 9788891935632.
[2] Huang, Z. (1998). Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery 2, 283–304. https://doi.org/10.1023/A:1009769707641
[3] Szepannek, G. (2018). clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal. https://doi.org/10.32614/RJ-2018-048

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

No AI tools/services were used.

Keywords: cluster analysis, mixed-type data
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirm

szepannek_upload.zip
