Description
The package clustMixType [3] is one of the most popular R packages for clustering mixed-type data. Nonetheless, an open issue, not only for mixed-type data but for clustering in general, is how to weight the variables appropriately. Both Huang's original paper [2] and the clustMixType package offer only heuristics for this purpose. This presentation discusses how the concept of variable importance can be used in cluster analysis [1] and how it can then be used to find an appropriate weighting of the variables. An R implementation will be demonstrated.
[1] Hennig, C. and Murphy, K. (2023). Quantifying Variable Importance in Cluster Analysis. Proc. CLADAG 2023, pp. 515–518. ISBN: 9788891935632.
[2] Huang, Z. (1998). Extensions to the k-Means Algorithm for Clustering Large Data Sets with Categorical Values. Data Mining and Knowledge Discovery 2, 283–304. https://doi.org/10.1023/A:1009769707641
[3] Szepannek, G. (2018). clustMixType: User-Friendly Clustering of Mixed-Type Data in R. The R Journal. https://doi.org/10.32614/RJ-2018-048
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
No AI tools/services were used.
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | cluster analysis, mixed-type data |
|---|---|
| Virtual Option | This submission is for onsite presentation only |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |