Speaker
Description
We present a synthetic data generator for simulation studies in clustering and partition comparison. The generator creates datasets with controlled cluster structures and predefined similarity levels between alternative partitions, enabling systematic analysis of clustering algorithms' stability.
The framework is based on a Gaussian mixture distribution and generates data through a three-stage procedure that introduces latent cluster structures. First, observations are assigned to mixture components in each of several latent profiles, with controlled similarity between the resulting partitions. Partition similarity is regulated using the Adjusted Rand Index (ARI), allowing precise control over the agreement between cluster structures. Second, the geometric properties of the mixture components are specified in the feature space: component means are sampled, and the separation between components is controlled via the Mahalanobis distance to manage cluster overlap and clustering difficulty. Finally, observations are generated by sampling from the multivariate normal distribution of their assigned component.
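The three stages can be illustrated with a minimal NumPy sketch. It is not the genMPGMM implementation: the function names (`adjusted_rand_index`, `perturb_partition`, `sample_means`, `sample_block`) are hypothetical, and the sketch assumes two latent profiles that each own a separate block of features, a covariance matrix shared by all components, and a simple random relabelling scheme that stops once the ARI falls to the target.

```python
import numpy as np


def adjusted_rand_index(a, b):
    """ARI computed from the contingency table of two label vectors."""
    a, b = np.asarray(a), np.asarray(b)
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)

    def pairs(x):
        # Number of unordered pairs, x choose 2, elementwise.
        return x * (x - 1) / 2.0

    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(a.size)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def perturb_partition(labels, k, target_ari, rng, max_steps=100_000):
    """Stage 1 (assumed scheme): relabel random observations until the
    ARI against the original partition drops to target_ari or below."""
    new = labels.copy()
    for _ in range(max_steps):
        if adjusted_rand_index(labels, new) <= target_ari:
            break
        new[rng.integers(new.size)] = rng.integers(k)
    return new


def mahalanobis(u, v, cov_inv):
    diff = u - v
    return float(np.sqrt(diff @ cov_inv @ diff))


def sample_means(k, cov, min_dist, rng):
    """Stage 2: sample means, then rescale them so that the smallest
    pairwise Mahalanobis distance equals min_dist (shared covariance)."""
    cov_inv = np.linalg.inv(cov)
    raw = rng.standard_normal((k, cov.shape[0]))
    dists = [mahalanobis(raw[i], raw[j], cov_inv)
             for i in range(k) for j in range(i + 1, k)]
    return raw * (min_dist / min(dists))


def sample_block(labels, means, cov, rng):
    """Stage 3: draw each observation from N(mean of its component, cov)."""
    chol = np.linalg.cholesky(cov)
    noise = rng.standard_normal((labels.size, cov.shape[0])) @ chol.T
    return means[labels] + noise


rng = np.random.default_rng(0)
n, k, d = 300, 3, 2          # observations, components, features per profile
cov = np.eye(d)

base = rng.integers(k, size=n)                          # profile 1 partition
second = perturb_partition(base, k, target_ari=0.6, rng=rng)  # profile 2

# One feature block per latent profile, concatenated column-wise.
X = np.hstack([
    sample_block(z, sample_means(k, cov, 4.0, rng), cov, rng)
    for z in (base, second)
])
print(X.shape, round(adjusted_rand_index(base, second), 3))
```

Because rescaling the means by a constant rescales every pairwise Mahalanobis distance by the same constant, the minimum separation is met exactly in this sketch, while the ARI target is only met approximately, to within the effect of a single relabelled observation.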
The generator offers flexible control over key characteristics: the numbers of observations, features, mixture components, and latent profiles, as well as the component mixing proportions and noise levels. This enables controlled simulation experiments for evaluating clustering and biclustering methods. By providing precise control over data characteristics and partition structures, it supports reproducible benchmarking under well-defined conditions.
Additional Material or Paper
https://github.com/karowid617/genMPGMM
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
No AI tools/services were used.
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | data partition, clustering, synthetic data, data generator, GMM |
|---|---|
| Virtual Option | This submission is for onsite presentation only |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |