Speaker
Description
Gaussian mixture modeling (GMM) is an unsupervised technique used in many fields of data analysis, such as bioinformatics, pattern recognition, and network traffic analysis. Yet existing R implementations often lack support for binned data (commonly encountered in image analysis) and suffer from unstable initialization or heavy memory usage. To address these limitations, the R package dpGMM was developed. It is designed for robust decomposition of 1D and 2D data and supports both continuous and binned encodings. The core implementation uses the recursive Expectation-Maximization (EM) algorithm, paired with a deterministic dynamic programming (DP) initialization strategy based on the Bellman recurrence. For 1D data, the DP algorithm partitions the histogram to estimate initial weights, means, and variances. For 2D data, the implementation uses marginal (boundary) distributions to generate aggregated components, which are then iteratively pruned by maximizing the log-likelihood.
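The 1D initialization idea can be illustrated with a minimal sketch, written in Python for compactness; it is not the dpGMM R API. It assumes a simple block cost, the within-block weighted sum of squared deviations, which may differ from the quality index used in the package. The names `block_stats` and `dp_init` are hypothetical. The Bellman recurrence finds the best split of the histogram into `k` contiguous blocks, and each block then yields an initial weight, mean, and variance.

```python
def block_stats(x, y, i, j):
    """Total count, weighted mean, and weighted variance of bins i..j-1."""
    n = sum(y[i:j])
    if n == 0:  # empty block: no mass, so no spread
        return 0.0, x[i], 0.0
    mean = sum(xi * yi for xi, yi in zip(x[i:j], y[i:j])) / n
    var = sum(yi * (xi - mean) ** 2 for xi, yi in zip(x[i:j], y[i:j])) / n
    return n, mean, var


def dp_init(x, y, k):
    """Split histogram (bin centers x, counts y) into k contiguous blocks.

    Assumed cost (illustrative only): within-block weighted sum of squares.
    Returns one (weight, mean, variance) tuple per block.
    """
    m = len(x)
    inf = float("inf")

    def cost(i, j):
        n, _, var = block_stats(x, y, i, j)
        return n * var

    # best[b][j]: minimal cost of covering bins 0..j-1 with b blocks
    best = [[inf] * (m + 1) for _ in range(k + 1)]
    back = [[0] * (m + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for b in range(1, k + 1):
        for j in range(b, m + 1):
            for i in range(b - 1, j):
                c = best[b - 1][i] + cost(i, j)  # Bellman recurrence
                if c < best[b][j]:
                    best[b][j] = c
                    back[b][j] = i

    # Backtrack the block boundaries from the last bin to the first
    cuts = [m]
    j = m
    for b in range(k, 0, -1):
        j = back[b][j]
        cuts.append(j)
    cuts.reverse()

    total = sum(y)
    params = []
    for i, j in zip(cuts, cuts[1:]):
        n, mean, var = block_stats(x, y, i, j)
        params.append((n / total, mean, var))
    return params
```

A block made of a single bin gets zero variance under this cost, so a practical initializer would need a lower bound on the variance before handing the parameters to EM.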
To ensure algorithmic stability during EM iterations, dpGMM implements a "pin component" avoidance mechanism: a user-controlled minimum-variance parameter prevents the divergence typically caused by isolated or repeated measurements. Architecturally, the package operates as a flexible wrapper that supports both fixed-component fitting and automated model selection via information criteria (e.g., AIC, BIC) paired with an early-stopping Likelihood Ratio Test. The post-processing module includes a hybrid threshold estimation approach that prioritizes Bayes-optimal decision boundaries obtained from quadratic equations and falls back to the Maximum a Posteriori rule when the discriminant is negative. Broad benchmarking confirms that these implementation advances provide better parameter estimation and computational efficiency than established packages such as mixtools, ClusterR, and mclust. dpGMM is available on CRAN and on GitHub at https://github.com/ZAEDPolSl/dpGMM
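The quadratic-equation threshold can be made concrete with a short sketch, again in Python and not taken from the package. Equating two weighted Gaussian densities, w1·N(x; m1, s1²) = w2·N(x; m2, s2²), and taking logarithms gives a·x² + b·x + c = 0, with the coefficients shown in the code. The function name `bayes_threshold` and the choice of the root closest to the midpoint of the means are assumptions made for illustration. The Maximum a Posteriori fallback is not reproduced here; the sketch only signals when it would be needed by returning `None`.

```python
import math


def bayes_threshold(w1, m1, s1, w2, m2, s2):
    """Crossing point of two weighted Gaussian densities (s1, s2 are std devs).

    Returns None when the discriminant is negative, i.e. the densities never
    cross and another rule (e.g. MAP assignment) has to be used instead.
    """
    # Coefficients of log(w1*N1(x)) - log(w2*N2(x)) = a*x^2 + b*x + c
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = m1 / s1**2 - m2 / s2**2
    c = m2**2 / (2 * s2**2) - m1**2 / (2 * s1**2) + math.log((w1 * s2) / (w2 * s1))

    if abs(a) < 1e-12:  # equal variances: the equation is linear
        return None if abs(b) < 1e-12 else -c / b

    disc = b * b - 4 * a * c
    if disc < 0:  # no real crossing point
        return None

    root = math.sqrt(disc)
    candidates = [(-b + root) / (2 * a), (-b - root) / (2 * a)]
    # Assumption: keep the crossing closest to the midpoint of the two means
    mid = 0.5 * (m1 + m2)
    return min(candidates, key=lambda x: abs(x - mid))
```

For two components with equal weights and equal variances, the threshold reduces to the midpoint of the means, which is a quick sanity check on the coefficients.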
Acknowledgments: This work has been supported by the Silesian University of Technology grant for maintaining and developing research potential for 2026.
Additional Material or Paper
https://www.sciencedirect.com/science/article/pii/S1877750326000293
https://github.com/ZAEDPolSl/dpGMM
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
No AI tools/services were used.
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | unsupervised learning, gaussian mixture modeling, package development, wrapper solution |
|---|---|
| Virtual Option | This submission is for onsite presentation only |
| Material License | Yes |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |