Speaker
Description
Topological data analysis (TDA) is an emerging area of statistical research that is grounded in topology and intersects with exploratory analysis, statistical inference, and machine learning. Because R is a primary environment for statistical computing, it is important for R users to have access to comprehensive and reliable TDA tools.
Published R packages for TDA fall into three categories. First, {TDA} and {rgudhi} interface with comprehensive libraries written in lower-level languages (GUDHI, Dionysus, PHAT). While they provide essential tools, they are not designed to integrate with common R workflows, and they are made fragile by the need to keep pace with updates to both the upstream libraries and R. Second, {TDAstats}, {TDAkit}, {TDApplied}, and {GSSTDA} provide bespoke toolkits for inference, machine learning, and survival analysis. These tend to be designed for specialists, are self-contained rather than modular, and are syntactically at odds with common workflows, which likely hinders adoption by non-specialists. Third, packages such as {simplextree}, {interplex}, {ripserr}, and {tdaunif} each perform a narrow task. These have been developed by Dr. Brunson and colleagues with the goal of building a general-purpose, native, modular, and extensible collection of R packages for TDA. However, they are not yet as comprehensive or interoperable as envisioned.
We present developments made over the past year with support from an ISC grant from the R Consortium. Our goal was to integrate popular TDA techniques into common statistical workflows in R, with a special focus on the tidymodels framework (Kuhn and Wickham 2020) for machine learning (ML) tasks.
ML relies heavily on vectorizations, that is, fixed-length numeric representations of data, and the last decade of TDA research has produced several for persistence diagrams. We developed {tdarec} to bring these topological features into tidymodels workflows. We also published {phutil}, which, in the spirit of {tibble}, provides a data structure for persistence data together with functions to compute distances between persistence diagrams.
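The following package-independent sketch illustrates what "vectorizing" a persistence diagram means. It is written in Python and does not reflect the {tdarec} API. It uses one well-known vectorization, the persistence landscape. Each diagram point (b, d) contributes a tent function max(0, min(t - b, d - t)), and the k-th landscape at t is the k-th largest tent value. Sampling the first few landscapes on a fixed grid yields a vector of the same length for every diagram. The function names and the choice of grid are hypothetical.

```python
def tent(b, d, t):
    """Tent function of a persistence pair (b, d), evaluated at t."""
    return max(0.0, min(t - b, d - t))


def landscape(diagram, k, grid):
    """k-th persistence landscape (1-indexed) of a diagram, sampled on a grid."""
    values = []
    for t in grid:
        # Tent heights of all points at t, largest first.
        heights = sorted((tent(b, d, t) for b, d in diagram), reverse=True)
        # If fewer than k points exist, the k-th landscape is zero.
        values.append(heights[k - 1] if k <= len(heights) else 0.0)
    return values


def landscape_vector(diagram, k_max, grid):
    """Concatenate landscapes 1..k_max into one fixed-length feature vector."""
    vec = []
    for k in range(1, k_max + 1):
        vec.extend(landscape(diagram, k, grid))
    return vec


# Hypothetical diagram with two features, sampled at t = 0, 1, 2, 3, 4.
print(landscape_vector([(0.0, 4.0), (1.0, 3.0)], 2, [0, 1, 2, 3, 4]))
```

Because the output length depends only on `k_max` and the grid, diagrams with different numbers of points map to vectors of the same length. This is what makes such features usable as predictors in standard ML models.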
TDA also relies on statistical inference. A variety of hypothesis tests for persistence data have been proposed in the literature, each with an idiosyncratic implementation. We published a first version of {inphr} for performing such tests, aiming for future compatibility with the tidymodels {infer} package. Finally, we published an upgraded version of {ripserr}, which makes the Ripser library, an efficient tool for computing the persistent homology of Vietoris–Rips filtrations, available in R.
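Ripser handles all homological dimensions and relies on many optimizations. The minimal Python sketch below is neither Ripser nor the {ripserr} interface. It shows only what a Vietoris–Rips persistence computation returns in the simplest case, dimension 0. It assumes that an edge enters the filtration at the Euclidean distance between its endpoints. Under that convention, every point is born at 0, and a connected component dies when an edge first merges it into another one. This amounts to Kruskal's minimum-spanning-tree algorithm with a union-find structure. The function name `rips_h0` is hypothetical.

```python
import math
from itertools import combinations


def rips_h0(points):
    """0-dimensional persistence pairs (birth, death) of the Vietoris-Rips
    filtration of a finite point cloud, with edges entering at their length."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Sorting edges by length gives the order in which they enter the filtration.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge: one of them (both born at 0) dies here.
            parent[ri] = rj
            pairs.append((0.0, length))
    # One component never dies; report it with an infinite death time.
    if n > 0:
        pairs.append((0.0, math.inf))
    return pairs


# Three points on a line: two finite features (deaths 1 and 4) and one infinite.
print(rips_h0([(0, 0), (1, 0), (5, 0)]))
```

Higher-dimensional features (loops, voids) require the full boundary-matrix reduction that Ripser implements efficiently. That cost is why wrapping a dedicated library is preferable to a native reimplementation.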
If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.
No AI tools/services were used.
Additional Material or Paper
tdaverse GitHub organization: https://github.com/tdaverse. CRAN pages: https://cran.r-project.org/web/packages/phutil/index.html, https://cran.r-project.org/web/packages/tdarec/index.html, https://cran.r-project.org/web/packages/inphr/index.html, https://cran.r-project.org/web/packages/ripserr/index.html
| Field | Response |
|---|---|
| Keywords: Please list up to 5 keywords to help us find the right session for your contribution. | topological data analysis, machine learning, inference, tidymodels |
| Virtual Option | This submission is for onsite presentation only |
| Video Recording | Video sharing is fine |
| The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. | Confirm |
| Interested in serving as reviewer? | aymeric.stamm@cnrs.fr |