-
Max Marchi (Cleveland Guardians) | 07/07/2026, 10:30 | Case studies and applications | Talks (15-20 minutes)
LeBron choosing between taking a mid-range shot and feeding a teammate for an open three-pointer; Verstappen heading towards the pit lane or driving another few laps with the current tires; Sinner going all-in on a second serve against Alcaraz. Making split-second decisions when the stakes are the highest requires the talent and hard work that only great athletes have. Analysis of sports data...
-
Antonio Grosso | 07/07/2026, 10:30 | Talks (15-20 minutes)
In today's digital landscape, the demand for making statistics more accessible and meaningful to the general public is increasing. This has led to the need for fast and efficient ways to disseminate statistical information, particularly when data visualisations must be produced quickly and updated frequently. In this context, Eurostat launched a project under the ESS Innovation Agenda and...
-
Edoardo Mancini (Roche) | 07/07/2026, 10:30 | Talks (15-20 minutes)
You've built a great R package. People are using it. Feature completeness is in sight. Congratulations: you've defied the odds. Now the hard part begins!
Transitioning an open-source R package from active development to long-term maintenance and stability is a complex shift. Throughout this talk we'll explore methods to tackle this often overlooked but key challenge of the open-source...
-
Kristian Vepsäläinen (freelance data scientist) | 07/07/2026, 10:30 | Talks (15-20 minutes)
At the 1980 Winter Olympics, the men's 15 km cross-country skiing race was decided by one hundredth of a second. Rather than debating historical causes, we treat this as a modeling problem: could a physically plausible effect, such as a small change in aerodynamic drag, have been large enough to matter?
In this talk, we demonstrate how Bayesian forward simulation in R can be used to reason...
-
Lluís Revilla Sancho | 07/07/2026, 10:50 | Talks (15-20 minutes)
Publishing a package on CRAN is often half the work of a maintainer; then comes the hardest part: maintaining it there. Many resources focus on getting a package published on CRAN and on what it takes a maintainer to do so. They share common problems and how to solve them, but they are neither based on data nor focused on keeping the package on CRAN.
Recently, CRAN has opened up some...
-
Maximilian Mücke (LMU Munich) | 07/07/2026, 10:50 | Talks (15-20 minutes)
mlr3forecast extends the mlr3 ecosystem to support time series forecasting workflows. It introduces a dedicated forecasting task class and resampling strategies that respect temporal ordering, enabling forecasting models to be benchmarked, tuned, and combined in a systematic way. Learners wrapping established forecasting methods such as ARIMA and ETS can be used alongside any mlr3 learner...
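The central idea of resampling that respects temporal ordering is that every training window ends before its test window begins. The sketch below shows expanding-window ("rolling origin") splits in Python as a language-neutral illustration; it is not the mlr3forecast API, and the function and parameter names are hypothetical.

```python
def expanding_window_splits(n_obs, initial, horizon, step=1):
    """Yield (train_idx, test_idx) index lists for rolling-origin evaluation.

    The training window always starts at 0 and grows by `step`; the test window
    is the next `horizon` observations, so no future data leaks into training.
    """
    if initial < 1 or horizon < 1 or step < 1:
        raise ValueError("initial, horizon and step must be positive")
    origin = initial
    while origin + horizon <= n_obs:
        yield list(range(origin)), list(range(origin, origin + horizon))
        origin += step


for train, test in expanding_window_splits(n_obs=8, initial=4, horizon=2, step=2):
    print(train, "->", test)
```

A fixed-size sliding window is the usual alternative: the oldest observations are dropped so that the training set keeps a constant length.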
-
Dr Nicolas Flores Castillo (BHP, Organisational Development and Analytics) | 07/07/2026, 10:50 | Talks (15-20 minutes)
Traditional workforce analytics relies on aggregate metrics that obscure individual-level dynamics and wash out critical correlations between employee characteristics, career trajectories, and compensation outcomes. This talk presents a Workforce Digital Twin methodology using individual-level stochastic simulation to model strategic gender pay equity interventions in a large global...
-
Wolfgang Viechtbauer (Maastricht University) | 07/07/2026, 10:50 | Talks (15-20 minutes)
Interactive graphical applications in R are typically built using external web-based frameworks such as Shiny or by interfacing with other programming languages such as Tcl/Tk or JavaScript. However, these approaches may require server infrastructure or knowledge of additional programming languages.
As an alternative, the base R function grDevices::getGraphicsEvent() enables interactive...
-
Tomasz Woźniak (University of Melbourne) | 07/07/2026, 11:10 | Talks (15-20 minutes)
The R package bsvars provides a wide range of tools for empirical macroeconomic and financial analyses using Bayesian Structural Vector Autoregressions. It uses frontier econometric techniques and C++ code to ensure fast and efficient estimation of these multivariate dynamic structural models, possibly with many variables, complex identification strategies, and non-linear...
-
Cynthia Huang (LMU Munich) | 07/07/2026, 11:10 | Talks (15-20 minutes)
Wrapping 'ggplot2' code into plot helper functions is a common way to make multiple versions of a custom plot without copying and pasting the same code over and over again. Helper functions can replace long and complex 'ggplot2' code chunks with just a single function call. However, if that single function is not designed carefully, the initial convenience can often turn into frustration...
-
Dr Ahmadou Dicko | 07/07/2026, 11:10 | Talks (15-20 minutes)
In methodology reports and academic textbooks, sampling designs are described in a structured way, with an explicit logic connecting strata, clusters, sampling stages, and selection probabilities. This structure is often lost in the R code that implements it, where the design gets scattered across intermediate objects, successive calls, and technical details. The more complex the plan, the...
-
Cristina Faricelli (ISTAT) | 07/07/2026, 11:10 | Statistical models and methods | Talks (15-20 minutes)
UnitMix is an R package designed to detect and correct unit of measurement errors using Gaussian mixture model-based clustering, supporting both methodological research and production workflows at National Statistical Institutes (NSIs).
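The general idea can be illustrated in one dimension: on a log10 scale, a value recorded in the wrong unit (for example grams instead of kilograms) is shifted by a constant (3 for a factor of 1000), so a two-component Gaussian mixture separates the two groups and the gap between the component means suggests the correction factor. The Python sketch below fits such a mixture by EM under that assumption; it is a simplified, univariate illustration, not UnitMix or its assign.cluster function, and all names and data are hypothetical.

```python
import math
import statistics


def normal_pdf(x, mu, sd):
    """Density of N(mu, sd^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_two_component_gmm(x, iterations=100):
    """Fit a 1-D two-component Gaussian mixture by EM; return (means, labels)."""
    ordered = sorted(x)
    half = len(ordered) // 2
    # Initialise the components from the lower and upper halves of the data.
    mu = [statistics.fmean(ordered[:half]), statistics.fmean(ordered[half:])]
    sd = [statistics.pstdev(ordered) or 1.0] * 2
    w = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each observation.
        resp = []
        for xi in x:
            p = [w[k] * normal_pdf(xi, mu[k], sd[k]) for k in (0, 1)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp) or 1e-300
            w[k] = nk / len(x)
            mu[k] = sum(r[k] * xi for r, xi in zip(resp, x)) / nk
            var = sum(r[k] * (xi - mu[k]) ** 2 for r, xi in zip(resp, x)) / nk
            sd[k] = max(math.sqrt(var), 1e-3)  # floor avoids a collapsed component
    labels = [0 if r[0] >= r[1] else 1 for r in resp]
    return mu, labels


def correct_unit_errors(values):
    """Rescale the higher log10 cluster by the nearest power of ten.

    Hypothetical assumption: a single unit error inflates some values by 10^k.
    """
    logs = [math.log10(v) for v in values]
    mu, labels = fit_two_component_gmm(logs)
    high = 0 if mu[0] > mu[1] else 1
    factor = 10 ** round(abs(mu[0] - mu[1]))
    corrected = [v / factor if lab == high else v for v, lab in zip(values, labels)]
    return corrected, factor


# Hypothetical data: three values were recorded in grams instead of kilograms.
VALUES = [48, 52, 55, 47, 50, 53, 49000, 51000, 46, 54000]
print(correct_unit_errors(VALUES))
```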
The core function, assign.cluster, implements a multivariate Gaussian mixture model on log-transformed variables, allowing clusters to be defined...
-
Oscar Thees (FHWN / SwissAnon) | 07/07/2026, 11:30 | Talks (15-20 minutes)
The growing movement toward open data, open science, and open government has increased the demand for sharing detailed microdata while protecting individual privacy. Data anonymization methods, including classical statistical disclosure control techniques and synthetic data generation, enable data sharing, but evaluating the resulting privacy risks and analytical usefulness remains...
-
Simon Urbanek | 07/07/2026, 11:30 | Talks (15-20 minutes)
The R ecosystem provides very good infrastructure for making R accessible through web-based documents and dashboards, and for performing remote analyses. This opens up possibilities for using interactive graphics, which are important both for exploratory data analysis and for presentation. However, most such solutions consist of simply binding existing JavaScript libraries, which are often designed for...
-
Szymon Maksymiuk (WUT) | 07/07/2026, 11:30 | Talks (15-20 minutes)
With over 23,000 packages, CRAN forms a large ecosystem of reviewed code. Additionally, its strict rules and dependency structure make it well suited to analyzing how bugs propagate through an ecosystem.
This talk presents a hypothetical scenario in which I introduce a bug into base R. I analyze the consequences for the entire ecosystem and whether packages themselves would catch those bugs with their...
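One way to reason about such a scenario is to walk the reverse-dependency graph: every package that depends on the buggy one, directly or transitively, is potentially exposed, whether or not its own tests would notice. A minimal Python sketch over a made-up dependency map (the package names are hypothetical, not real CRAN data):

```python
from collections import deque


def reverse_deps(depends):
    """Invert a {package: [dependencies]} map into {dependency: {dependents}}."""
    rev = {}
    for pkg, deps in depends.items():
        for dep in deps:
            rev.setdefault(dep, set()).add(pkg)
    return rev


def affected_by(root, depends):
    """Return every package that depends on `root` directly or transitively (BFS)."""
    rev = reverse_deps(depends)
    seen, queue = set(), deque([root])
    while queue:
        current = queue.popleft()
        for dependent in rev.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


# Hypothetical ecosystem with "base" at the root of the dependency graph.
DEPENDS = {
    "pkgA": ["base"],
    "pkgB": ["pkgA"],
    "pkgC": ["pkgB", "base"],
    "pkgD": ["pkgE"],
    "pkgE": [],
}
print(sorted(affected_by("base", DEPENDS)))  # ['pkgA', 'pkgB', 'pkgC']
```

This only measures exposure; whether each exposed package's test suite would actually catch the bug is a separate question.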
-
Marco Colombo (University of Heidelberg) | 07/07/2026, 13:00 | Talks (15-20 minutes)
Fuzz testing is a technique that generates random, unexpected or malformed data and feeds it into a program. By observing how the software behaves under such conditions, this approach may uncover potential weaknesses, vulnerabilities and security flaws in applications. A dynamically typed language such as R poses some difficulties to fuzz testing, as by definition no predefined typing...
-
Dr Pierre Donat-Bouillud (Czech Technical University in Prague) | 07/07/2026, 13:20 | Talks (15-20 minutes)
Code coverage is the standard metric for testing and is widely used in R packages, for instance with testthat or tinytest. However, coverage can often be misleading: 100% coverage does not mean the absence of bugs, and a line covered by a test does not mean that all behaviours flowing through this line are effectively verified. This presentation introduces MutatoR, a new package designed to bring...
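Mutation testing makes the coverage gap concrete: small deliberate changes ("mutants") are injected into the code, and a mutant counts as killed if the test suite fails. The Python sketch below is a minimal illustration of the technique, not MutatoR's implementation; it swaps one arithmetic operator at a time and shows that a test executing every line can still kill no mutants.

```python
import ast

# Operator swaps used to create mutants (a small, illustrative subset).
SWAPS = {ast.Add: ast.Sub, ast.Sub: ast.Add, ast.Mult: ast.Div, ast.Div: ast.Mult}


class SwapOneOperator(ast.NodeTransformer):
    """Swap the arithmetic operator of the `target`-th eligible BinOp node."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if type(node.op) in SWAPS:
            if self.seen == self.target:
                node.op = SWAPS[type(node.op)]()
            self.seen += 1
        return node


def mutation_score(source, func_name, tests):
    """Return (killed, total): how many single-operator mutants the tests detect."""
    total = sum(
        isinstance(n, ast.BinOp) and type(n.op) in SWAPS
        for n in ast.walk(ast.parse(source))
    )
    killed = 0
    for i in range(total):
        tree = ast.fix_missing_locations(SwapOneOperator(i).visit(ast.parse(source)))
        namespace = {}
        exec(compile(tree, "<mutant>", "exec"), namespace)
        try:
            tests(namespace[func_name])
        except Exception:  # any failure or error counts as killing the mutant
            killed += 1
    return killed, total


SOURCE = """
def average(a, b):
    return (a + b) / 2
"""


def covering_but_weak(f):
    f(2, 4)  # runs every line (100% line coverage) but asserts nothing


def checking(f):
    assert f(2, 4) == 3


print(mutation_score(SOURCE, "average", covering_but_weak))  # (0, 2)
print(mutation_score(SOURCE, "average", checking))           # (2, 2)
```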
-
Szymon Maksymiuk (Roche), Lorenzo Braschi (Roche) | 07/07/2026, 13:40 | Talks (15-20 minutes)
Software validation is a key part of analyses conducted in a regulatory environment, and it is crucial to document the accuracy, reproducibility, and traceability of the software used for clinical analyses. At Roche, we initially outsourced this effort to an external partner. However, this solution was far from ideal due to, among other things, bottlenecks, a fixed release schedule, and...
-
Magnus Mengelbier (Independent / Freelancer) | 07/07/2026, 14:00 | Talks (15-20 minutes)
There are many validation approaches for R and its packages that have their foundation in classic software and application validation, specifically as it relates to statistical analysis applications within regulated industries like Life Science. The introduction of risk-based validation approaches in the last decade has provided additional tools, but as we now approach R as a language, packages as...
-
Tim Appelhans (Friedrich-Alexander Universität Erlangen-Nürnberg, Institut für Geographie, Wetterkreuz 15, 91058 Erlangen, Germany) | 08/07/2026, 10:30 | Talks (15-20 minutes)
When it comes to web-mapping in R, RStudio's (Posit's) 'leaflet' package has established itself as the de-facto standard. However, in addition to the lack of support for 3D data, 'leaflet' struggles with the rendering of large data (i.e. more than 1 million points/vertices) and data transfer from R memory to the browser can be slow. To overcome these bottlenecks, two packages, 'geoarrowWidget'...
-
Aleksi Lahtinen (University of Turku) | 08/07/2026, 10:30 | Analysis best practices and workflows | Talks (15-20 minutes)
In social science research, datasets are often confidential, requiring analyses to be conducted in secure remote access environments. One such environment is FIONA, which provides researchers with access to sensitive, unit-level Finnish register data alongside standard statistical software, including R. We use FIONA to analyse extensive register data on Finnish teenagers and young adults, with...
-
Kurt Hornik | 08/07/2026, 10:30 | Talks (15-20 minutes)
Starting with R 4.6.0, the stats package provides infrastructure for distribution-free model-based inference in possibly stratified K-sample oneway layouts via the novel free1way model function. Treatment effects to be estimated using free1way include odds- and hazard ratios, Lehmann parameters, and a generalised version of Cohen's d for at least ordered and possibly right-censored...
-
Dr Lluís Revilla Sancho (Roche, Spain) | 08/07/2026, 10:30 | Talks (15-20 minutes)
In data-intensive industries like pharmaceuticals, the ability to move seamlessly from population-level summaries to individual data points is critical for valid insight generation. However, enabling faster, more efficient exploratory analysis and regulatory delivery of clinical trials insights is challenging and difficult to scale. This talk introduces {teal}, recently released on CRAN with...
-
Natalia da Silva (Universidad de la República) | 08/07/2026, 10:50 | Talks (15-20 minutes)
This talk presents enhancements to the projection pursuit tree classifier and visual diagnostic methods for assessing their impact in high dimensions. The original algorithm uses linear combinations of variables in a tree structure where depth is constrained to be less than the number of classes, a limitation that proves too rigid for complex classification problems. Our extensions improve...
-
Tom Elliott (iNZight Analytics Ltd) | 08/07/2026, 10:50 | Talks (15-20 minutes)
Web front-ends offer seamless deployment across desktop and mobile platforms, with an ever growing variety of libraries enabling rich, interactive user experiences. However, integrating these modern web technologies with R's powerful analytic capabilities presents significant challenges.
Rserve is an R library that enables two-way communication between R and other programming...
-
Gary Sutton | 08/07/2026, 10:50 | Talks (15-20 minutes)
Many data scientists and other R users have no doubt been assigned to projects that quickly derailed (missed deadlines, scope creep, budget overruns), often because effective project management was assumed to be a "soft skill" anyone without deep technical expertise could handle. In reality, successful project delivery demands rigorous quantitative techniques: structured decomposition,...
-
Renata Hirota | 08/07/2026, 10:50 | Talks (15-20 minutes)
Climate data has coordinates, but also powerful stories hiding in plain sight. In this talk, I'll show why I fell in love with the R package {sf} while covering climate change in the Amazon as a freelance journalist. Every Last Drop is a project about oil exploration in the Amazon that showcases that spatial analysis doesn't...
-
Edzer Pebesma (University of Münster) | 08/07/2026, 11:10 | Talks (15-20 minutes)
Areal units (polygons or pixels) are often used to summarize and distribute spatial data. In many spatial data science studies, datasets with different areal units need to be combined, and to do so they first have to be transformed into a common set of areal units. This involves upscaling (going to a coarser resolution) or downscaling (going to a finer spatial resolution), or a combination. This...
-
Alyssa Columbus | 08/07/2026, 11:10 | Talks (15-20 minutes)
Empirical data analysis often involves a large number of defensible analytic choices, including model specification, covariate selection, transformations, and approaches to missing data. These decisions can meaningfully influence statistical results, yet they are rarely explored or reported systematically. This talk presents a reproducible R-based workflow for examining analytic multiplicity...
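The enumeration step of such a multiverse analysis can be sketched as a Cartesian product over the analytic choices, recomputing the same estimate under every combination. The choices and data below are hypothetical placeholders, and the example is a Python illustration rather than the R workflow presented in the talk:

```python
import itertools
import math
import statistics

# Hypothetical analytic choices; each option is defensible on its own.
TRANSFORMS = {"none": lambda v: v, "log": math.log}
OUTLIER_CUTOFFS = {"keep_all": None, "drop_above_100": 100}
SUMMARIES = {"mean": statistics.fmean, "median": statistics.median}


def multiverse(data):
    """Recompute the same summary under every combination of analytic choices."""
    results = []
    for t_name, o_name, s_name in itertools.product(TRANSFORMS, OUTLIER_CUTOFFS, SUMMARIES):
        cutoff = OUTLIER_CUTOFFS[o_name]
        kept = [v for v in data if cutoff is None or v <= cutoff]
        transformed = [TRANSFORMS[t_name](v) for v in kept]
        estimate = SUMMARIES[s_name](transformed)
        results.append(((t_name, o_name, s_name), estimate))
    return results


DATA = [3, 5, 8, 13, 250]  # made-up observations with one extreme value
for spec, estimate in multiverse(DATA):
    print(spec, round(estimate, 2))
```

With these made-up data the eight specifications give estimates ranging from under 2 (log scale, outlier dropped) to 55.8 (raw mean with the outlier kept), which is the kind of spread a systematic multiverse report is meant to surface.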
-
Deepansh Khurana (Dimwit Labs) | 08/07/2026, 11:10 | Talks (15-20 minutes)
In 2024, I gave a talk, "How I Built An API for My Life (and How You Can Too)", about a personal life tracker called Hrafnagud, built using a slew of services but primarily with R, namely {plumber} and {shiny}. This was an API with a number of different endpoints for finance, travel, and more. The key focus was building the first iteration of such a system, and as the system...
-
Eileen Vattheuer (Statistics Austria) | 08/07/2026, 11:10 | Talks (15-20 minutes)
Handling missing values in heterogeneous datasets remains a common challenge in applied data analysis. We introduce vimpute(), a new function in the R package VIM that provides a unified and model-adaptive framework for multivariate imputation. The function implements an iterative, variable-wise imputation procedure that adapts the modelling strategy to the type and characteristics of each...
-
Dr Filiz Karadag (Ege University), Dr Olgun Aydin (Gdansk University of Technology) | 08/07/2026, 11:30 | Talks (15-20 minutes)
This presentation concerns the implementation of regularized regression and advanced parameter estimation procedures. The study demonstrates the practical application of these methods using three R packages developed by the authors: S-type.est, Styperidge.reg, and ridgregextra.
Within this ecosystem, the S-type.est package provides the core estimation procedures for S-type...
-
Charlie Gao (Posit Software, PBC) | 08/07/2026, 11:30 | Talks (15-20 minutes)
Collaborative data analysis presents a fundamental concurrency problem: when multiple users modify the same data simultaneously, how should conflicts be resolved? Traditional approaches rely on locking or central arbitration, but conflict-free replicated data types (CRDTs) offer a principled alternative. A CRDT is a data structure whose concurrent operations are guaranteed to converge to the...
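One of the simplest CRDTs is the grow-only counter (G-Counter): each replica increments only its own slot, and merging takes the element-wise maximum. Because that merge is commutative, associative and idempotent, replicas converge whatever order they sync in. A minimal Python sketch, illustrating the general idea rather than the design presented in the talk:

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter: one non-decreasing slot per replica."""

    replica_id: str
    counts: dict = field(default_factory=dict)

    def increment(self, n: int = 1) -> None:
        # A replica only ever updates its own slot.
        if n < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def merge(self, other: "GCounter") -> None:
        # Element-wise maximum: commutative, associative and idempotent.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)

    def value(self) -> int:
        return sum(self.counts.values())


# Two users edit concurrently, then sync in either order.
alice, bob = GCounter("alice"), GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
print(alice.value(), bob.value())  # 5 5
```

A G-Counter cannot express decrements; richer CRDTs such as PN-Counters, registers and sequences build on the same merge-by-maximum idea.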
-
Roger Bivand (Norwegian School of Economics) | 08/07/2026, 11:30 | Talks (15-20 minutes)
When we represent or analyse spatial data, the position of observations matters. Where objects of interest are (also in relation to each other) and how position is measured are referenced through coordinate reference systems (CRS), including units of measurement. Local CRS use arbitrary starting points, while standard CRS rely on tabulated values, for example the axis lengths of an ellipsoid...
-
Deepayan Sarkar (Indian Statistical Institute, Delhi Centre) | 08/07/2026, 11:30 | Talks (15-20 minutes)
The National Health and Nutrition Examination Survey (NHANES) provides extensive public data on demographics, health, and nutrition, collected in two-year cycles since 1999. Although invaluable for epidemiological and health-related research, the complexity of NHANES data makes accessing, managing, and analyzing these datasets challenging. We present a reproducible computational environment...
-
Florian Sihler (Ulm University) | 08/07/2026, 13:00 | Talks (15-20 minutes)
With over 23,000 packages listed on CRAN alone, the R ecosystem provides a plethora of general-purpose and domain-specific packages. While there is a strict quality standard for packages to be accepted on CRAN, little information is available about the evolution and current state of the ecosystem.
With the crawlR project, we statically analyzed all versions of all packages available on CRAN...
-
Benjamin Schwendinger (Fraunhofer Austria Research GmbH) | 08/07/2026, 13:00 | Talks (15-20 minutes)
R's copy-on-modify semantics make repeated growth and row-wise operations appear inherently expensive. Appending rows or incrementally building tabular structures typically triggers reallocation and copying. Recent versions of R introduce an experimental resizable vector API, allowing vectors to be allocated with a maximum capacity and resized up to that capacity without...
-
Gergely Daroczi | 08/07/2026, 13:20 | Talks (15-20 minutes)
Selecting the right cloud instance type for model training or other compute-intensive R workloads is often guesswork:
- Unclear resource requirements, e.g. "How much RAM do I need to train this hierarchical model?" or "Can my script scale to multiple CPU cores or even GPUs?"
- Pricing and hardware specs exist, but are fragmented across vendors and hard to compare, especially when real...
-
Mark Padgham (rOpenSci) | 08/07/2026, 13:20 | Talks (15-20 minutes)
CRAN represents and curates a complex software ecosystem of over 25,000 packages. This ecosystem constantly evolves as packages are submitted, updated, and archived. We analysed the development of CRAN over its entire lifetime, both in terms of package inter-dependencies and the internal structures of every version of every package. These analyses used the...
-
Antoine Fabri (cynkra) | 08/07/2026, 13:40 | Talks (15-20 minutes)
Debugging in R often relies on manually inspecting intermediate results, usually by adding print statements to track the evolution of variables. The {boomer} package simplifies this process by automatically displaying these results in a readable and flexible way, without even modifying the code of the analyzed functions.
It gets its name from the fact that it "explodes" a call into its...
-
Laia Domenech-Burin (Sovereign Tech Agency) | 08/07/2026, 13:40 | Talks (15-20 minutes)
The R language provides essential infrastructure for statistical computing, research, and data science worldwide. Yet, the labour that sustains this infrastructure remains largely invisible: maintaining legacy upstream code, triaging complex systems-level bugs, and hardening build pipelines. This work underpins the reliability and security of the ecosystem, but remains difficult to measure,...
-
Henrik Bengtsson (University of California San Francisco (UCSF)) | 08/07/2026, 14:00 | Talks (15-20 minutes)
Ever felt like parallelizing your R code requires a complete rewrite? Transitioning from sequential code to parallel execution has traditionally meant dealing with fragmented, obscure, package-specific APIs that distract us from our main goals. Which packages and functions should I use, and what platforms should I support? There are a lot of upfront decisions to make, with many...
-
Maciej Nasiński (UCB & University of Warsaw) | 09/07/2026, 10:30 | Talks (15-20 minutes)
By 2026, the era of vibe coding has made rapid prototyping effortless; however, it has also highlighted a significant gap between demos and production-quality systems. While AI agents can simulate agile processes, they often optimise for speed at the expense of architectural integrity, security, and long-term technical debt. Anyone can generate code, but not everyone can manage a project. The...
-
Mikael Jagan | 09/07/2026, 10:30 | Talks (15-20 minutes)
We present the history, structure, and design philosophy of the "recommended" R package Matrix, which extends R with classes and methods for structured matrices having sparse or dense storage. We review recent and ongoing development in Matrix, covering matrices with integer or complex data, matrix factorizations, and improved documentation. Finally, as the number of reverse dependencies of...
-
Christian Martinez (CUNY) | 09/07/2026, 10:30 | Talks (15-20 minutes)
Reproducibility in R is often taught as a final requirement rather than as a workflow that evolves over time. In many research methods courses, students write one-off scripts against artificial datasets, submit them, and never return to their code. This talk presents an alternative: an end-to-end, R-native ecosystem designed to move students from code users to reproducible researchers, and...
-
Mx Katrina Brock (Max Planck Institute of Animal Behavior) | 09/07/2026, 10:30 | Talks (15-20 minutes)
The behavioral ecology literature offers a rich library of approaches for finding patterns in animal movement data. While many of the biologists developing these algorithms publish their code, it is rarely optimized for reuse. Even well-designed packages with similar workflows have different interfaces that potential users need to learn one by one. By wrapping these algorithms in a standardized...
-
Ella Kaye (University of Birmingham) | 09/07/2026, 10:50 | Talks (15-20 minutes)
The throw sequence "423" is a valid pattern for juggling with three balls, but "432" will result in collisions and dropped balls. How can you tell? All juggling patterns can be described in a notation called siteswap, and siteswap sequences can be mathematically validated and visualised.
In this talk, I introduce jugglr, an R package for working with siteswap sequences. It validates...
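For vanilla siteswaps the check is simple: a throw of height t made on beat i lands on beat i + t, and a pattern of period n is collision-free exactly when the landing beats (i + t) mod n are all different. For "423" they are 1, 0, 2; for "432" the first two throws both land on beat 1. A minimal Python sketch of that rule (an illustration of the mathematics, not jugglr's interface; the function names are hypothetical):

```python
def is_valid_siteswap(pattern: str) -> bool:
    """Check a vanilla siteswap written with single digits 0-9."""
    if not pattern or not all(c in "0123456789" for c in pattern):
        return False
    throws = [int(c) for c in pattern]
    n = len(throws)
    # The throw made on beat i lands on beat i + t; modulo the period these
    # landing beats must all differ, otherwise two balls arrive together.
    landings = {(i + t) % n for i, t in enumerate(throws)}
    return len(landings) == n


def ball_count(pattern: str) -> int:
    """Number of balls in a valid pattern: the average throw height."""
    if not is_valid_siteswap(pattern):
        raise ValueError(f"{pattern!r} is not a valid siteswap")
    throws = [int(c) for c in pattern]
    return sum(throws) // len(throws)


for p in ["423", "432", "531", "3"]:
    print(p, is_valid_siteswap(p))
```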
-
David Granjon (cynkra GmbH) | 09/07/2026, 10:50 | Talks (15-20 minutes)
We are delighted to introduce 'blockr', a visual, block-based interface for building, customising, and sharing interactive R data workflows, without any coding experience.
'blockr' enables users to snap together modular blocks for data loading, transformation, visualisation, and export, forming directed acyclic graph (DAG) pipelines with instant visual feedback. It carefully integrates AI...
-
Aymeric Stamm (Department of Mathematics Jean Leray, UMR CNRS 6629, Nantes University) | 09/07/2026, 10:50 | Talks (15-20 minutes)
Topological data analysis (TDA) is an emerging area of statistical research grounded in topology, intersecting with exploratory analysis, statistical inference, and machine learning. It is therefore important for R users to have access to comprehensive and reliable TDA tools.
Published R packages for TDA fall into three categories: First, {TDA} and {rgudhi} interface with comprehensive...
-
Janith Wanniarachchi (Monash University) | 09/07/2026, 10:50 | Talks (15-20 minutes)
Understanding the behaviour of complex machine learning models has become a growing challenge. Explainable AI (XAI) methods were introduced to provide insights into model predictions; however, explaining these explanations can be difficult without proper visualisation methods. In addition, settling the disagreements between these explainers can be difficult based purely on numerical...
-
Balasubramanian Narasimhan (Stanford University) | 09/07/2026, 11:10 | Talks (15-20 minutes)
CVXR is the R implementation of CVXPY, a widely used disciplined convex optimization framework. Maintained by two developers, the S4-based CVXR 1.0 had fallen significantly behind CVXPY in features. We report on a complete rewrite using S7, now on CRAN, that targets the current version of CVXPY. The new version is 4-5x faster than the old CVXR and the...
-
Gero Szepannek (Stralsund University of Applied Sciences) | 09/07/2026, 11:10 | Talks (15-20 minutes)
The package clustMixType [3] is one of the most popular packages for clustering mixed-type data. Nonetheless, an open issue, not only for clustering mixed-type data but for clustering in general, is an appropriate weighting of the variables. In Huang's original paper [1], as well as in the clustMixType package, only heuristics are given for this purpose. In the presentation it will be...
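For context, Huang's k-prototypes dissimilarity combines squared Euclidean distance on the numeric variables with a weighted count of categorical mismatches, and that weight (often written lambda) is exactly the kind of variable weighting at issue. A minimal Python sketch showing that changing lambda can change which prototype an observation is assigned to (the names and numbers are hypothetical; this is not clustMixType's code):

```python
def kproto_distance(x_num, y_num, x_cat, y_cat, lam):
    """Huang-style k-prototypes dissimilarity between an observation and a prototype.

    Squared Euclidean distance on the numeric variables plus `lam` times the
    number of mismatching categorical variables (simple matching).
    """
    if len(x_num) != len(y_num) or len(x_cat) != len(y_cat):
        raise ValueError("both arguments must describe the same variables")
    numeric = sum((a - b) ** 2 for a, b in zip(x_num, y_num))
    mismatches = sum(a != b for a, b in zip(x_cat, y_cat))
    return numeric + lam * mismatches


def nearest_prototype(x_num, x_cat, prototypes, lam):
    """Return the name of the prototype with the smallest dissimilarity."""
    return min(
        prototypes,
        key=lambda name: kproto_distance(
            x_num, prototypes[name][0], x_cat, prototypes[name][1], lam
        ),
    )


# Hypothetical prototypes: (numeric part, categorical part).
PROTOTYPES = {"P1": ([1.0], ["b"]), "P2": ([3.0], ["a"])}
obs_num, obs_cat = [1.0], ["a"]

print(nearest_prototype(obs_num, obs_cat, PROTOTYPES, lam=0.5))   # P1 (0.5 vs 4.0)
print(nearest_prototype(obs_num, obs_cat, PROTOTYPES, lam=10.0))  # P2 (10.0 vs 4.0)
```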
-
Jakub Grzywaczewski (Warsaw University of Technology), Dr Nuno Sepúlveda (Warsaw University of Technology) | 09/07/2026, 11:10 | Talks (15-20 minutes)
Pre-processing and quality control of high-dimensional serological data from Multiplex Bead Assay machines pose a significant bottleneck to the responsible application of machine learning to global health challenges. Driven by the data demands of the PvSTATEM project, an international initiative aimed at malaria elimination, we developed SerolyzeR, an open-source R package designed to...
-
Efstathios Gennatas (UCSF) | 09/07/2026, 11:30 | Talks (15-20 minutes)
Basic research and clinical medicine are increasingly capitalizing on data-driven approaches to derive insights into disease pathophysiology and discover new therapeutic targets. While advanced algorithms are readily available, their application requires a combination of domain, quantitative, and technical expertise, leaving them out of reach for many domain expert researchers and clinicians....
-
210. Playful Teaching of Simulation Models: From Monolithic Shiny Apps to Quarto Dashboards and webR | Thomas Petzoldt (TUD - Dresden University of Technology) | 09/07/2026, 11:30 | Talks (15-20 minutes)
Quantitative modeling is essential in the life and environmental sciences, yet students often face significant barriers due to "math anxiety" and programming complexity. While differential equations are often perceived as dry or difficult, interactive simulations offer a playful entry point that fosters intuitive understanding. However, traditional "downloadable models" often suffer from...
-
Mitchell O'Hara-Wild (Monash University) | 09/07/2026, 11:30 | Talks (15-20 minutes)
Statistical analysis on temporal, spatial, graph, and probabilistic data is error-prone when the data types lack intrinsic structure. Outputs from models typically return these composite data types separately, requiring the user to assemble and apply the results correctly. This reduces the accessibility of statistics and results in error-prone analysis. Representing these data types using...
-
Aymeric Stamm (Department of Mathematics Jean Leray, UMR CNRS 6629, Nantes University) | 09/07/2026, 11:30 | Talks (15-20 minutes)
Diffusion magnetic resonance imaging (MRI) is a non-invasive imaging technique that allows us to probe the microstructure in the brain at a mesoscopic scale by making the MR signal sensitive to the diffusion of water molecules in the brain, which is restricted or hindered by cellular structures such as axons or glial cells. Diffusion MRI suffers from a poor spatial resolution, which yields the...