6–9 Jul 2026
Europe/Warsaw timezone

Contribution List

183 contributions
  1. Ms Anqi Fu (Memorial Sloan Kettering), Balasubramanian Narasimhan (Stanford University)
    06/07/2026, 09:00
    All tracks
    Tutorial (3 hours)

    Convex optimization is fundamental to modern statistics and machine learning, underpinning methods from least squares and ridge regression to support vector machines (SVMs) and portfolio optimization. While Python users have long enjoyed state-of-the-art convex optimization through CVXPY, R users now have access to the same capabilities through CVXR 1.8.x, a complete rewrite using R's S7...

  2. Dr Tomasz Żółtak (Institute of Philosophy and Sociology, Polish Academy of Sciences)
    06/07/2026, 09:00
    Social sciences
    Tutorial (3 hours)

    This workshop introduces participants to the workflow of processing, analyzing and visualizing log data describing respondents' interactions with a web survey interface, collected from the popular open-source LimeSurvey platform, using the logLime R package (along with other packages: dplyr, ggplot2, ggdensity and gganimate). While discussing this process, participants will discover different...

  3. Prof. Jakub Nowosad (Adam Mickiewicz University), Dr Jannes Muenchow (cynkra GmbH)
    06/07/2026, 09:00
    All tracks
    Tutorial (3 hours)

    R has become one of the most widely used languages for geographic data science. Its strength lies in a well-established ecosystem of several hundred spatial packages that support geographic data handling, analysis, and visualization, while integrating seamlessly with R’s wider tools for data processing and statistical analysis. R's flexibility and statistical capabilities make it attractive...

  4. Krzysztof Dyba (Adam Mickiewicz University in Poznan)
    06/07/2026, 09:00
    All tracks
    Tutorial (3 hours)

    Brief biography

    Krzysztof Dyba is a senior lecturer at Adam Mickiewicz University in Poznan specializing in spatial data science and remote sensing. His teaching experience includes conducting four international workshops on R applications for satellite data at the OpenGeoHub Summer School, as well as leading an external course on “Advanced Spatial Analysis” at Maria Curie-Sklodowska...

  5. Prof. Katarzyna Kopczewska (University of Warsaw)
    06/07/2026, 09:00
    Econometrics and financial modeling
    Tutorial (3 hours)

    This workshop introduces participants to the analytics of spatial geo-located point data. It starts with data processing (reading into the sf class, visualisation with ggplot, plot and interactive mapview, CRS reprojection); then it continues with detection of density clusters (QDC, DBSCAN), degree of agglomeration (using entropy-based ETA and SPAG) and comparison of density patterns by further...

  6. Christoph Sax (cynkra GmbH, University of Basel), David Granjon (cynkra GmbH)
    06/07/2026, 13:00
    Web applications (Shiny, dashboards)
    Tutorial (3 hours)

    Data analysis in R requires writing code, which remains a barrier for many domain experts. blockr is an open-source visual programming framework for R that allows users to construct reactive data pipelines by assembling modular blocks through a point-and-click interface. The framework generates reproducible R code automatically.

    This hands-on tutorial takes participants from first use to...

  7. Oleg Lugovoy (Optimal Solution LLC)
    06/07/2026, 13:00
    Econometrics and financial modeling
    Tutorial (3 hours)

    Energy system optimization models, also known as Macro Energy System (MES) models, are key tools for evaluating energy transition and decarbonization strategies at national, regional, and global levels. The workshop introduces a set of tools and open datasets to design and compare energy transition scenarios and compile reports, all without leaving R. Hands-on sessions will focus on preparing datasets...

  8. Nick Barrowman (CHEO Research Institute)
    06/07/2026, 13:00
    Data visualization
    Tutorial (3 hours)

    Suppose you want to know how many companies there are with over 100 employees in each region of several countries. We could write this as country >> region >> company >> employees. With just two variables, you can make a two-way table of counts with row or column percentages. But this does not easily generalize to larger numbers of variables, and attempts to display this kind of information...
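
    For the two-variable case described above, base R already gives a table of counts with row percentages. A minimal sketch, using a small hypothetical data frame (`companies`, with made-up country and region values) rather than any data from the tutorial:

```r
# Hypothetical toy data: one row per company with over 100 employees
companies <- data.frame(
  country = c("PL", "PL", "PL", "DE", "DE", "DE", "DE"),
  region  = c("North", "North", "South", "North", "South", "South", "South")
)

# Two-way table of counts: country x region
counts <- table(companies$country, companies$region)

# Row percentages: how each country's companies are split across its regions
row_pct <- round(100 * prop.table(counts, margin = 1), 1)

print(counts)
print(row_pct)
```

    Region is nested within country here (country >> region); each further nesting level would multiply the cells of such a flat table.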

  9. Henrik Bengtsson (University of California San Francisco (UCSF))
    06/07/2026, 13:00
    Efficient programming
    Tutorial (3 hours)

    This tutorial provides an introduction to the Futureverse (https://www.futureverse.org), a cohesive package ecosystem designed to facilitate and simplify parallel and distributed computing in R.

    While accessible to beginners and those with some R experience, this workshop also provides valuable insights for more advanced users. We will focus on the new futurize() function, which...

  10. Dr Pawel Orzechowski (The University of Edinburgh), Dr Brittany Blankinship (The University of Edinburgh)
    06/07/2026, 13:00
    Education
    Tutorial (3 hours)

    Would your code be better if you had someone by your side to talk through it? Or would your code be clearer if you knew you might have to hand over the keyboard to someone at any moment?

    Pair programming is a collaboration technique widely used in the software industry – it involves two people working together on one programming task. One person is the driver, suggesting solutions and typing the...

  11. Michał Ramsza (SGH Warsaw School of Economics)
    06/07/2026, 13:00
    Analysis best practices and workflows
    Tutorial (3 hours)

    Abstract. This workshop introduces participants to using the Quarto system for creating reproducible documents. Participants will learn how to create Quarto documents through a complete workflow, from data loading and wrangling to analysis and automated document generation. Many output formats are possible; however, the workshop focuses on DOCX with custom styling.

    *Outline of the...

  12. Dianne Cook
    07/07/2026, 09:00

    Abstract:
    When crossing a busy street, we understand that keeping our eyes open isn't optional—it's how we stay safe. Yet when building complex models, we often choose to work blind. Some of this is understandable — visualising high-dimensional data is genuinely difficult. But cultural attitudes matter too: there's a lingering belief that "looking at the data" compromises objectivity, and a...

  13. Max Marchi (Cleveland Guardians)
    07/07/2026, 10:30
    Case studies and applications
    Talks (15-20 minutes)

    LeBron choosing between taking a mid-range shot and feeding a teammate for an open three-pointer; Verstappen heading towards the pit lane or driving another few laps with the current tires; Sinner going all-in on a second serve against Alcaraz. Making split-second decisions when the stakes are the highest requires the talent and hard work that only great athletes have. Analysis of sports data...

  14. Antonio Grosso
    07/07/2026, 10:30
    Talks (15-20 minutes)

    In today's digital landscape, the demand for making statistics more accessible and meaningful to the general public is increasing. This has led to the need for fast and efficient ways to disseminate statistical information, particularly when data visualisations must be produced quickly and updated frequently. In this context, Eurostat launched a project under the ESS Innovation Agenda and...

  15. Edoardo Mancini (Roche)
    07/07/2026, 10:30
    Talks (15-20 minutes)

    You’ve built a great R package. People are using it. Feature completeness is in sight. Congratulations - you’ve defied the odds. Now the hard part begins!

    Transitioning an open-source R package from active development to long-term maintenance and stability is a complex shift. Throughout this talk we’ll explore methods to tackle this often overlooked but key challenge of the open-source...

  16. Kristian Vepsäläinen (freelance data scientist)
    07/07/2026, 10:30
    Talks (15-20 minutes)

    At the 1980 Winter Olympics, the men’s 15 km cross-country skiing race was decided by one hundredth of a second. Rather than debating historical causes, we treat this as a modeling problem: could a physically plausible effect—such as a small change in aerodynamic drag—have been large enough to matter?

    In this talk, we demonstrate how Bayesian forward simulation in R can be used to reason...

  17. Lluís Revilla Sancho
    07/07/2026, 10:50
    Talks (15-20 minutes)

    Publishing a package on CRAN is often only half of a maintainer's work; then comes the hardest part: keeping it there. Many resources focus on getting a package published on CRAN and what it takes a maintainer to do so. They describe common problems and how to solve them, but they are neither based on data nor focused on maintaining a package on CRAN.

    Recently, CRAN has opened up some...

  18. Maximilian Mücke (LMU Munich)
    07/07/2026, 10:50
    Talks (15-20 minutes)

    mlr3forecast extends the mlr3 ecosystem to support time series forecasting workflows. It introduces a dedicated forecasting task class and resampling strategies that respect temporal ordering, enabling forecasting models to be benchmarked, tuned, and combined in a systematic way. Learners wrapping established forecasting methods such as ARIMA and ETS can be used alongside any mlr3 learner...
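
    Resampling that respects temporal ordering means every training window ends before its test window begins. The sketch below is a plain base-R illustration of expanding-window (rolling-origin) splits; `rolling_origin()` and its arguments are hypothetical names chosen for illustration, not part of the mlr3forecast API.

```r
# Hypothetical helper: expanding-window (rolling-origin) splits for a series
# of length n. Each split trains on observations 1..origin and tests on the
# next `horizon` observations, so no future data leaks into training.
rolling_origin <- function(n, initial, horizon, step = 1) {
  stopifnot(initial >= 1, horizon >= 1, initial + horizon <= n)
  origins <- seq(initial, n - horizon, by = step)
  lapply(origins, function(o) {
    list(train = seq_len(o), test = seq(o + 1, o + horizon))
  })
}

splits <- rolling_origin(n = 10, initial = 6, horizon = 2)
for (s in splits) {
  cat("train:", range(s$train), " test:", s$test, "\n")
}
```

    A random k-fold split would instead mix past and future observations, which is exactly what forecasting benchmarks need to avoid.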

  19. Dr Nicolas Flores Castillo (BHP, Organisational Development and Analytics)
    07/07/2026, 10:50
    Talks (15-20 minutes)

    Traditional workforce analytics relies on aggregate metrics that obscure individual-level dynamics and wash out critical correlations between employee characteristics, career trajectories, and compensation outcomes. This talk presents a Workforce Digital Twin methodology using individual-level stochastic simulation to model strategic gender pay equity interventions in a large global...

  20. Wolfgang Viechtbauer (Maastricht University)
    07/07/2026, 10:50
    Talks (15-20 minutes)

    Interactive graphical applications in R are typically built using external web-based frameworks such as Shiny or by interfacing with other programming languages such as Tcl/Tk or JavaScript. However, these approaches may require server infrastructure or knowledge of additional programming languages.

    As an alternative, the base R function grDevices::getGraphicsEvent() enables interactive...

  21. Tomasz Woźniak (University of Melbourne)
    07/07/2026, 11:10
    Talks (15-20 minutes)

    The R package bsvars provides a wide range of tools for empirical macroeconomic and financial analyses using Bayesian Structural Vector Autoregressions. It uses frontier econometric techniques and C++ code to ensure fast and efficient estimation of these multivariate dynamic structural models, possibly with many variables, complex identification strategies, and non-linear...

  22. Cynthia Huang (LMU Munich)
    07/07/2026, 11:10
    Talks (15-20 minutes)

    Wrapping 'ggplot2' code into plot helper functions is a common way to make multiple versions of a custom plot without copying and pasting the same code over and over again. Helper functions can replace long and complex 'ggplot2' code chunks with just a single function call. However, if that single function is not designed carefully, the initial convenience can often turn into frustration...

  23. Dr Ahmadou Dicko
    07/07/2026, 11:10
    Talks (15-20 minutes)

    In methodology reports and academic textbooks, sampling designs are described in a structured way, with an explicit logic connecting strata, clusters, sampling stages, and selection probabilities. This structure is often lost in the R code that implements it, where the design gets scattered across intermediate objects, successive calls, and technical details. The more complex the plan, the...

  24. Cristina Faricelli (ISTAT)
    07/07/2026, 11:10
    Statistical models and methods
    Talks (15-20 minutes)

    UnitMix is an R package designed to detect and correct unit of measurement errors using Gaussian mixture model-based clustering, supporting both methodological research and production workflows at National Statistical Institutes (NSIs).
    The core function, assign.cluster, implements a multivariate Gaussian mixture model on log-transformed variables, allowing clusters to be defined...
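
    The reason for working on log-transformed variables fits in a few lines: a unit-of-measurement error multiplies a value by a constant (for example 1000 when values are reported in the wrong unit), which becomes a constant additive shift on the log scale, so erroneous records form a separate cluster. The sketch below uses made-up values and base R's `kmeans()` as a deliberately simpler stand-in for a Gaussian mixture model; it is not the UnitMix `assign.cluster` interface.

```r
# Made-up turnover values; records 5 and 8 were reported in the wrong unit
# (multiplied by 1000).
turnover <- c(120, 95, 143, 110, 101000, 130, 88, 125000)

# On the log10 scale a x1000 unit error is a shift of exactly +3,
# so correct and erroneous records separate into two groups.
log_x <- log10(turnover)

# k-means is used here only as a simple stand-in for mixture-model clustering
set.seed(1)
km <- kmeans(log_x, centers = 2, nstart = 10)

# Flag the cluster with the larger centre as suspected unit errors
suspect <- km$cluster == which.max(km$centers[, 1])
corrected <- ifelse(suspect, turnover / 1000, turnover)

data.frame(turnover, suspect, corrected)
```

    A mixture model adds what k-means lacks: per-record membership probabilities, which let borderline records be reviewed rather than corrected automatically.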

  25. Oscar Thees (FHWN / SwissAnon)
    07/07/2026, 11:30
    Talks (15-20 minutes)

    The growing movement toward open data, open science, and open government has increased the demand for sharing detailed microdata while protecting individual privacy. Data anonymization methods, including classical statistical disclosure control techniques and synthetic data generation, enable data sharing, but evaluating the resulting privacy risks and analytical usefulness remains...

  26. Simon Urbanek
    07/07/2026, 11:30
    Talks (15-20 minutes)

    The R ecosystem provides very good infrastructure for making R accessible through web-based documents and dashboards and for performing remote analyses. This opens up possibilities for using interactive graphics, which are important for both exploratory data analysis and presentation. However, most such solutions consist of simply binding existing JavaScript libraries which are often designed for...

  27. Szymon Maksymiuk (WUT)
    07/07/2026, 11:30
    Talks (15-20 minutes)

    With over 23,000 packages, CRAN forms a large ecosystem of reviewed code. Additionally, its strict rules and dependency structure make it great for analyzing bug propagation in an ecosystem.

    This talk presents a hypothetical scenario in which I introduce a bug into base R. I analyze the consequences for the entire ecosystem and whether packages themselves would catch those bugs with their...

  28. Maximilian Mücke (LMU Munich)
    07/07/2026, 11:50
    Lightning Talk (5 minutes)

    bbk provides a unified R interface for accessing data from major central banks, including the Deutsche Bundesbank, European Central Bank, Swiss National Bank, and Bank of England. Central bank data is widely used in economic research and financial modeling, yet each institution exposes its own API with different conventions, making reproducible data access cumbersome. bbk abstracts over these...

  29. Anastasiia Kostiv
    07/07/2026, 11:50
    Lightning Talk (5 minutes)

    How do you bring modern, interactive data visualizations into R without writing JavaScript? The nivo.bubblechart package lets R users create responsive, animated circle packing charts using a familiar data frame workflow. Built on top of the Nivo visualization library (React + D3), it provides clean defaults, customizable colors and interactivity, and seamless Shiny integration with click and...

  30. Mr Igor Kołodziej (Warsaw University of Technology), Mr Mateusz Iwaniuk (Warsaw University of Technology)
    07/07/2026, 11:50
    Lightning Talk (5 minutes)

    Nonignorable nonresponse (NMAR) presents a persistent challenge in official survey statistics, where missingness mechanisms depending on unobserved variables can significantly bias key national indicators. The NMAR package addresses this issue by providing a comprehensive suite of modern estimation methods within a unified API, specifically designed for the complex reality of NMAR scenarios in...

  31. Moritz Lang (Roche)
    07/07/2026, 11:50
    Lightning Talk (5 minutes)

    R has excellent tooling for documenting functions via roxygen2, but no equivalent for test cases. Who wrote a test? Who reviewed it? What requirement does it verify? This information often lives in comments or external documents - disconnected from the code.

    The roxyreqs package extends roxygen2 to support @meta tags above test_that() blocks:

    #' @meta author Alice
    #'...
    
  32. Vincent Arel-Bundock (Université de Montréal)
    07/07/2026, 11:55
    Lightning Talk (5 minutes)

    Policy debates, product decisions, and scientific claims all hinge on a simple question: what would happen to Y if we changed X? In this talk, I will present a practical causal inference workflow in R, using the marginaleffects package. This package offers a consistent interface for causal inference, and it is compatible with virtually all model-fitting packages in the R ecosystem. The key...
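
    The question "what would happen to Y if we changed X?" can be answered from a fitted model by predicting every observation twice, once with X at each of two values, and averaging the difference. A minimal base-R sketch of that counterfactual comparison on simulated data; it illustrates the idea only and is not the marginaleffects interface.

```r
set.seed(42)
# Simulated data: binary treatment x, confounder z, outcome y
n <- 500
z <- rnorm(n)
x <- rbinom(n, 1, plogis(z))
y <- 1 + 2 * x + 0.5 * z + rnorm(n)
dat <- data.frame(y, x, z)

fit <- lm(y ~ x + z, data = dat)

# Counterfactual predictions: everyone treated vs. everyone untreated
pred_1 <- predict(fit, newdata = transform(dat, x = 1))
pred_0 <- predict(fit, newdata = transform(dat, x = 0))

# Average counterfactual comparison under the fitted model
avg_diff <- mean(pred_1 - pred_0)
avg_diff
```

    With a linear model and no interactions this simply reproduces the coefficient on `x`; the same recipe becomes informative for models with interactions or non-linear links, such as logistic regression, where coefficients no longer equal effects on the outcome scale.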

  33. Laura Bąkała
    07/07/2026, 11:55
    Lightning Talk (5 minutes)

    It's no secret that to optimize R code one has to write it in C (or C++). This is, however, just the beginning. When does it make sense to rewrite the code? What are the typical performance sinkholes? What are the key techniques to get even more performance?

    There are surprisingly few resources where you can find the answers, and my talk will try to fill that gap. Lately, I've been working...

  34. Dr Máté Szilcz (Viti Science)
    07/07/2026, 11:55
    Lightning Talk (5 minutes)

    Background: Public health databases often present barriers to non-technical users through complex query interfaces requiring knowledge of variable codes and API structures. The Swedish National Board of Health and Welfare maintains a comprehensive statistical database on disease prevalence and hospital care, yet exploring these data programmatically demands substantial technical...

  35. Ayush Pundir (IIIT UNA)
    07/07/2026, 11:55
    Lightning Talk (5 minutes)

    What determines success in the English Premier League: superior player quality, tactical approach, or transfer investment? This study addresses that question empirically using R and the worldfootballR package.

    We construct a panel dataset of 100 team-season observations across five EPL seasons (2019/20 to 2023/24) by combining FBref and Transfermarkt data via worldfootballR...

  36. Marco Colombo (University of Heidelberg)
    07/07/2026, 13:00
    Talks (15-20 minutes)

    Fuzz testing is a technique that generates random, unexpected or malformed data and feeds them into a program. By observing how the software behaves under such conditions, this approach may uncover potential weaknesses, vulnerabilities and security flaws in applications.

    A dynamically-typed language such as R poses some difficulties to fuzz testing, as by definition no predefined typing...
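
    The basic loop behind fuzz testing is short: generate inputs of many shapes and types, call the function, and record whether it returns, warns or fails. The base-R sketch below applies that loop to a hypothetical function `scale01()`; the generators and the `fuzz()` helper are illustrative assumptions, not the API of any particular fuzzing package.

```r
# Hypothetical function under test: silently assumes a numeric vector
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))

# Input generators covering different types and edge cases
generators <- list(
  double    = function() runif(sample(2:5, 1)),
  character = function() sample(letters, 3),
  empty     = function() numeric(0),
  list      = function() as.list(runif(3))
)

# Call f on k inputs from each generator and classify the outcome
fuzz <- function(f, generators, k = 20) {
  rows <- lapply(names(generators), function(g) {
    outcome <- vapply(seq_len(k), function(i) {
      input <- generators[[g]]()
      tryCatch({ f(input); "ok" },
               warning = function(w) "warning",
               error   = function(e) "error")
    }, character(1))
    data.frame(generator = g, outcome = outcome)
  })
  do.call(rbind, rows)
}

set.seed(1)
res <- fuzz(scale01, generators)
table(res$generator, res$outcome)
```

    Because R does not declare argument types, the generators themselves encode the assumptions about what "unexpected" input means; choosing them is the hard part the abstract points to.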

  37. Dr Pierre Donat-Bouillud (Czech Technical University in Prague)
    07/07/2026, 13:20
    Talks (15-20 minutes)

    The standard metric for testing is code coverage, which is widely used in R packages, for instance with testthat or tinytest. However, coverage can often be misleading: 100% coverage does not mean the absence of bugs, and a line covered by a test does not mean that all behaviours flowing through this line are effectively verified.

    This presentation introduces MutatoR, a new package designed to bring...
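
    Why full line coverage can still miss bugs is easy to demonstrate by hand with a single mutant: alter an operator in the function body and check whether the existing tests notice. A base-R sketch with a hypothetical `is_adult()` function and two hypothetical test suites; it shows the principle only and is not how MutatoR is implemented.

```r
# Hypothetical function under test
is_adult <- function(age) age >= 18

# A test suite that executes every line (100% line coverage)...
weak_tests <- function(f) {
  isTRUE(f(30)) && isFALSE(f(5))
}

# ...and one that also checks the boundary value
strong_tests <- function(f) {
  weak_tests(f) && isTRUE(f(18))
}

# Mutant: replace `>=` with `>` in the function body
mutant <- is_adult
body(mutant) <- quote(age > 18)

# The mutant "survives" the weak suite but is "killed" by the strong one
c(original_weak   = weak_tests(is_adult),
  mutant_weak     = weak_tests(mutant),
  original_strong = strong_tests(is_adult),
  mutant_strong   = strong_tests(mutant))
```

    A surviving mutant points to a missing test (here, the boundary at 18) even though coverage was already complete.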

  38. Szymon Maksymiuk (Roche), Lorenzo Braschi (Roche)
    07/07/2026, 13:40
    Talks (15-20 minutes)

    Software validation is a key part of analyses conducted in a regulatory environment, and it is crucial to document the accuracy, reproducibility, and traceability of the software used for clinical analyses. At Roche, we initially outsourced this effort to an external partner. However, this was a solution far from ideal due to, among others, bottlenecks, a fixed release schedule, and...

  39. Magnus Mengelbier (Independent / Freelancer)
    07/07/2026, 14:00
    Talks (15-20 minutes)

    There are many validation approaches for R and its packages that have their foundation in classic software and application validation, specifically as it relates to statistical analysis applications within regulated industries such as the life sciences. The introduction of risk-based validation approaches in the last decade has provided additional tools, but as we now approach R as a language, packages as...

  40. Pawel Rucki (Roche)
    07/07/2026, 14:20
    Lightning Talk (5 minutes)

    While standard LLMs are powerful for general coding, they often fall short when working with specialized or internal R packages. This usually comes down to a knowledge gap - public models simply do not have access to private packages or the most recent documentation. To solve this, we propose a shift away from a single, all-purpose assistant toward a decentralized network of agents, where each...

  41. Pawel Rucki (Roche)
    07/07/2026, 14:25
    Lightning Talk (5 minutes)

    When an AI assistant suggests code for a specialized R package, it’s often guessing based on outdated or public data. We built mcp.rhelp to turn local R documentation into a tool for AI coding assistants. This lightweight MCP server allows assistants to navigate R’s documentation - finding where a function lives, reading its exact help file, and inspecting its source code. In this talk, I...

  42. Dariia Mykhailyshyna
    07/07/2026, 15:00

    Abstract:
    The talk examines the use of R for empirical research, university teaching, and community-oriented training related to Ukraine during Russia's full-scale invasion. It focuses on how R is used in applied research projects, in classroom instruction, and in initiatives such as the Workshops for Ukraine series when work is affected by repeated disruption.

    Examples will be used...

  43. Tina Rozsos (Vrije Universiteit Amsterdam)
    07/07/2026, 16:00
    Lightning Talk (5 minutes)

    A PhD is a complex, multi-year project that involves managing a diverse range of information: from meeting notes to research ideas, from immediate tasks to long-term plans. Disorganized notes and files lead to inefficiencies, loss of information, and challenges in reproducing past workflows. This lightning talk presents a comprehensive PhD documentation template built with R and Quarto, meant...

  44. Dr Sinead Moylett (University of Limerick)
    07/07/2026, 16:00
    Epidemiology
    Lightning Talk (5 minutes)

    Missing data is a structural feature of longitudinal cohort studies and can bias inference when attrition is systematic. We present an R-based evaluation of imputation strategies for ordinal outcomes in the Irish Longitudinal Study on Ageing (TILDA) across five waves. All data processing, simulation, imputation, and evaluation were implemented in R, enabling a fully reproducible workflow for...

  45. Mr Clievins Selva (Deutsch)
    07/07/2026, 16:00
    Lightning Talk (5 minutes)

    Researchers conducting surveys and assessments often encounter commercial barriers. While convenient, commercial platforms also come with substantial licensing costs, vendor lock-in, and data sovereignty concerns. Meanwhile, open-source alternatives typically require researchers to use separate tools for survey design, data collection, analysis and reporting. This creates inefficiencies, increases the...

  46. Błażej Kochański (Politechnika Gdańska)
    07/07/2026, 16:05
    Lightning Talk (5 minutes)

    The Area Under the Receiver Operating Characteristic Curve (AUC) is a widely used measure for evaluating the performance of binary classification models. In the literature and in practice, it appears under various names and is closely related to other performance measures. We review these formulations and discuss the motivation for efficient AUC computation in empirical analysis. We survey R...
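
    One well-known formulation makes AUC cheap to compute: it equals the probability that a randomly chosen positive case scores higher than a randomly chosen negative one (ties counting one half), i.e. the Mann-Whitney U statistic divided by n1 * n0, which can be obtained from ranks in O(n log n) time instead of comparing all pairs. A minimal base-R sketch comparing the two routes; `auc_pairs()` and `auc_rank()` are hypothetical names, not functions from any package surveyed in the talk.

```r
# Pairwise definition: share of (positive, negative) pairs ranked correctly,
# with ties counted as one half. O(n1 * n0) time and memory.
auc_pairs <- function(pos, neg) {
  mean(outer(pos, neg, ">") + 0.5 * outer(pos, neg, "=="))
}

# Rank-based (Mann-Whitney) formulation: sorting dominates, O(n log n).
# rank() assigns average ranks to ties, which yields the one-half tie rule.
auc_rank <- function(pos, neg) {
  n1 <- length(pos)
  n0 <- length(neg)
  r <- rank(c(pos, neg))
  (sum(r[seq_len(n1)]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

pos <- c(0.9, 0.8, 0.4, 0.35)  # made-up scores of positive cases
neg <- c(0.7, 0.35, 0.3, 0.2)  # made-up scores of negative cases

auc_pairs(pos, neg)
auc_rank(pos, neg)

# Related measure: Gini coefficient (accuracy ratio) used in credit scoring
2 * auc_rank(pos, neg) - 1
```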

  47. Tomasz Żółtak (Educational Research Institute, Warsaw, Poland), Dr Grzegorz Humenny (Educational Research Institute, Warsaw, Poland), Mr Bartłomiej Płatkowski (Educational Research Institute, Warsaw, Poland)
    07/07/2026, 16:05
    Lightning Talk (5 minutes)

    The Polish secondary school graduate tracking system has been providing annual data on the further educational and professional careers of Polish youth since 2021. Moreover, the ways of disseminating these results are constantly being developed: from static, general reports to interactive dashboards designed to serve the needs of specific groups of stakeholders. In our presentation we will...

  48. Håvard R. Karlsen (NTNU)
    07/07/2026, 16:05
    Lightning Talk (5 minutes)

    Logging your own work hours is an efficient way to work out what you spend your time on. It can be used to help you manage your time better, to ensure you spend an appropriate amount of time on a project, or even to negotiate pay or responsibilities at work. In this lightning talk I show how I keep track of my own hours using R and an external tracking tool (here: Clockify). My...

  49. Anastasiia Kostiv
    07/07/2026, 16:10
    Lightning Talk (5 minutes)

    R has long been recognized for its statistical computing power, and Shiny has revolutionized how R users build interactive applications. However, deploying Shiny apps as standalone desktop software remains cumbersome - requiring a browser, a running R session, and often a server infrastructure. What if R could power true native desktop applications? ...

  50. Jagoda Głowacka-Walas
    07/07/2026, 16:10
    Lightning Talk (5 minutes)

    Missing data is a common challenge in data analysis, particularly in multi-omics studies, where heterogeneous data sources and technical limitations often result in incomplete measurements. In these situations, predictive models may fail when some required variables are missing.

    This presentation demonstrates ensemble learning strategies designed to improve prediction despite incomplete...

  51. Mr Winkle Lu
    07/07/2026, 16:10
    Lightning Talk (5 minutes)

    During clinical trials, EDC data must be reviewed regularly — from routine medical review reports to formal audit and inspection scenarios. In these contexts, reviewers do not explore data freely. They follow a predictable, structured process and need a document that clearly records what was seen, under what conditions, and what conclusions were drawn. This core distinction shapes the design...

  52. Janez Bijec (University of Ljubljana / Statistical office of Slovenia)
    07/07/2026, 16:15
    Lightning Talk (5 minutes)

    Routine administrative data collected by healthcare payers hold significant potential for monitoring care quality, yet translating them into actionable insights requires careful statistical modeling and thoughtful communication. This talk presents a complete R-based pipeline — from raw reimbursement data to an interactive Shiny dashboard — developed to benchmark hospital performance for...

  53. Juan Claramunt Gonzalez (Leiden University)
    07/07/2026, 16:15
    Lightning Talk (5 minutes)

    metacart 3.0 integrates regression and classification trees into the framework of meta-analysis to perform exploratory moderator analysis. Meta-regression trees identify, based on the study characteristics and their interactions, subgroups of studies that maximize within-subgroup homogeneity of effect sizes. To avoid overfitting, the resulting tree is pruned using cross-validation. Finally,...

  54. Dr Matthew Valko (Boehringer Ingelheim Pharma GmbH & Co. KG), Dr Saumil Shah (Boehringer Ingelheim Pharma GmbH & Co. KG)
    07/07/2026, 16:15
    Lightning Talk (5 minutes)

    Simulations are a great tool to support decisions on optimal trial designs. Trial designs have evolved from simple two-arm treatment comparisons to multi-arm dose-finding, adaptive designs, and platform trials. R is a popular and important tool for those in the pharmaceutical industry. R6 programming and supporting R functions have allowed us to design rxsim, a package that is flexible enough to...

  55. Rozeta Simonovska
    07/07/2026, 16:20
    Lightning Talk (5 minutes)

    Part of the responsibilities of a statistician working on clinical trials is reviewing tables, listings, and figures (TLFs) to ensure accuracy and compliance. However, the review process can be challenging due to the volume of outputs and the dynamic nature of clinical trial data. A common issue arises when outputs are reviewed and checked, but subsequent data updates or programming changes...

  56. Liam Mueller (UC San Diego)
    07/07/2026, 16:20
    Lightning Talk (5 minutes)

    How should we best model multinomial tradeoffs? When there are only two options that sum to 100 percent, it can be straightforward to employ options like GLM to address research questions with binomial response data, but when there are more than two groups, these multinomial models become too complex to fit and too convoluted for our audience to understand. However, using simplexes to...

  57. Cristiane Millan (NIC.br – Brazilian Network Information Center)
    07/07/2026, 16:20
    Lightning Talk (5 minutes)

    Reliable Internet connectivity is a foundational requirement for digital health services such as telemedicine, electronic health records, and remote diagnostics. Without adequate connectivity, efforts to modernize healthcare systems risk reinforcing existing regional and socioeconomic inequalities.

    Brazil has approximately 50,000 Primary Health Care Units distributed across a vast and...
    
  58. Ms Susanne Steinmann (Clinical Cancer Registry Lower-Saxony, Team Data Analysis)
    07/07/2026, 16:25
    Lightning Talk (5 minutes)

    The R language and RStudio are widely used in most of the federal cancer registries in Germany, but only a few have dedicated “R onboarding” programs. Therefore, an R user group, “Forum R”, was initiated in 2021 as part of the expert panel of the clinical cancer registries, “Plattform § 65c”. This “Forum R” aims at networking and further education for employees of the cancer registries in Germany in...

  59. Martin Zuba (Austrian National Public Health Institute (GOEG)), Zuzanna Brzozowska (Austrian National Public Health Institute (GOEG))
    07/07/2026, 16:25
    Lightning Talk (5 minutes)

    The Austrian Health Atlas is an open-access platform using interactive charts and maps to intuitively illustrate public health data. Developed by the Austrian National Public Health Institute (GÖG), the Gesundheitsatlas shows trends, determinants and socio-economic differences in the health of the Austrian population, offering both international and subnational comparisons.
    The platform's...

  60. Wojciech Wójciak (Warsaw University of Technology)
    07/07/2026, 16:25
    Lightning Talk (5 minutes)

    Optimum allocation of sample sizes across strata is a fundamental problem in survey sampling. When designing a stratified survey, researchers must decide how to distribute a fixed total sample size among strata in order to achieve specific statistical goals, such as minimizing estimator variance or minimizing survey cost subject to precision constraints. Classical results, such as...
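
    For reference, the textbook Neyman allocation, which minimises the variance of the stratified estimator of the mean for a fixed total sample size n, sets n_h = n N_h S_h / sum_k N_k S_k, where N_h is the population size and S_h the standard deviation of the study variable in stratum h. A minimal base-R sketch with made-up stratum figures; `neyman_alloc()` is a hypothetical helper, not a function from the speaker's work, and it deliberately ignores integer rounding and the upper bounds n_h <= N_h.

```r
# Hypothetical helper: Neyman allocation of a total sample size n across strata.
# N: stratum population sizes, S: stratum standard deviations.
# Ignores integer rounding and the upper bounds n_h <= N_h.
neyman_alloc <- function(n, N, S) {
  stopifnot(length(N) == length(S), n > 0, all(N * S >= 0))
  n * (N * S) / sum(N * S)
}

N <- c(100, 200, 300)   # made-up stratum sizes
S <- c(1, 2, 3)         # made-up stratum standard deviations
alloc <- neyman_alloc(70, N, S)
alloc

# The plain formula can exceed a stratum's population size:
neyman_alloc(500, N, S) > N
```

    The last line shows why the closed-form rule is not the end of the story: once a stratum's allocation exceeds N_h, it must be capped and the remaining sample redistributed, which calls for dedicated algorithms.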

  61. Adam Forys (Roche), Krystian Igras (7N)
    07/07/2026, 16:30
    Lightning Talk (5 minutes)

    To filter data, users need to know dataset structures, variable names, and valid value ranges. The cohortBuilder R package offers a common API for multi-step filtering across data frames, databases, and custom backends. The shinyCohortBuilder package adds an interactive Shiny GUI on top of it.

    We introduce a metadata layer in cohortBuilder that connects filtering pipelines to large...

  62. Dr Saras Windecker (The Kids Research Institute, Perth, Australia)
    07/07/2026, 16:30
    Lightning Talk (5 minutes)

    Real-time information on the growth and prevalence of infectious diseases is vital for epidemic response. Yet it is rare to directly observe infections, and so we must reconstruct infection trends from delayed and imperfect alternatives, including time-series of case counts and cross-sectional infection prevalence surveys. Joint inference from multiple data sources can improve estimates of infection trends,...

  63. January Weiner
    07/07/2026, 16:30
    Lightning Talk (5 minutes)

    The principle of scientific accountability goes beyond the reproducibility of data analysis: each result (figure, table or p-value) must be unequivocally and easily traceable to the original data from which it originates. In many bioinformatics workflows this kind of accountability is difficult to maintain, particularly in interactive web applications where exploratory analyses are performed...

  64. Dr Brittany Blankinship (The University of Edinburgh), Dr Pawel Orzechowski (The University of Edinburgh)
    07/07/2026, 16:35
    Lightning Talk (5 minutes)

    Coding and data have been hijacked by macho-nonsense culture (and the teaching of coding even more so). We noticed a unique opportunity to write a book composed of a range of case studies and ideas which flip the table and flip the narrative.

    Over the last 5 years we have built a community of coding and data educators who teach outside of traditional computer science settings. Our 300+...

  65. Isaac Gravestock (Roche)
    07/07/2026, 16:35
    Lightning Talk (5 minutes)

    In clinical trials involving time to disease progression that can only be assessed at clinical visits, the real time to event is interval censored between visits, e.g. tumour assessments in progression-free survival (PFS) outcomes in oncology trials. In the typical analysis of PFS, we systematically impute the event time to be the visit where the progression was detected.

    In many cases the...

  66. Raniere Gaia Costa da Silva
    07/07/2026, 16:35
    Lightning Talk (5 minutes)

    This talk will show how JupyterLite can improve a community-hosted training session by reducing the friction for learners to access JupyterLab in class and later at home.

    JupyterLite is a distribution of Jupyter for WebAssembly (Wasm) enabling JupyterLab to run completely (including the Jupyter server) in a modern web browser. Support for R in JupyterLite was added in 2025.

    For...

  67. Tomasz Gieorgijewski (Rappsodia Labs)
    07/07/2026, 16:40
    Lightning Talk (5 minutes)

    Amethyst: a new R graphics device for macOS in the making.

    I wanted to deepen my contributions to R in the domain of graphics, specifically on Mac computers.
    So I set out on a project to create a new graphics device for the R language.
    In the talk I will present my motivations for starting such a project and the current state of graphics devices, especially the native one, shipped with...

  68. Matthew Haffner (University of Wisconsin - Eau Claire)
    07/07/2026, 16:40
    Lightning Talk (5 minutes)

    Teaching university courses in geography, urban planning, and spatial data science demands tools that can convey place, and R is remarkably well-suited to the task. This talk presents a reproducible R-based teaching infrastructure built around several components: interactive spatially-enabled presentations, a course website ecosystem, and a reporting system leveraging the Canvas API. Lectures...

  69. Melissa Bather (The University of Auckland)
    07/07/2026, 16:40
    Lightning Talk (5 minutes)

    Author information:

    Melissa Bather is a statistician from New Zealand, currently living in Vancouver, BC, Canada. She has a Master of Science in Statistics with First Class Honours from the University of Auckland—the birthplace of R! She is currently researching new methods to introduce multi-species models into the field of spatially explicit capture-recapture for the University of...

  70. Ms Aleyna Erakcaoğlu (Department of Biostatistics, Erciyes University School of Medicine, Kayseri, Türkiye)
    07/07/2026, 16:45
    Lightning Talk (5 minutes)

    Apolipoprotein B (ApoB) is a key biomarker reflecting the number of atherogenic lipoprotein particles and is increasingly recognized as a superior indicator of cardiovascular risk compared to traditional lipid measures. However, direct ApoB measurement is not always routinely available in clinical practice, and existing tools do not provide an integrated framework for its estimation and...

  71. Yasuto NAKANO (Kwansei Gakuin University)
    07/07/2026, 17:00
    Poster

    The purpose of this talk is to present md2qstn, a specialized R library developed to bridge the gap between plain-text survey drafting and digital deployment. md2qstn enables the conversion of Markdown-formatted questionnaires into DDI (Data Documentation Initiative)-compliant XML and Qualtrics-compatible QSF (Qualtrics Survey Format) JSON files. Although the prevailing approach in...

  72. Isaac Gravestock (Roche)
    07/07/2026, 17:00
    Poster

    Google Slides is a widely available productivity tool used by many institutions, but it is not well integrated with existing R ecosystem workflows such as R Markdown, which has made its use incompatible with reproducible research and reporting. ladder is an R package for inserting tables into Slides presentations and supports multiple table formats from R.

    In particular, it supports flextable...

  73. Prof. Richard Shefferson (University of Tokyo)
    07/07/2026, 17:00
    Poster

    Adaptive dynamics focuses on assessing the role of natural selection in trait evolution and speciation. Matrix community models allow population matrix models to project together via aggregated density dependence. I present a new R package, adapt3, that develops community matrix projection and adaptive dynamics using matrix approaches, with the core kernels all programmed in C++. In adaptive...

  74. H. Sherry Zhang (University of Texas at Austin)
    07/07/2026, 17:00
    Poster

    Decision choices, such as those made when building regression models, and their rationale are essential for interpreting results and understanding uncertainty in an analysis. However, these decisions are rarely studied because tracing every alternative considered by authors is often impractical, and reworking a completed analysis is generally of limited interest. Consequently, researchers...

  75. shristi y (IIIT UNA)
    07/07/2026, 17:00
    Poster

    The widespread adoption of digital music streaming platforms has created unprecedented opportunities to analyze large-scale music consumption data. This study investigates global music trends by analyzing Spotify track data using statistical and visualization techniques implemented in R. The objective is to explore how various audio features—including danceability, energy, valence,...

  76. Florian Sihler (Ulm University)
    07/07/2026, 17:00
    Poster

    In the past months, we built a tool to analyze all versions (roughly 170,000) of all packages available on CRAN, obtaining around 80 GB of raw data on various semantic aspects such as call graphs of functions, dead code, values of constants, the coverage of provided vignettes, transitive dependencies of packages, and much more. Moreover, the data is linked to the release date and...

  77. Claudiu Forgaci (Delft University of Technology)
    07/07/2026, 17:00
    Poster

    Spatially designing and planning urban transformations around rivers while capturing the complexities of riverside urban areas remains challenging. An essential part of the challenge is how boundaries are drawn in the analysis of urban areas surrounding rivers. To overcome this challenge, we developed the rcrisp open-source R package to automate the morphological delineation of riverside...

  78. Claudiu Forgaci (Delft University of Technology)
    07/07/2026, 17:00
    Poster

    The Spatial Data Science across Languages (SDSL) Community brings together developers and users of common and emerging programming languages for spatial data science. It aims to foster understanding and address common issues while discussing language-specific problems. We focus broadly on geospatial and geographic space, with some applications to general image spaces and local reference...

  79. Daisuke Ichikawa (Kibaroku), Koji Makiyama (HOXO-M Inc.), Shinichi Takayanagi, kazuyuki sano
    07/07/2026, 17:00
    Poster

    Online A/B tests often randomize at the user level while evaluating ratio metrics at a finer-grained unit, such as page views or sessions. This mismatch induces within-user correlation and can make standard Z-tests anti-conservative, increasing false positives. The deltatest package provides an R interface for delta-method-based hypothesis testing of ratio metrics, following the practical...

  80. Teresa Gonzalez-Arteaga (Universidad de Valladolid)
    07/07/2026, 17:00
    Poster

    Aggregation functions play a central role in decision making, and among them, weighted means and Ordered Weighted Averaging (OWA) operators are two of the most widely used families. Their relevance is reinforced by the fact that both can be expressed as particular cases of the Choquet integral, which has inspired numerous attempts to develop unified generalizations of these operators.
    ...

  81. Mauro Loprete (Universidad de la República, Uruguay)
    07/07/2026, 17:00
    Poster

    Household survey microdata is a primary input for social science research and public policy evaluation, yet the processing pipelines that turn raw microdata into publishable estimates are rarely documented, shared, or reproduced. Each research team writes ad hoc scripts to recode variables, construct indicators, and compute weighted statistics, duplicating effort and introducing silent...

  82. Dr Nicholas Spyrison (IFF (International Flavors and Fragrances))
    07/07/2026, 17:00
    Poster

    Industrial microbial production systems generate rich process data, yet translating these data into actionable parameter recommendations remains challenging. In this talk, we present a model-based framework for generating and interpreting recommendations to optimize microbial production parameters using supervised machine learning.

    Using two distinct industrial probiotic strains (GG and...

  83. Ernest Guevarra (nutriverse)
    07/07/2026, 17:00
    Poster

    nutriverse is an open source project, a collective, and a community of practice. It develops robust, well-tested, and performant R packages for nutrition data analysis. The goal is to provide reliable tools that support the full lifecycle of nutrition analytics, from data ingestion and cleaning to statistical analysis, modelling, and reproducible...

  84. Ward Langeraert (Research Institute for Nature and Forest)
    07/07/2026, 17:00
    Analysis best practices and workflows
    Poster

    Scaling research software beyond single scripts or standalone packages requires deliberate architectural choices, shared conventions, and robust distribution infrastructure. This poster presents the b3verse, a coordinated ecosystem of twelve interoperable R packages designed to transform large biodiversity occurrence cubes into standardized indicators for research and policy...

  85. Dr Oscar de Leon (Universidad del Valle de Guatemala)
    07/07/2026, 17:00
    Poster

    Air pollution exposure research relies on a growing diversity of wearable personal exposure monitors (PEMs), each producing log files with distinct header structures, column naming conventions, and measurement units. The R ecosystem already offers strong infrastructure at adjacent layers for network-level data (openair and AirSensor), on-road vehicle emission systems (pems.utils), and...

  86. Mr Marc Becker (Ludwig-Maximilians-Universität München)
    07/07/2026, 17:00
    Poster

    We present rush, an R package for asynchronous and decentralized optimization. Traditional approaches for parallel computing in R follow a controller-worker model where a central process proposes tasks, dispatches them to workers, and collects results. When proposing new tasks is computationally expensive, the central controller becomes a bottleneck that leaves workers idle, a problem that...

  87. Vihan Singh (Indian Institute of Information Technology Una)
    07/07/2026, 17:00
    Poster

    The increasing availability of structured sports datasets has created new opportunities for applying statistical analysis and predictive modeling techniques to sports analytics. The Indian Premier League (IPL) provides detailed match and ball-by-ball datasets that allow in-depth statistical exploration of match dynamics and performance patterns. This study applies statistical analysis and...

  88. Vedansh Bansal (Indian Institute of Information Technology Una)
    07/07/2026, 17:00
    Poster

    Student performance analysis is an important area in educational data science. This project focuses on building a Student Performance Analyzer to study and evaluate academic performance using collected data such as marks, attendance, and study hours. The objective is to use statistical analysis and data visualization techniques to understand patterns in student performance and support better...

  89. Mohamed El Fodil Ihaddaden (HDI GLOBAL SE)
    07/07/2026, 17:00
    Poster

    Large Language Models (LLMs) introduce a fundamental challenge for software engineering in R: their non-deterministic behavior makes traditional unit testing inadequate. While identical prompts may yield slightly different outputs, robust validation of model behavior remains essential for production systems, research pipelines, and agent-based workflows.

    In this talk, I introduce mini007,...

  90. Claudiu Forgaci (Delft University of Technology)
    07/07/2026, 17:00
    Poster

    The Rbanism community aims to empower urbanism researchers, students, educators and practitioners to use open-source software and related open-science practices effectively and with confidence. It raises awareness, stimulates engagement and builds capacity by demonstrating the benefits of reproducibility, automation and scalability. Rbanism was initiated in 2021 by a group of R users in the...

  91. Hanna Meyer (University of Münster)
    07/07/2026, 17:00
    Poster

    One key task in environmental science is the continuous mapping of environmental variables across space, and often across both space and time. Machine learning algorithms are frequently employed for this purpose, combining local field observations with comprehensive sets of predictor variables to produce spatial predictions. This enables the prediction of the variable of interest at locations...

  92. Jakub Nowosad
    08/07/2026, 09:00

    Abstract:

    Spatial data are central to understanding environmental change, social processes, and their interactions. Maps are not only visual products of analysis, but tools that shape how problems are framed, how phenomena are perceived, and how decisions are made. R has become a widely used environment for spatial data science because it combines interactive analysis, open development,...

  93. Tim Appelhans (Friedrich-Alexander Universität Erlangen-Nürnberg, Institut für Geographie, Wetterkreuz 15, 91058 Erlangen, Germany)
    08/07/2026, 10:30
    Talks (15-20 minutes)

    When it comes to web-mapping in R, RStudio’s (Posit’s) 'leaflet' package has established itself as the de-facto standard. However, in addition to the lack of support for 3D data, 'leaflet' struggles with the rendering of large data (i.e. more than 1 million points/vertices) and data transfer from R memory to the browser can be slow. To overcome these bottlenecks, two packages, 'geoarrowWidget'...

  94. Aleksi Lahtinen (University of Turku)
    08/07/2026, 10:30
    Analysis best practices and workflows
    Talks (15-20 minutes)

    In social science research, datasets are often confidential, requiring analyses to be conducted in secure remote access environments. One such environment is FIONA, which provides researchers with access to sensitive, unit-level Finnish register data alongside standard statistical software, including R. We use FIONA to analyse extensive register data on Finnish teenagers and young adults, with...

  95. Kurt Hornik
    08/07/2026, 10:30
    Talks (15-20 minutes)

    Starting with R 4.6-0, the stats package provides infrastructure for distribution-free model-based inference in possibly stratified K-sample oneway layouts via the novel free1way model function. Treatment effects to be estimated using free1way include odds- and hazard ratios, Lehmann parameters, and a generalised version of Cohen's d for at least ordered and possibly right-censored...

  96. Dr Lluís Revilla Sancho (Roche, Spain)
    08/07/2026, 10:30
    Talks (15-20 minutes)

    In data-intensive industries like pharmaceuticals, the ability to move seamlessly from population-level summaries to individual data points is critical for valid insight generation. However, enabling faster, more efficient exploratory analysis and regulatory delivery of clinical trials insights is challenging and difficult to scale. This talk introduces {teal}, recently released on CRAN with...

  97. Natalia da Silva (Universidad de la República)
    08/07/2026, 10:50
    Talks (15-20 minutes)

    This talk presents enhancements to the projection pursuit tree classifier and visual diagnostic methods for assessing their impact in high dimensions. The original algorithm uses linear combinations of variables in a tree structure where depth is constrained to be less than the number of classes, a limitation that proves too rigid for complex classification problems. Our extensions improve...

  98. Tom Elliott (iNZight Analytics Ltd)
    08/07/2026, 10:50
    Talks (15-20 minutes)

    Web front-ends offer seamless deployment across desktop and mobile platforms, with an ever growing variety of libraries enabling rich, interactive user experiences. However, integrating these modern web technologies with R's powerful analytic capabilities presents significant challenges.

    Rserve is an R library that enables two-way communication between R and other programming...

  99. Gary Sutton
    08/07/2026, 10:50
    Talks (15-20 minutes)

    Many data scientists and other R users have no doubt been assigned to projects that quickly derailed—missed deadlines, scope creep, budget overruns—often because effective project management was assumed to be a "soft skill" anyone without deep technical expertise could handle. In reality, successful project delivery demands rigorous quantitative techniques: structured decomposition,...

  100. Renata Hirota
    08/07/2026, 10:50
    Talks (15-20 minutes)

    Climate data has coordinates, but also powerful stories hiding in plain sight. In this talk, I’ll show the reasons why I fell in love with the R package {sf} while covering climate change in the Amazon as a freelance journalist. Every Last Drop is a project about oil exploration in the Amazon that showcases that spatial analysis doesn't...

  101. Edzer Pebesma (University of Münster)
    08/07/2026, 11:10
    Talks (15-20 minutes)

    Areal units (polygons or pixels) are often used to summarize and distribute spatial data. In many spatial data science studies, datasets with different areal units need to be combined, and to do so they first have to be transformed into a common set of areal units. This involves upscaling (going to a coarser resolution) or downscaling (going to a finer spatial resolution), or a combination. This...

  102. Alyssa Columbus
    08/07/2026, 11:10
    Talks (15-20 minutes)

    Empirical data analysis often involves a large number of defensible analytic choices, including model specification, covariate selection, transformations, and approaches to missing data. These decisions can meaningfully influence statistical results, yet they are rarely explored or reported systematically. This talk presents a reproducible R-based workflow for examining analytic multiplicity...

  103. Deepansh Khurana (Dimwit Labs)
    08/07/2026, 11:10
    Talks (15-20 minutes)

    In 2024, I gave a talk titled "How I Built An API for My Life (and How You Can Too)" about a personal life tracker called Hrafnagud, built using a slew of services but primarily with R, namely {plumber} and {shiny}. This was an API with a number of different endpoints for finance, travel, and more. The key focus was building the first iteration of such a system, and as the system...

  104. Eileen Vattheuer (Statistics Austria)
    08/07/2026, 11:10
    Talks (15-20 minutes)

    Handling missing values in heterogeneous datasets remains a common challenge in applied data analysis. We introduce vimpute(), a new function in the R package VIM that provides a unified and model-adaptive framework for multivariate imputation. The function implements an iterative, variable-wise imputation procedure that adapts the modelling strategy to the type and characteristics of each...

  105. Dr Filiz Karadag (Ege University), Dr Olgun Aydin (Gdansk University of Technology)
    08/07/2026, 11:30
    Talks (15-20 minutes)

    The subject of this presentation concerns the implementation of regularized regression and advanced parameter estimation procedures. The study demonstrates the practical application of these methods using three R packages developed by the authors: S-type.est, Styperidge.reg, and ridgregextra.

    Within this ecosystem, the S-type.est package provides the core estimation procedures for S-type...

  106. Charlie Gao (Posit Software, PBC)
    08/07/2026, 11:30
    Talks (15-20 minutes)

    Collaborative data analysis presents a fundamental concurrency problem: when multiple users modify the same data simultaneously, how should conflicts be resolved? Traditional approaches rely on locking or central arbitration, but conflict-free replicated data types (CRDTs) offer a principled alternative. A CRDT is a data structure whose concurrent operations are guaranteed to converge to the...

  107. Roger Bivand (Norwegian School of Economics)
    08/07/2026, 11:30
    Talks (15-20 minutes)

    When we represent or analyse spatial data, the position of observations matters. Where objects of interest are, also in relation to each other, and how position is measured, are referenced through coordinate reference systems (CRS), including units of measurement. Local CRS use arbitrary starting points, while standard CRS rely on tabulated values, for example the axis lengths of an ellipsoid...

  108. Deepayan Sarkar (Indian Statistical Institute, Delhi Centre)
    08/07/2026, 11:30
    Talks (15-20 minutes)

    The National Health and Nutrition Examination Survey (NHANES) provides extensive public data on demographics, health, and nutrition, collected in two-year cycles since 1999. Although invaluable for epidemiological and health-related research, the complexity of NHANES data makes accessing, managing, and analyzing these datasets challenging. We present a reproducible computational environment...

  109. Florian Sihler (Ulm University)
    08/07/2026, 11:50
    Lightning Talk (5 minutes)

    R provides a plethora of packages and features that support the dynamic and interactive exploration of data.
    Yet, there is a lack of static analysis tools which support program comprehension, reproducibility, and software engineering practices.
    With flowR we provide not just a static analysis framework, but also an easily accessible extension for common IDEs such as [Visual Studio...

  110. Maciej Banas
    08/07/2026, 11:50
    Lightning Talk (5 minutes)

    The presentation will introduce jsplyr, a new and experimental R package providing a dplyr interface for lazy-evaluated data manipulation in JavaScript. The package is designed to offload heavy data calculations from the server to the web browser, making it particularly useful in shiny applications.

    The talk will begin with the rationale behind creating jsplyr and its current...

  111. John Kloke (Northern Michigan University)
    08/07/2026, 11:50
    Lightning Talk (5 minutes)

    We discuss updates to Rfit - an R package for computation of rank-based procedures for general linear models (GLMs) including estimation of parameters, inference, and residual analysis. These estimates and the associated inference are robust to outliers in response space and highly efficient relative to traditional least squares methods.

    In this talk, we highlight some recent, ongoing, and...

  112. Adam Struzik (Adam Mickiewicz University in Poznań)
    08/07/2026, 11:55
    Lightning Talk (5 minutes)

    In this paper, we present the automatedRecLin R package (available on CRAN), designed to perform record linkage based on an entropy-maximizing classifier. First, we briefly introduce the maximum entropy classification algorithm for record linkage, originally proposed by Lee et al. (2022, Surv. Methodol.), and describe an extension that allows for the use of continuous comparison functions....

  113. Alfredo Hernandez Sanchez (Vilnius University)
    08/07/2026, 11:55
    Lightning Talk (5 minutes)

    This talk presents a practical workflow for taking a Shiny app from local development to a public production deployment using Google Cloud Run. Drawing on the deployment of a real dashboard, I show how to package a Shiny app in a container, deploy it as a managed web service, connect it to a custom domain, and update it through a lightweight GitHub-based workflow.

    Rather than treating...

  114. Alex K. Hagen (Potsdam Institute for Climate Impact Research (PIK))
    08/07/2026, 11:55
    Lightning Talk (5 minutes)

    Scenario analysis lets us explore ‘what-if’ futures for climate change, linking today’s decisions to long-term impacts. Systematic comparison of scenarios for transport – a key sector for emission reductions – can inform decision-making in policy and investments. Decarbonization of passenger and freight transport depends on mode shifts and the adoption of low-carbon technologies like electric...

  115. Thierry Onkelinx (Research Institute for Nature and Forest (INBO))
    08/07/2026, 11:55
    Lightning Talk (5 minutes)

    The checklist R package has undergone significant evolution, with a major focus on flexible organisation management and improved quality control workflows.
    The most substantial change is the introduction of the org_list and org_item classes (v0.5.0), which supersede the previous organisation class.
    This redesign enables research groups to define and enforce their own institutional...

  116. Florian Sihler (Ulm University)
    08/07/2026, 13:00
    Talks (15-20 minutes)

    With over 23,000 packages on CRAN alone, the R ecosystem provides a plethora of general-purpose and domain-specific packages. While there is a strict quality standard for packages to be accepted on CRAN, there is little information available about the evolution and current state of the ecosystem.
    With the crawlR project, we statically analyzed all versions of all packages available on CRAN...

  117. Benjamin Schwendinger (Fraunhofer Austria Research GmbH)
    08/07/2026, 13:00
    Talks (15-20 minutes)

    R’s copy-on-modify semantics make repeated growth and row-wise operations appear inherently expensive. Appending rows or incrementally building tabular structures typically triggers reallocation and copying. Recent versions of R now introduce an experimental resizable vector API, allowing vectors to be allocated with a maximum capacity and resized up to that capacity without...

  118. Ivan Krylov (Lomonosov Moscow State University)
    08/07/2026, 13:00
    Talks (15-20 minutes)

    Generally, CRAN packages are expected to keep passing their tests, and their updates should not break other packages. Debugging the occasional failure of this process can lead the developer down a very deep rabbit hole: our dependency stacks are very deep, and none of the layers are completely bug-free.

    We're going to see how problems from real CRAN packages could be investigated and solved...

  119. David Schruth (UW)
    08/07/2026, 13:00
    Lightning Talk (5 minutes)

    The visualization of data embodies a visceral fundament for understanding, and constitutes a cardinal mechanism for modern communication of any measurement outcomes of research—despite typically being neglected (largely for technical reasons) in traditional statistics literature. There are limitless ways of realizing plots of data, and visualizing even the most basic distribution of points...

  120. Prakhar Srivastava (Indian Institute of Information Technology Una)
    08/07/2026, 13:00

    This project offers a data-driven look at India’s socio-economic growth using the R programming language. The goal is to examine how key indicators like population, literacy rate, GDP growth, and other development measures have changed over time in various regions of India. By using publicly available datasets, the project employs statistical analysis and visualization techniques in R to turn...

  121. Gergely Daroczi
    08/07/2026, 13:20
    Talks (15-20 minutes)

    Selecting the right cloud instance type for model training or other compute-intensive R workloads is often guesswork:

    1. Unclear resource requirements, e.g. "How much RAM do I need to train this hierarchical model?" or "Can my script scale to multiple CPU cores or even GPUs?"
    2. Pricing and hardware specs exist, but are fragmented across vendors and hard to compare, especially when real...
    Go to contribution page
  122. Carolina Mengoni Goñalons (Health Information and Statistics Office within the Ministry of Health of the City of Buenos Aires)
    08/07/2026, 13:20
    Lightning Talk (5 minutes)

    Notes written by healthcare professionals within Electronic Health Records (EHR) are shaped by the specialty of the professional, the type of data to be depicted, the usability of the application, and the formats allowed by the system. Vital signs are mostly structured data that typically have their own dedicated entry section within an EHR. Yet, it is very common for healthcare professionals...

    Go to contribution page
  123. Dr Prince Sharma (Indian Institute of Information Technology Una), Rituraj Singh (Indian Institute of Information Technology Una)
    08/07/2026, 13:20

    Author Information
    Name: Rituraj Singh
    Affiliation: Indian Institute of Information Technology Una, Himachal Pradesh, India
    Programme: B.Tech Computer Science Engineering (Data Science)
    Email: 24519@iiitu.ac.in

    Title
    Do Some Stocks Always Beat Others? Exploring Global Stock Factors with R

    Primary Topic
    Finance and Economics Applications of R / Data Analysis and...

    Go to contribution page
  124. Abigail Stamm (Minnesota Department of Health), Eric Kvale (Minnesota Department of Health)
    08/07/2026, 13:20
    Talks (15-20 minutes)

    Online dashboards use data visualizations to quickly convey information. Interactive elements like charts and data filters often display additional information not accessible by screen readers. Accessibility features improve the overall presentation and user experience by augmenting, amplifying, and enhancing the content so that users can interact with and understand the same content in...

    Go to contribution page
  125. Mark Padgham (rOpenSci)
    08/07/2026, 13:20
    Talks (15-20 minutes)

    CRAN represents and curates a complex software ecosystem of over 25,000 packages. This ecosystem constantly evolves as packages are submitted, updated, and archived. We analysed the development of CRAN over its entire lifetime, both in terms of package inter-dependencies and the internal structures of every version of every package. These analyses used the...

    Go to contribution page
  126. Antoine Fabri (cynkra)
    08/07/2026, 13:40
    Talks (15-20 minutes)

    Debugging in R often relies on manually inspecting intermediate results, usually by adding print statements to track the evolution of variables. The {boomer} package simplifies this process by automatically displaying these results in a readable and flexible way, without even modifying the code of the analyzed functions.

    It gets its name from the fact that it “explodes” a call into its...

    Go to contribution page
  127. Laia Domenech-Burin (Sovereign Tech Agency)
    08/07/2026, 13:40
    Talks (15-20 minutes)

    The R language provides essential infrastructure for statistical computing, research, and data science worldwide. Yet, the labour that sustains this infrastructure remains largely invisible: maintaining legacy upstream code, triaging complex systems-level bugs, and hardening build pipelines. This work underpins the reliability and security of the ecosystem, but remains difficult to measure,...

    Go to contribution page
  128. Abigail Stamm (Minnesota Department of Health), Eric Kvale (Minnesota Department of Health)
    08/07/2026, 13:40
    Lightning Talk (5 minutes)

    Access to realistic public health data for training, pipeline validation, and methods development is constrained by privacy regulations that restrict use of real patient records. We present two complementary open-source R packages that address this problem at different points along the synthetic data design spectrum.

    toysurveydata (Stamm, MDH) generates simple, customizable fake survey...

    Go to contribution page
  129. Dr Prince Sharma (Indian Institute of Information Technology Una)
    08/07/2026, 13:40

    The rapid growth of sports analytics has enabled researchers to use data-driven techniques to understand performance patterns and predict outcomes in competitive sports. Cricket, particularly the Indian Premier League (IPL), generates a large amount of structured match data that can be analyzed to study team strategies, player performance, and match dynamics. This project focuses on analyzing...

    Go to contribution page
  130. Henrik Bengtsson (University of California San Francisco (UCSF))
    08/07/2026, 14:00
    Talks (15-20 minutes)

    Ever felt like parallelizing your R code requires a complete rewrite? Transitioning from sequential code to parallel execution has traditionally meant dealing with fragmented, obscure, package-specific APIs that distract us from our main goals. Which packages and functions should I use, and what platforms should I support? There are a lot of upfront decisions to make, with many...

    Go to contribution page
  131. Michael Lydeamore (Department of Econometrics and Business Statistics, Monash University, Victoria, Australia)
    08/07/2026, 14:00
    Talks (15-20 minutes)

    Code golf—writing the shortest possible code to solve a problem—has emerged as an engaging method for teaching programming fundamentals. Its competitive, game-like structure fosters student motivation and encourages self-directed learning.

    Inspired by the success of CSSBattle, which attracts thousands of daily users with CSS challenges, I present ggplot battles: a new browser-based platform...

    Go to contribution page
  132. Dr Prince Sharma (Indian Institute of Information Technology Una)
    08/07/2026, 14:00

    This project presents the development of a movie recommendation system implemented using the R programming language. The primary objective of the project is to design a data-driven model capable of suggesting relevant movies to users based on patterns identified in historical rating data. R was chosen for this project due to its strong capabilities in statistical computing, data analysis, and...

    Go to contribution page
  133. Abdul Aziz Nurussadad (Badan Informasi Geospasial)
    08/07/2026, 14:00
    Lightning Talk (5 minutes)

    Scaling public satisfaction surveys presents a significant challenge for government hubs managing hundreds of distinct services. Indonesian national regulation (Minister of Administrative and Bureaucratic Reform Regulation 14/2017) mandates measuring nine specific quality components, including staff behavior, requirements, and facilities, for every service provided. However, requiring citizens...

    Go to contribution page
  134. Maciej Romaniuk (Systems Research Institute, PAS)
    08/07/2026, 14:20
    Lightning Talk (5 minutes)

    Imputation methods are widely used to replace missing values in datasets, thereby improving the overall quality of samples and enabling further statistical procedures. Various measures and tools have been proposed to compare the effectiveness and results of imputation algorithms. However, they apply only to “crisp” (i.e., real-valued) datasets. Meanwhile, fuzzy sets are widely used to model...

    Go to contribution page
  135. Shuai Wu (MSD)
    08/07/2026, 14:20
    Lightning Talk (5 minutes)

    The development of R packages for Bayesian analysis is often slowed by the computationally intensive nature of MCMC sampling, which turns iterative testing into a major bottleneck. A recurring challenge in this domain is the trade-off between saving large fitted model objects to disk and regenerating them on each test run, a question that comes up repeatedly during local development, continuous...

    Go to contribution page
  136. Magnus Mengelbier (Independent / Freelancer)
    08/07/2026, 14:25
    Lightning Talk (5 minutes)

    The R language can be found on laptops, servers, clusters and high-performance compute environments, as well as embedded within databases, services, agents and business domain solutions. As the number of analyses, the compute intensity of analysis methods and the size of data steadily increase, using R for analysis at scale is becoming a math problem that seems to have no one simple...

    Go to contribution page
  137. Tina Rashid Jafari (Monash University, Australia)
    08/07/2026, 14:25
    Lightning Talk (5 minutes)
    • Author: Tina Rashid Jafari, Department of Econometrics and Business Statistics, Monash University, Australia, Email: tina.rashidjafari@monash.edu
    • Title: spinebil: Practical Diagnostics for Index Reliability in Exploratory Data Analysis
    • Primary Topic: Statistical Graphics / Exploratory Data Analysis
    • Keywords: Exploratory Data Analysis; Data Visualization;...

    Go to contribution page
  138. Peter Dalgaard
    08/07/2026, 15:00

    Abstract:
    R grew out of the PC, Internet, and Open Source revolution in the 1990s. I will give an account of the early history of R, and then outline the development principles of R Core, with special focus on release management. However, the R Core Team is not alone in the governance of the R project; other major actors are the CRAN team, the R Foundation and the R Consortium. I discuss...

    Go to contribution page
  139. Victor Yu (Hertfordshire County Council, UK)
    08/07/2026, 16:00
    Poster

    This package allows the user to perform interrupted time series (ITS) analysis with a control across successive interventions (up to 3). This code is based on a prior analysis done in our county, where we compared the effect of two successive behavioural interventions designed to improve the uptake of a COVID-19 booster intervention programme amongst immunosuppressed patients at several primary care...
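
    The core of a basic ITS analysis is a segmented regression. The sketch below is illustrative only, written in Python and not taken from the package: it assumes a single series with one intervention at a hypothetical time index t0 and no control series, and fits y_t = b0 + b1*t + b2*D_t + b3*(t - t0)*D_t, where D_t = 1 from t0 onward, so b2 is the level change and b3 the slope change.

```python
import numpy as np

def its_design(n_periods, t0):
    """Design matrix for a single-interruption segmented regression.

    Columns: intercept, time, post-intervention indicator (level change),
    and time since the intervention (slope change). t0 is a hypothetical
    intervention index used for illustration.
    """
    t = np.arange(n_periods, dtype=float)
    post = (t >= t0).astype(float)
    time_since = (t - t0) * post
    return np.column_stack([np.ones(n_periods), t, post, time_since])

def fit_its(y, t0):
    """Ordinary least squares fit.

    Returns [baseline level, pre-intervention slope, level change, slope change].
    """
    X = its_design(len(y), t0)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Noise-free synthetic series: level 10 and slope 0.5 before t0 = 12,
# then a drop of 3 and an additional slope of 0.2 afterwards.
t0 = 12
y = its_design(24, t0) @ np.array([10.0, 0.5, -3.0, 0.2])
coef = fit_its(y, t0)
```

    Adding a control series typically means adding a group indicator and its interactions with each of these terms; that extension, and the successive interventions the package supports, are not shown here.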

    Go to contribution page
  140. Dr Filip Křikava (Czech Technical University in Prague)
    08/07/2026, 16:00
    Poster

    Dynamic programming languages are increasingly adopting explicit type annotations. Not only do they serve as documentation, but they also enable static type checking to eliminate entire classes of bugs and help tools provide a better development experience. In this talk, we will present our advancements in bringing types to R, including a type system with a static type checker with type...

    Go to contribution page
  141. Anurag Yadav
    08/07/2026, 16:00
    Poster

    Artificial satellites play a critical role in modern communication, navigation, weather monitoring, and scientific research. Over the past decades, the number of satellites orbiting Earth has increased rapidly due to advancements in space technology and the growth of commercial satellite constellations. Understanding the distribution of satellites and their potential impacts is important for...

    Go to contribution page
  142. Patryk Kołbyko (Szkoła Doktorska Nauk Społecznych UMCS. Uniwersytet Marii Curie-Skłodowskiej w Lublinie)
    08/07/2026, 16:00
    Poster

    This study presents an end-to-end R-based workflow for estimating Poland’s natural rate of interest within a Bayesian vector error-correction setting. The empirical objective is to recover an equilibrium real interest rate and the associated monetary policy stance gap, whereas the methodological contribution lies in demonstrating how advanced macroeconometric inference can be structured,...

    Go to contribution page
  143. Ozancan Ozdemir (University of Groningen)
    08/07/2026, 16:00
    Poster

    The increasing complexity of financial markets demands analytical tools that combine real-time data access, rigorous statistical modelling, and intuitive visual communication within a single, reproducible framework. This study presents FinDash Pro, a production-grade interactive dashboard developed entirely in R using the Shiny ecosystem, designed to bridge the gap between...

    Go to contribution page
  144. Serra İlayda Yerlitaş Taştan (Department of Biostatistics, Erciyes University, Faculty of Medicine, 38030, Kayseri, Türkiye)
    08/07/2026, 16:00
    Poster

    Accurate diagnosis often requires the integration of multiple biomarkers rather than relying on a single test. However, existing tools for combining diagnostic tests are limited in methodological diversity and usability, especially for clinicians without programming expertise. To address this gap, we present dtComb-Shiny, a user-friendly web-based interface built on the dtComb R package. The...

    Go to contribution page
  145. Ms Daphne Grasselly (Roche), Magdalena Krochmal (Roche)
    08/07/2026, 16:00
    Poster

    Medical Data Review (MDR) in clinical trials requires study teams to examine patient-level data across dozens of CRF domains — adverse events, labs, vitals, ECGs, and more. Traditionally, this relies mainly on static listings generated per study, requiring extensive setup and line-by-line inspection. We present an R framework, built on teal, that replaces this workflow with interactive,...

    Go to contribution page
  146. Karolina Widzisz (Department of Computer Graphics, Vision and Digital Systems, Silesian University of Technology, Gliwice, Poland)
    08/07/2026, 16:00
    Poster

    We present a synthetic data generator for simulation studies in clustering and partition comparison. The generator creates datasets with controlled cluster structures and predefined similarity levels between alternative partitions, enabling systematic analysis of clustering algorithms' stability.

    The framework uses a Gaussian mixture distribution and generates data through a three-stage...

    Go to contribution page
  147. Laure Cougnaud (Open Analytics NV)
    08/07/2026, 16:00
    Poster

    The use of R packages in a regulated environment, such as a pharmaceutical company, may require formal validation of those packages.

    The Validation Hub introduces best practices and insights from the pharmaceutical industry for validating R packages for use within the biopharmaceutical regulatory setting.

    We will contribute to this effort by presenting a git-based workflow to...

    Go to contribution page
  148. Marcin Dubel (Appsilon)
    08/07/2026, 16:00
    Poster

    Building exploratory analysis dashboards for clinical trials requires considerable expertise, extensive time, and deep familiarity with specialized frameworks. In this talk, we share our GenAI solution to significantly streamline this process. We will present a tool, powered by Claude Code, that enables biostatisticians and clinical researchers to effortlessly create and immediately preview...

    Go to contribution page
  149. Adam Forys (Roche), Magdalena Krochmal (Roche)
    08/07/2026, 16:00
    Poster

    AI code assistants such as Claude Code, opencode, and Aider can read, write, and run code. However, they work separately from the user's R session. They cannot look at live objects, call R functions, or update a running Shiny application. We present a way to connect these assistants directly to R and Shiny using the Model Context Protocol (MCP).

    The main idea is to use CLI-based AI agents...

    Go to contribution page
  150. Winkle Lu
    08/07/2026, 16:00
    Poster

    Clinical trial data analysis typically focuses on specific analysis datasets, but the complete journey of individual patients — from screening, enrollment, and first dose, through visit records and adverse events, to last dose and survival status — represents critical time-based data points that reviewers prioritize. This fragmentation of information forces reviewers to switch between multiple...

    Go to contribution page
  151. Angelika Meraner (Statistics Austria)
    08/07/2026, 16:00
    Poster

    persephone3 is the updated R framework developed at Statistics Austria to enable efficient processing of large sets of time series in the production of seasonally adjusted estimates. It modernizes the original persephone package by moving from the RJDemetra backend to the new rjd3 ecosystem, ensuring long-term maintainability and compatibility with current JDemetra+...

    Go to contribution page
  152. Dr Jan Simson (LMU Munich)
    08/07/2026, 16:00
    Poster

    We present peRsian, an R package containing color palettes based on handcrafted Persian carpets for use in data visualization. peRsian is a tribute to centuries of Persian carpet-making, a craft practised for over two thousand years. It’s dedicated to the incredible artisans who’ve kept this tradition alive, especially the women who spent countless hours knotting and weaving every...

    Go to contribution page
  153. Khanh Do (Deakin University), Vedanti Padhye (Monash University)
    08/07/2026, 16:00
    Poster

    While R is famous as a statistics tool, it also has the potential to serve as a practical tool for day-to-day corporate operations. Unlocking this practical value often relies on the ability to deliver complex mathematical backends through intuitive interfaces. In this presentation, we will explore the core principles of building enterprise-ready applications by showcasing a...

    Go to contribution page
  154. Yuki Yanai (Kochi University of Technology)
    08/07/2026, 16:00
    Poster

    We have developed rgamer, an R package for learning and applying game theory. The goal of rgamer is to support both teaching and learning by enabling students to explore game-theoretic concepts and instructors to demonstrate them effectively in R. The package not only solves standard models such as two-person normal-form games, but also provides visualizations that highlight key structural...
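
    A two-person normal-form game can be solved for pure-strategy Nash equilibria by checking best responses cell by cell. The Python sketch below illustrates only that textbook check; it is not rgamer's interface, the function name is hypothetical, and mixed-strategy equilibria are not covered.

```python
from itertools import product

def pure_nash_equilibria(payoff_row, payoff_col):
    """Return all (i, j) strategy pairs where neither player gains by a
    unilateral deviation.

    payoff_row[i][j] and payoff_col[i][j] are the row and column player's
    payoffs when the row player plays i and the column player plays j.
    """
    n_rows, n_cols = len(payoff_row), len(payoff_row[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        # Row player: no other row does better against column j.
        row_best = all(payoff_row[i][j] >= payoff_row[k][j] for k in range(n_rows))
        # Column player: no other column does better against row i.
        col_best = all(payoff_col[i][j] >= payoff_col[i][k] for k in range(n_cols))
        if row_best and col_best:
            equilibria.append((i, j))
    return equilibria

# Prisoner's dilemma: strategy 0 = cooperate, 1 = defect.
pd_row = [[-1, -3], [0, -2]]
pd_col = [[-1, 0], [-3, -2]]
equilibria = pure_nash_equilibria(pd_row, pd_col)
```

    For the prisoner's dilemma encoded here, the only pure equilibrium is mutual defection, (1, 1).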

    Go to contribution page
  155. Joanna Zyla (Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland)
    08/07/2026, 16:00
    Poster

    Gaussian Mixture Modeling (GMM) is an unsupervised technique used in many fields of data analysis, such as bioinformatics, pattern recognition, and network traffic analysis. Yet, existing R implementations often lack support for binned data (commonly observed in image analysis) and suffer from initialization instability or massive memory usage. To address these limitations, the novel R...

    Go to contribution page
  156. Daisuke Ichikawa (Kibaroku), Koji Makiyama (HOXO-M Inc.), Shinichi Takayanagi, Kazuyuki Sano
    08/07/2026, 16:00
    Poster

    TheseusPlot is an R package for explaining why a rate metric (e.g., conversion rate, retention rate, or on-time rate) differs between two groups, such as time periods, cohorts, or A/B variants. The package decomposes an overall difference into contributions from individual subgroups using a procedure inspired by the Ship of Theseus: starting from Group A, it replaces subgroup data with the...
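
    Because the abstract is truncated, the following is only a sketch of the general sequential-replacement idea as we read it, not TheseusPlot's interface or exact procedure. It assumes the rate is pooled successes divided by pooled trials, that subgroups are swapped from Group A to Group B in a fixed, caller-chosen order, and that the function names are hypothetical. The step-by-step changes telescope, so they always sum to the overall difference, although individual contributions can depend on the order.

```python
def overall_rate(groups):
    """Pooled rate: total successes divided by total trials across subgroups."""
    successes = sum(s for s, _ in groups.values())
    trials = sum(n for _, n in groups.values())
    return successes / trials

def theseus_decomposition(group_a, group_b, order):
    """Swap subgroups from A to B one at a time (in the given order) and
    attribute each step's change in the pooled rate to that subgroup.

    group_a / group_b map subgroup name -> (successes, trials).
    """
    current = dict(group_a)
    contributions = {}
    for name in order:
        before = overall_rate(current)
        current[name] = group_b[name]  # replace one "plank" of the ship
        contributions[name] = overall_rate(current) - before
    return contributions

group_a = {"mobile": (30, 100), "desktop": (50, 100)}
group_b = {"mobile": (45, 150), "desktop": (60, 100)}
contrib = theseus_decomposition(group_a, group_b, order=["mobile", "desktop"])
```

    In this toy example mobile has the same rate in both groups (0.3), yet swapping it lowers the pooled rate by 0.02 because its share of trials grows; desktop then adds 0.04, and the two contributions sum to the total difference of 0.02.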

    Go to contribution page
  157. Leila Kianmehr
    08/07/2026, 16:00
    Poster

    Autosomal Dominant Polycystic Kidney Disease (ADPKD), the most common hereditary kidney disease, exhibits marked clinical heterogeneity driven by complex molecular mechanisms. While single-omics studies identify isolated pathways, defining the coordinated mechanistic framework of disease remains a challenge. In this study, we present an end-to-end R-based workflow to integrate high-throughput...

    Go to contribution page
  158. Dr Wang Pok Lo (University of Oxford)
    08/07/2026, 16:00
    Poster

    Simulation studies allow the performance of statistical methods to be compared. Tables are traditionally used to report study results, which are usually performance measures such as bias, empirical standard error, average model standard error and coverage. In large simulation studies, these tables of results may become too large for patterns to be readily identified. This occurs...
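
    The four performance measures named above can be computed from a set of repetitions for one method. This Python sketch is not the authors' tool; it assumes Wald-type confidence intervals (estimate ± z·SE) for coverage, and takes the average model SE as the root mean of the squared model SEs, which is one common convention (a plain mean of the SEs is another).

```python
import math
from statistics import mean, stdev

def performance_measures(estimates, std_errors, true_value, z=1.96):
    """Summarise simulation repetitions for a single method.

    estimates  : point estimates, one per repetition
    std_errors : model-based standard errors, one per repetition
    true_value : the data-generating parameter value
    z          : normal quantile for Wald intervals (1.96 gives roughly 95%)
    """
    bias = mean(estimates) - true_value
    # Empirical SE: spread of the estimates across repetitions.
    emp_se = stdev(estimates)
    # Average model SE: root mean squared model SE (one common convention).
    mod_se = math.sqrt(mean(se ** 2 for se in std_errors))
    # Coverage: share of Wald intervals that contain the true value.
    covered = [abs(est - true_value) <= z * se
               for est, se in zip(estimates, std_errors)]
    coverage = sum(covered) / len(covered)
    return {"bias": bias, "emp_se": emp_se, "mod_se": mod_se, "coverage": coverage}

result = performance_measures([0.9, 1.1, 1.3, 0.7], [0.1, 0.1, 0.2, 0.1], true_value=1.0)
```

    With these four made-up repetitions the bias is zero, but the last interval (0.7 ± 0.196) misses the true value, so coverage is 0.75.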

    Go to contribution page
  159. Kari L. Jordan
    09/07/2026, 09:00

    Abstract
    Every R script, package, and analysis rests on a foundation of community labor that often goes unseen. While R is widely celebrated for its technical power, its longevity depends just as much on the people who teach, maintain, translate, mentor, and organize around it.
    In this keynote, I will draw on my leadership at The Carpentries to share stories and insights about building and...

    Go to contribution page
  160. Maciej Nasiński (UCB & University of Warsaw)
    09/07/2026, 10:30
    Talks (15-20 minutes)

    By 2026, the era of vibe coding has made rapid prototyping effortless; however, it has also highlighted a significant gap between demos and production-quality systems. While AI agents can simulate agile processes, they often optimise for speed at the expense of architectural integrity, security, and long-term technical debt. Anyone can generate code, but not everyone can manage a project. The...

    Go to contribution page
  161. Mikael Jagan
    09/07/2026, 10:30
    Talks (15-20 minutes)

    We present the history, structure, and design philosophy of "recommended" R package Matrix, which extends R with classes and methods for structured matrices having sparse or dense storage. We review recent and ongoing development in Matrix, covering matrices with integer or complex data, matrix factorizations, and improved documentation. Finally, as the number of reverse dependencies of...

    Go to contribution page
  162. Christian Martinez (CUNY)
    09/07/2026, 10:30
    Talks (15-20 minutes)

    Reproducibility in R is often taught as a final requirement rather than as a workflow that evolves over time. In many research methods courses, students write one-off scripts against artificial datasets, submit them, and never return to their code. This talk presents an alternative: an end-to-end, R-native ecosystem designed to move students from code users to reproducible researchers—and...

    Go to contribution page
  163. Mx Katrina Brock (Max Planck Institute of Animal Behavior)
    09/07/2026, 10:30
    Talks (15-20 minutes)

    The behavioral ecology literature offers a rich library of approaches for finding patterns in animal movement data. While many of the biologists developing these algorithms publish their code, it is rarely optimized for reuse. Even well-designed packages with similar workflows have different interfaces that potential users need to learn one by one. By wrapping these algorithms in a standardized...

    Go to contribution page
  164. Ella Kaye (University of Birmingham)
    09/07/2026, 10:50
    Talks (15-20 minutes)

    The throw sequence "423" is a valid pattern for juggling with three balls, but "432" will result in collisions and dropped balls. How can you tell? All juggling patterns can be described in a notation called siteswap, and siteswap sequences can be mathematically validated and visualised.

    In this talk, I introduce jugglr, an R package for working with siteswap sequences. It validates...
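
    The validity check can be made concrete for vanilla (asynchronous, one throw per beat) siteswaps: a sequence of n throw heights t_i is jugglable exactly when the landing beats (i + t_i) mod n are all different, and the number of balls is then the average throw height. For "423" the landing beats are 1, 0 and 2 (valid, 3 balls); for "432" all three throws land on beat 1, which is the collision. The Python sketch below illustrates this standard test only; it is not jugglr's API, and the function names are hypothetical.

```python
def parse_siteswap(pattern):
    """Convert vanilla siteswap notation to throw heights.

    Digits 0-9 and letters a-z (heights 10-35) are read in base 36.
    """
    return [int(ch, 36) for ch in pattern]

def is_valid_siteswap(pattern):
    """A vanilla siteswap is jugglable iff every throw lands on a distinct
    beat, i.e. the values (i + t_i) mod n form a permutation of 0..n-1."""
    throws = parse_siteswap(pattern)
    n = len(throws)
    landings = {(i + t) % n for i, t in enumerate(throws)}
    return n > 0 and len(landings) == n

def ball_count(pattern):
    """Number of balls = average throw height (an integer for valid patterns)."""
    throws = parse_siteswap(pattern)
    return sum(throws) // len(throws)

checks = {p: is_valid_siteswap(p) for p in ["423", "432", "531", "3"]}
```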

    Go to contribution page
  165. David Granjon (cynkra GmbH)
    09/07/2026, 10:50
    Talks (15-20 minutes)

    We are delighted to introduce 'blockr', a visual, block-based interface for building, customising, and sharing interactive R data workflows, without any coding experience.

    'blockr' enables users to snap together modular blocks for data loading, transformation, visualisation, and export, forming directed acyclic graph (DAG) pipelines with instant visual feedback. It carefully integrates AI...

    Go to contribution page
  166. Aymeric Stamm (Department of Mathematics Jean Leray, UMR CNRS 6629, Nantes University)
    09/07/2026, 10:50
    Talks (15-20 minutes)

    Topological data analysis (TDA) is an emerging area of statistical research grounded in topology, intersecting with exploratory analysis, statistical inference, and machine learning. It is therefore important for R users to have access to comprehensive and reliable TDA tools.

    Published R packages for TDA fall into three categories: First, {TDA} and {rgudhi} interface with comprehensive...

    Go to contribution page
  167. Janith Wanniarachchi (Monash University)
    09/07/2026, 10:50
    Talks (15-20 minutes)

    Understanding the behaviour of complex machine learning models has become a challenge. Explainable AI (XAI) methods were introduced to provide insights into model predictions; however, explaining these explanations can be difficult without proper visualisation methods. In addition, settling the disagreements between these explainers can be difficult based purely on numerical...

    Go to contribution page
  168. Balasubramanian Narasimhan (Stanford University)
    09/07/2026, 11:10
    Talks (15-20 minutes)

    CVXR is the R implementation of CVXPY, a widely used disciplined convex optimization framework. Maintained by two developers, the S4-based CVXR 1.0 had fallen significantly behind CVXPY in features. We report on a complete rewrite using S7, now on CRAN, that targets the current version of CVXPY. The new version is 4-5x faster than old CVXR and the...

    Go to contribution page
  169. Gero Szepannek (Stralsund University of Applied Sciences)
    09/07/2026, 11:10
    Talks (15-20 minutes)

    The package clustMixType [3] is one of the most popular packages for clustering mixed-type data. Nonetheless, an open issue, not only for clustering mixed-type data but for clustering in general, is the appropriate weighting of the variables. In Huang’s original paper [1], as well as in the clustMixType package, only heuristics are given for this purpose. In the presentation it will be...

    Go to contribution page
  170. Jakub Grzywaczewski (Warsaw University of Technology), Dr Nuno Sepúlveda (Warsaw University of Technology)
    09/07/2026, 11:10
    Talks (15-20 minutes)

    Pre-processing and quality control of high-dimensional serological data from Multiplex Bead Assay machines pose a significant bottleneck to the responsible application of machine learning to global health challenges. Driven by the data demands of the PvSTATEM project, an international initiative aimed at malaria elimination, we developed SerolyzeR, an open-source R package designed to...

    Go to contribution page
  171. Efstathios Gennatas (UCSF)
    09/07/2026, 11:30
    Talks (15-20 minutes)

    Basic research and clinical medicine are increasingly capitalizing on data-driven approaches to derive insights into disease pathophysiology and discover new therapeutic targets. While advanced algorithms are readily available, their application requires a combination of domain, quantitative, and technical expertise, leaving them out of reach for many domain expert researchers and clinicians....

    Go to contribution page
  172. Thomas Petzoldt (TUD - Dresden University of Technology)
    09/07/2026, 11:30
    Talks (15-20 minutes)

    Quantitative modeling is essential in the life and environmental sciences, yet students often face significant barriers due to "math anxiety" and programming complexity. While differential equations are often perceived as dry or difficult, interactive simulations offer a playful entry point that fosters intuitive understanding. However, traditional "downloadable models" often suffer from...

    Go to contribution page
  173. Mitchell O'Hara-Wild (Monash University)
    09/07/2026, 11:30
    Talks (15-20 minutes)

    Statistical analysis on temporal, spatial, graph, and probabilistic data is error-prone when the data types lack intrinsic structure. Outputs from models typically return these composite data types separately, requiring the user to assemble and apply the results correctly. This reduces the accessibility of statistics and results in error-prone analysis. Representing these data types using...

    Go to contribution page
  174. Aymeric Stamm (Department of Mathematics Jean Leray, UMR CNRS 6629, Nantes University)
    09/07/2026, 11:30
    Talks (15-20 minutes)

    Diffusion magnetic resonance imaging (MRI) is a non-invasive imaging technique that allows us to probe the microstructure of the brain at a mesoscopic scale by making the MR signal sensitive to the diffusion of water molecules in the brain, which is restricted or hindered by cellular structures such as axons or glial cells. Diffusion MRI suffers from poor spatial resolution, which yields the...

    Go to contribution page
  175. Seun Olufemi (Bioinformatics Outreach Nigeria)
    09/07/2026, 11:50
    Lightning Talk (5 minutes)

    Computational literacy remains a critical gap among life scientists in sub-Saharan Africa, limiting their contribution to global science and competitiveness in data-driven research careers. Bioinformatics Outreach Nigeria (BON) organised a 3-day intensive R training for life scientists, covering base R, data types and structures, data cleaning, tidyverse-based manipulation, and ggplot2...

    Go to contribution page
  176. Daniele Girolimetto (Department of Statistical Sciences, University of Padova)
    09/07/2026, 11:50
    Lightning Talk (5 minutes)

    Forecast reconciliation has become key to improving the accuracy and coherence of forecasts for linearly constrained multiple time series, such as hierarchical and grouped series. Yet, comprehensive software that jointly covers cross-sectional, temporal, and cross-temporal reconciliation has so far been lacking. The R packages FoReco and FoRecoML address this gap by offering a comprehensive...
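
    Cross-sectional reconciliation can be illustrated with its simplest variant, OLS reconciliation, which projects incoherent base forecasts ŷ onto the space of coherent forecasts via ỹ = S(SᵀS)⁻¹Sᵀŷ, where S is the summing matrix of the hierarchy. The Python sketch below uses a hypothetical toy hierarchy (Total = A + B) with made-up forecasts; it is not the API of FoReco or FoRecoML, and it does not cover the temporal and cross-temporal cases those packages address.

```python
import numpy as np

# Summing matrix for a two-level hierarchy: Total = A + B.
# Rows: [Total, A, B]; columns: bottom-level series [A, B].
S = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

def ols_reconcile(base_forecasts, S):
    """Project base forecasts onto the coherent subspace spanned by S:
    y_tilde = S (S'S)^{-1} S' y_hat (OLS reconciliation)."""
    projection = S @ np.linalg.solve(S.T @ S, S.T)
    return projection @ base_forecasts

y_hat = np.array([100.0, 55.0, 40.0])  # incoherent: 55 + 40 != 100
y_tilde = ols_reconcile(y_hat, S)
```

    Here the base forecasts disagree by 5 (55 + 40 versus 100); OLS reconciliation spreads that gap equally across the three series, giving roughly 98.33, 56.67 and 41.67, which now add up.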

    Go to contribution page
  177. Tomasz Kalinowski (Posit PBC)
    09/07/2026, 11:50
    Lightning Talk (5 minutes)

    R is an interactive environment. Working effectively with R means being able to interact with a live session to inspect objects, view and iterate on plots, access help, and step through running code in the debugger. Making LLMs effective in R therefore means more than giving them a way to execute code: it means exposing R’s interactive affordances in a form the model can use.

    This talk...

    Go to contribution page
  178. Dr Håvard R. Karlsen (NTNU)
    09/07/2026, 11:50
    Lightning Talk (5 minutes)

    Traditional presentation software like PowerPoint or Keynote is commonly used for teaching, but it is not ideal for displaying and running code, as it involves a lot of copying and pasting. Moving to presenting Quarto documents makes it much easier to incorporate code and output in the presentation. But it can be daunting as it involves learning a new framework for creating and presenting slides....

    Go to contribution page
  179. Martin Binder (Department of Statistics, LMU Munich)
    09/07/2026, 11:55
    Lightning Talk (5 minutes)

    To explore the behaviour of expensive black-box functions, such as machine learning model evaluations or physical simulations, it is often useful to fit a surrogate regression model to a sequence of evaluated points. Choosing these points adaptively, rather than relying on a pre-specified design, is advantageous because it places greater emphasis on regions of the configuration space where the...

    Go to contribution page
  180. Henrik Bengtsson (Futureverse.org, R Foundation, R Consortium, Bioconductor, R anno 2000, University of California San Francisco (UCSF))
    09/07/2026, 11:55
    Lightning Talk (5 minutes)

    The ability to execute arbitrary R code securely is becoming increasingly critical for use cases ranging from AI agents executing LLM-generated code to peer-to-peer (P2P) compute clusters. Sandboxing techniques such as virtual machines and Linux containers are commonly used to isolate the host machine from untrusted code. Because these technologies can be complicated to set up, and...

    Go to contribution page
  181. Aleksander Jankowski (University of Warsaw)
    09/07/2026, 11:55
    Lightning Talk (5 minutes)

    The identification of overrepresented Gene Ontology (GO) terms in a set of genes is a standard approach to obtain functional associations, e.g. to characterize the set of differentially expressed genes between treatment and control samples. Here, we present the R package GO-a-GO that annotates Gene Ontology terms that are enriched in a given set of gene pairs. This provides the opportunity to...
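
    For single genes, overrepresentation of a GO term is commonly tested with a one-sided hypergeometric test. The Python sketch below shows only that standard single-gene test as background; GO-a-GO works on gene pairs and its actual statistic may differ, and the argument names are illustrative.

```python
from math import comb

def go_enrichment_pvalue(n_universe, n_annotated, n_selected, n_overlap):
    """Upper-tail hypergeometric p-value P(X >= n_overlap) for one GO term.

    n_universe  : genes in the background universe (N)
    n_annotated : background genes annotated with the term (K)
    n_selected  : genes in the set of interest, e.g. DE genes (n)
    n_overlap   : genes in the set that carry the term (k)
    """
    total = comb(n_universe, n_selected)
    upper = min(n_annotated, n_selected)
    # Sum the probabilities of observing k or more annotated genes.
    tail = sum(comb(n_annotated, x) * comb(n_universe - n_annotated, n_selected - x)
               for x in range(n_overlap, upper + 1))
    return tail / total

p = go_enrichment_pvalue(n_universe=10, n_annotated=4, n_selected=3, n_overlap=2)
```

    For example, with 10 background genes, 4 annotated with the term, and 3 selected genes of which 2 carry the term, the p-value is (6·6 + 4·1)/120 = 1/3.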

    Go to contribution page
  182. Marta Bernardi (TU Dresden), Sarah Listabarth (TU Dresden)
    09/07/2026, 11:55
    Lightning Talk (5 minutes)

    Our first session doesn't start with R. It starts with asking students to unzip a folder. Every year, some of them can't.
    This is the reality of teaching empirical economics today. Students arrive having grown up on tablets and smartphones, fluent in apps but lost in a file system. Getting them to difference-in-differences feels, at first, impossibly far away.
    And yet, that's exactly what we...

    Go to contribution page
  183. Sina Chen (GESIS Leibniz Institute for the Social Sciences)
    Building tools for reproducible research
    Lightning Talk (5 minutes)

    The R package paperboy is designed as a repository of web-scraping scripts for news media sites, with advanced features for quick data retrieval, even for content behind log-ins or anti-scraping measures. Many data scientists and researchers write their own code when they have to retrieve news media content from websites. At the end of research projects, this code is...

    Go to contribution page