6–9 Jul 2026
Europe/Warsaw timezone

Testing LLMs in R: The mini007 Package and the “LLM-as-a-Judge” Framework

7 Jul 2026, 17:00
2h
Poster

Speaker

Mohamed El Fodil Ihaddaden (HDI GLOBAL SE)

Description

Large Language Models (LLMs) introduce a fundamental challenge for software engineering in R: their non-deterministic behavior makes traditional unit testing inadequate. Because identical prompts may yield different outputs, exact-match assertions break down, yet robust validation of model behavior remains essential for production systems, research pipelines, and agent-based workflows.

In this talk, I introduce mini007, an R package designed to test LLM-generated responses using the “LLM-as-a-Judge” principle. Instead of asserting exact string equality, mini007 delegates evaluation to a secondary LLM agent that scores responses against explicit validation criteria. This approach enables probabilistic yet reproducible quality control within standard R testing workflows.
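To make the contrast with exact string equality concrete, the following is a minimal sketch of threshold-based judging. It does not use mini007 or any LLM: `judge_score()` and `passes_judge()` are hypothetical helpers, the judge is a keyword-matching stub standing in for a secondary model, and the example answers and the 0.8 threshold are illustrative assumptions.

```r
# Hypothetical stand-in for a judge model. A real judge would be a second LLM
# grading the response; here the score is the fraction of required terms found.
judge_score <- function(response, required_terms) {
  hits <- vapply(
    required_terms,
    function(term) grepl(term, response, ignore.case = TRUE),
    logical(1)
  )
  mean(hits)
}

# Accept a response when the judge's score reaches the acceptance threshold.
passes_judge <- function(response, required_terms, threshold = 0.8) {
  judge_score(response, required_terms) >= threshold
}

# Two differently worded answers that a string-equality test would treat as unequal.
answer_a <- "The capital of Algeria is Algiers."
answer_b <- "Algiers is Algeria's capital city."
criteria <- c("Algiers", "capital")

identical(answer_a, answer_b)              # FALSE: exact matching rejects the rewording
passes_judge(answer_a, criteria)           # TRUE
passes_judge(answer_b, criteria)           # TRUE
passes_judge("I am not sure.", criteria)   # FALSE
```

Both phrasings pass because the check is made on a score against criteria rather than on the literal text, which is the property the judge-based approach relies on.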

The package integrates seamlessly with the testthat framework, allowing developers to embed LLM validation directly into automated test suites. Using the Agent$validate_response() method, developers can define prompts, expected criteria, and acceptance thresholds. Importantly, the model generating the response and the model evaluating it can differ, enabling cross-model validation strategies through the ellmer ecosystem.
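As a rough illustration of how such a check might sit inside a testthat suite, consider the sketch below. The abstract does not give the signature of Agent$validate_response(), so the sketch deliberately avoids the mini007 API: `generate_answer()` and `judge_answer()` are hypothetical stubs standing in for the generating model and the judging model, and the prompt, criteria text, and 0.8 threshold are assumptions chosen for illustration.

```r
library(testthat)

# Hypothetical stand-in for the model under test. In practice this would wrap
# an LLM call and could return different wording on each run.
generate_answer <- function(prompt) {
  "Algiers is the capital of Algeria."
}

# Hypothetical stand-in for the judge, which may be a different model from the
# generator. It returns a score in [0, 1]; a real judge would grade the
# response against `criteria`, whereas this stub only checks for one keyword.
judge_answer <- function(prompt, response, criteria) {
  if (grepl("Algiers", response, fixed = TRUE)) 1 else 0
}

test_that("answer about Algeria's capital meets the validation criteria", {
  prompt <- "What is the capital of Algeria?"
  response <- generate_answer(prompt)
  score <- judge_answer(
    prompt,
    response,
    criteria = "The response must name Algiers as the capital."
  )
  # The test asserts on the judge's score, not on the literal response text.
  expect_gte(score, 0.8)
})
```

Keeping the generator and the judge as separate functions mirrors the point that the two roles can be filled by different models.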

Through practical examples, I demonstrate how this framework makes LLM-driven systems testable, monitorable, and maintainable in R. The session will cover design principles, limitations, and best practices for testing stochastic AI systems.

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

No AI tools/services were used.

Keywords: LLM, testing, reproducibility, testthat, agents
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it: Confirm
