6–9 Jul 2026
Europe/Warsaw timezone

An LLM-based Pipeline for Understanding Decision Choices in Data Analysis from Published Literature

7 Jul 2026, 17:00
2h
Poster

Speaker

H. Sherry Zhang (University of Texas at Austin)

Description

Decision choices, such as those made when building regression models, and the rationale behind them are essential for interpreting results and understanding uncertainty in an analysis. However, these decisions are rarely studied because tracing every alternative considered by the authors is often impractical, and reworking a completed analysis is generally of limited interest. Consequently, researchers must manually review large bodies of published analyses to identify common choices and understand how they are made. In this work, we propose a workflow that uses Large Language Models to automatically extract analytic decisions and their reasons from published literature. Our method also introduces a paper similarity measure based on decision similarity, along with visualization methods that use clustering algorithms. As an example, we apply the workflow to analyses studying the effect of particulate matter on mortality. This approach enables scalable, automated studies of decision choices in applied data analysis, providing an alternative to existing qualitative and interview-based studies.
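The abstract does not specify how decision similarity is computed, so the following is only a minimal sketch of one plausible reading: each paper is reduced to a set of (decision, choice) pairs, papers are compared with Jaccard similarity, and papers are grouped by a simple single-linkage threshold rule. The paper names, decision labels, threshold, and all function names are hypothetical illustrations, not the authors' method or data.

```python
from itertools import combinations

# Hypothetical toy input (not real extracted data): each paper maps to the
# set of (decision, choice) pairs that an extraction step might return.
papers = {
    "paper_a": {("temperature_adjustment", "natural spline"),
                ("lag", "0-1 days"),
                ("model", "Poisson GLM")},
    "paper_b": {("temperature_adjustment", "natural spline"),
                ("lag", "0-1 days"),
                ("model", "GAM")},
    "paper_c": {("temperature_adjustment", "linear term"),
                ("lag", "0-5 days"),
                ("model", "case-crossover")},
}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items divided by all items."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def similarity_matrix(papers):
    """Return a dict mapping (paper, paper) pairs to their similarity."""
    names = sorted(papers)
    sim = {(p, p): 1.0 for p in names}
    for p, q in combinations(names, 2):
        sim[(p, q)] = sim[(q, p)] = jaccard(papers[p], papers[q])
    return sim


def cluster(papers, threshold):
    """Single-linkage grouping: two papers share a cluster when a chain of
    pairs with similarity >= threshold connects them (union-find)."""
    names = sorted(papers)
    parent = {n: n for n in names}

    def find(n):
        # Walk up to the cluster representative, compressing the path.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for p, q in combinations(names, 2):
        if jaccard(papers[p], papers[q]) >= threshold:
            parent[find(p)] = find(q)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())


if __name__ == "__main__":
    print(similarity_matrix(papers)[("paper_a", "paper_b")])
    print(cluster(papers, threshold=0.5))
```

Jaccard similarity is used here only because it is a simple set-overlap measure; a real pipeline might instead weight decisions, compare free-text reasons with embeddings, or use a different clustering algorithm.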

If you used AI tools or services to support the preparation of this submission, please state the name and reason for using each of them.

Gemini for polishing the transcript

Additional Material or Paper

https://github.com/huizezhang-sherry/paper-decisions

Keywords: data analysis workflow, LLM, decision choices
Virtual Option: This submission is for onsite presentation only
Video Recording: Video sharing is fine
The author(s) agree(s) to take responsibility and be accountable for the contents of the submission and is/are authorized to present it. Confirm
Interested in serving as reviewer? Sure!

Author

H. Sherry Zhang (University of Texas at Austin)

Co-author

Dr Roger Peng (University of Texas at Austin)
