Related papers: LLMs can construct powerful representations and st…

FABRIC: Framework for Agent-Based Realistic Intelligence Creation

Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data: structured interaction…

Artificial Intelligence · Computer Science 2025-10-22 Abhigya Verma, Seganrasan Subramanian, Nandhakumar Kandasamy, Naman Gupta

Learnable Assessment Skills for LLM-based Automated Scoring: Rubric Construction via Iterative Optimization

LLM-based automated scoring approaches near-human performance, but scaling to new tasks remains bottlenecked by the per-item human configuration of upstream stages such as rubric construction. Human experts bypass this bottleneck through…

Computation and Language · Computer Science 2026-05-29 Yun Wang, Xin Xia, Xuansheng Wu, Xiaoming Zhai, Ninghao Liu

LLM-Rubric: A Multidimensional, Calibrated Approach to Automated Evaluation of Natural Language Texts

This paper introduces a framework for the automated evaluation of natural language texts. A manually constructed rubric describes how to assess multiple dimensions of interest. To evaluate a text, a large language model (LLM) is prompted…

Computation and Language · Computer Science 2025-01-03 Helia Hashemi, Jason Eisner, Corby Rosset, Benjamin Van Durme, Chris Kedzie

Automated Rubrics for Reliable Evaluation of Medical Dialogue Systems

Large Language Models (LLMs) are increasingly used for clinical decision support, where hallucinations and unsafe suggestions may pose direct risks to patient safety. These risks are hard to assess: subtle clinical errors are often missed…

Computation and Language · Computer Science 2026-05-14 Yinzhu Chen, Abdine Maiga, Hossein A. Rahmani, Emine Yilmaz

Leveraging LLMs for Semi-Automatic Corpus Filtration in Systematic Literature Reviews

The creation of systematic literature reviews (SLR) is critical for analyzing the landscape of a research field and guiding future research directions. However, retrieving and filtering the literature corpus for an SLR is highly…

Machine Learning · Computer Science 2026-02-18 Lucas Joos, Daniel A. Keim, Maximilian T. Fischer

RubricRAG: Towards Interpretable and Reliable LLM Evaluation via Domain Knowledge Retrieval for Rubric Generation

Large language models (LLMs) are increasingly evaluated and sometimes trained using automated graders such as LLM-as-judges that output scalar scores or preferences. While convenient, these approaches are often opaque: a single score rarely…

Information Retrieval · Computer Science 2026-03-24 Kaustubh D. Dhole, Eugene Agichtein

Explainable Iterative Data Visualisation Refinement via an LLM Agent

Exploratory analysis of high-dimensional data relies on embedding the data into a low-dimensional space (typically 2D or 3D), from which a visualization plot is produced to uncover meaningful structures and to communicate geometric and…

Human-Computer Interaction · Computer Science 2026-04-23 Burak Susam, Tingting Mu

Concept-based Rubrics Improve LLM Formative Assessment and Data Synthesis

Formative assessment in STEM topics aims to promote student learning by identifying students' current understanding, thus targeting how to promote further learning. Previous studies suggest that the assessment performance of current…

Machine Learning · Computer Science 2025-04-08 Yuchen Wei, Dennis Pearl, Matthew Beckman, Rebecca J. Passonneau

A New Pipeline For Generating Instruction Dataset via RAG and Self Fine-Tuning

With the rapid development of large language models in recent years, there has been an increasing demand for domain-specific Agents that can cater to the unique needs of enterprises and organizations. Unlike general models, which strive for…

Computation and Language · Computer Science 2024-08-13 Chih-Wei Song, Yu-Kai Lee, Yin-Te Tsai

Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability

High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality,…

Machine Learning · Computer Science 2025-06-04 Genta Indra Winata, David Anugraha, Emmy Liu, Alham Fikri Aji, Shou-Yi Hung, Aditya Parashar, Patrick Amadeus Irawan, Ruochen Zhang, Zheng-Xin Yong, Jan Christian Blaise Cruz, Niklas Muennighoff, Seungone Kim, Hanyang Zhao, Sudipta Kar, Kezia Erina Suryoraharjo, M. Farid Adilazuarda, En-Shiun Annie Lee, Ayu Purwarianti, Derry Tanti Wijaya, Monojit Choudhury

Data-to-Dashboard: Multi-Agent LLM Framework for Insightful Visualization in Enterprise Analytics

The rapid advancement of LLMs has led to the creation of diverse agentic systems in data analysis, utilizing LLMs' capabilities to improve insight generation and visualization. In this paper, we present an agentic system that automates the…

Artificial Intelligence · Computer Science 2025-05-30 Ran Zhang, Mohannad Elhamod

ScribeAgent: Towards Specialized Web Agents Using Production-Scale Workflow Data

Large Language Model (LLM) agents are rapidly improving to handle increasingly complex web-based tasks. Most of these agents rely on general-purpose, proprietary models like GPT-4 and focus on designing better prompts to improve their…

Computation and Language · Computer Science 2024-12-06 Junhong Shen, Atishay Jain, Zedian Xiao, Ishan Amlekar, Mouad Hadji, Aaron Podolny, Ameet Talwalkar

TimeSeriesExamAgent: Creating Time Series Reasoning Benchmarks at Scale

Large Language Models (LLMs) have shown promising performance in time series modeling tasks, but do they truly understand time series data? While multiple benchmarks have been proposed to answer this fundamental question, most are manually…

Artificial Intelligence · Computer Science 2026-04-15 Malgorzata Gwiazda, Yifu Cai, Mononito Goswami, Arjun Choudhry, Artur Dubrawski

Compiling Prompts, Not Crafting Them: A Reproducible Workflow for AI-Assisted Evidence Synthesis

Large language models (LLMs) offer significant potential to accelerate systematic literature reviews (SLRs), yet current approaches often rely on brittle, manually crafted prompts that compromise reliability and reproducibility. This…

Computation and Language · Computer Science 2025-09-03 Teo Susnjak

Optimsyn: Influence-Guided Rubrics Optimization for Synthetic Data Generation

Large language models (LLMs) achieve strong downstream performance largely due to abundant supervised fine-tuning (SFT) data. However, high-quality SFT data in knowledge-intensive domains such as humanities, social sciences, medicine, law,…

Computation and Language · Computer Science 2026-04-02 Zhiting Fan, Ruizhe Chen, Tianxiang Hu, Ru Peng, Zenan Huang, Haokai Xu, Yixin Chen, Jian Wu, Junbo Zhao, Zuozhu Liu

Automatic Construction of Clinical Scoring Systems with LLM Agents

Modern clinical practice relies on evidence-based guidelines implemented as compact scoring systems composed of a small number of interpretable decision rules. While machine-learning models achieve strong performance, many fail to translate…

Machine Learning · Computer Science 2026-05-25 Silas Ruhrberg Estévez, Christopher Chiu, Mihaela van der Schaar

Story Ribbons: Reimagining Storyline Visualizations with Large Language Models

Analyzing literature involves tracking interactions between characters, locations, and themes. Visualization has the potential to facilitate the mapping and analysis of these complex relationships, but capturing structured information from…

Human-Computer Interaction · Computer Science 2025-08-12 Catherine Yeh, Tara Menon, Robin Singh Arya, Helen He, Moira Weigel, Fernanda Viégas, Martin Wattenberg

An Agentic Framework for Autonomous Materials Computation

Large Language Models (LLMs) have emerged as powerful tools for accelerating scientific discovery, yet their static knowledge and hallucination issues hinder autonomous research applications. Recent advances integrate LLMs into agentic…

Artificial Intelligence · Computer Science 2025-12-23 Zeyu Xia, Jinzhe Ma, Congjie Zheng, Shufei Zhang, Yuqiang Li, Hang Su, P. Hu, Changshui Zhang, Xingao Gong, Wanli Ouyang, Lei Bai, Dongzhan Zhou, Mao Su

AgenticSum: An Agentic Inference-Time Framework for Faithful Clinical Text Summarization

Large language models (LLMs) offer substantial promise for automating clinical text summarization, yet maintaining factual consistency remains challenging due to the length, noise, and heterogeneity of clinical documentation. We present…

Computation and Language · Computer Science 2026-02-24 Fahmida Liza Piya, Rahmatollah Beheshti

Agentic AutoSurvey: Let LLMs Survey LLMs

The exponential growth of scientific literature poses unprecedented challenges for researchers attempting to synthesize knowledge across rapidly evolving fields. We present Agentic AutoSurvey, a multi-agent framework for automated…

Information Retrieval · Computer Science 2025-09-24 Yixin Liu, Yonghui Wu, Denghui Zhang, Lichao Sun