Related papers: Atomizer: An LLM-based Collaborative Multi-Agent F…

LLM-Driven Collaborative Model for Untangling Commits via Explicit and Implicit Dependency Reasoning

Atomic commits, which address a single development concern, are a best practice in software development. In practice, however, developers often produce tangled commits that mix unrelated changes, complicating code review and maintenance.…

Artificial Intelligence · Computer Science · 2025-11-06 · Bo Hou, Xin Tan, Kai Zheng, Fang Liu, Yinghao Zhu, Li Zhang

AtomR: Atomic Operator-Empowered Large Language Models for Heterogeneous Knowledge Reasoning

Despite the outstanding capabilities of large language models (LLMs), knowledge-intensive reasoning still remains a challenging task due to LLMs' limitations in compositional reasoning and the hallucination problem. A prevalent solution is…

Computation and Language · Computer Science · 2025-09-29 · Amy Xin, Jinxin Liu, Zijun Yao, Zhicheng Lee, Shulin Cao, Lei Hou, Juanzi Li

Scaling Coding Agents via Atomic Skills

Current LLM coding agents are predominantly trained on composite benchmarks (e.g., bug fixing), which often leads to task-specific overfitting and limited generalization. To address this, we propose a novel scaling paradigm that shifts the…

Software Engineering · Computer Science · 2026-04-28 · Yingwei Ma, Yue Liu, Xinlong Yang, Yanhao Li, Kelin Fu, Yibo Miao, Yuchong Xie, Zhexu Wang, Shing-Chi Cheung

Detecting Multiple Semantic Concerns in Tangled Code Commits

Code commits in a version control system (e.g., Git) should be atomic, i.e., focused on a single goal, such as adding a feature or fixing a bug. In practice, however, developers often bundle multiple concerns into tangled commits, obscuring…

Software Engineering · Computer Science · 2026-01-30 · Beomsu Koh, Neil Walkinshaw, Donghwan Shin

Beyond Description: A Multimodal Agent Framework for Insightful Chart Summarization

Chart summarization is crucial for enhancing data accessibility and the efficient consumption of information. However, existing methods, including those with Multimodal Large Language Models (MLLMs), primarily focus on low-level data…

Artificial Intelligence · Computer Science · 2026-02-24 · Yuhang Bai, Yujuan Ding, Shanru Lin, Wenqi Fan

ComAgent: Multi-LLM based Agentic AI Empowered Intelligent Wireless Networks

Emerging 6G networks rely on complex cross-layer optimization, yet manually translating high-level intents into mathematical formulations remains a bottleneck. While Large Language Models (LLMs) offer promise, monolithic approaches often…

Artificial Intelligence · Computer Science · 2026-01-28 · Haoyun Li, Ming Xiao, Kezhi Wang, Robert Schober, Dong In Kim, Yong Liang Guan

Towards Compositional Generalization in LLMs for Smart Contract Security: A Case Study on Reentrancy Vulnerabilities

Large language models (LLMs) demonstrate remarkable capabilities in natural language understanding and generation. Despite being trained on large-scale, high-quality data, LLMs still fail to outperform traditional static analysis tools in…

Cryptography and Security · Computer Science · 2026-01-13 · Ying Zhou, Jiacheng Wei, Yu Qi, Faguo Wu, Xiao Zhang

Harnessing AtomisticSkills for Agentic Atomistic Research

Computational materials science and chemistry span vast knowledge domains and fractured software ecosystems. Although large language models (LLMs) have demonstrated research capabilities, scaling monolithic agents to manage the rigor and…

Chemical Physics · Physics · 2026-05-26 · Bowen Deng, Bohan Li, Matthew Cox, Hoje Chun, Juno Nam, Artur Lyssenko, Sathya Edamadaka, Jurgis Ruza, Xiaochen Du, Nofit Segal, Jesus Diaz Sanchez, Mingrou Xie, Ty Perez, Yu Yao, Miguel Steiner, Sauradeep Majumdar, Charles B. Musgrave, Anirban Chandra, Abhirup Patra, Detlef Hohl, Connor W. Coley, Ju Li, Rafael Gómez-Bombarelli

AI4Contracts: LLM & RAG-Powered Encoding of Financial Derivative Contracts

Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) are reshaping how AI systems extract and organize information from unstructured text. A key challenge is designing AI methods that can incrementally extract, structure,…

Information Retrieval · Computer Science · 2025-06-03 · Maruf Ahmed Mridul, Ian Sloyan, Aparna Gupta, Oshani Seneviratne

Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execution, and structured memory to address…

Artificial Intelligence · Computer Science · 2025-07-16 · Junde Wu, Jiayuan Zhu, Yuyuan Liu, Min Xu, Yueming Jin

LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery

Issue-to-commit link recovery in software repositories is fundamental to software traceability and project management, yet it remains a challenging task. Prior studies show that only about 42.2% of issues on GitHub are correctly linked to…

Software Engineering · Computer Science · 2026-05-06 · Arshia Akhavan, Alireza Hoseinpour, Abbas Heydarnoori, Hamid Bagheri, Mehdi Keshani

LLMC: Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Recent advancements in large language models (LLMs) are propelling us toward artificial general intelligence with their remarkable emergent abilities and reasoning capabilities. However, the substantial computational and memory requirements…

Machine Learning · Computer Science · 2024-10-10 · Ruihao Gong, Yang Yong, Shiqiao Gu, Yushi Huang, Chengtao Lv, Yunchen Zhang, Xianglong Liu, Dacheng Tao

An Auditable Agent Platform For Automated Molecular Optimisation

Drug discovery frequently loses momentum when data, expertise, and tools are scattered, slowing design cycles. To shorten this loop, we built a hierarchical, tool-using agent framework that automates molecular optimisation. A Principal…

Machine Learning · Computer Science · 2025-08-06 · Atabey Ünlü, Phil Rohr, Ahmet Celebi

CODECLEANER: Elevating Standards with A Robust Data Contamination Mitigation Toolkit

Data contamination presents a critical barrier preventing widespread industrial adoption of advanced software engineering techniques that leverage code language models (CLMs). This phenomenon occurs when evaluation data inadvertently…

Software Engineering · Computer Science · 2024-11-19 · Jialun Cao, Songqiang Chen, Wuqi Zhang, Hau Ching Lo, Shing-Chi Cheung

Scalable and Accurate Graph Reasoning with LLM-based Multi-Agents

Recent research has explored the use of Large Language Models (LLMs) for tackling complex graph reasoning tasks. However, due to the intricacies of graph structures and the inherent limitations of LLMs in handling long text, current…

Artificial Intelligence · Computer Science · 2025-11-26 · Yuwei Hu, Runlin Lei, Xinyi Huang, Zhewei Wei, Yongchao Liu

CoDA: Agentic Systems for Collaborative Data Visualization

Deep research has revolutionized data analysis, yet data scientists still devote substantial time to manually crafting visualizations, highlighting the need for robust automation from natural language queries. However, current systems…

Artificial Intelligence · Computer Science · 2025-10-06 · Zichen Chen, Jiefeng Chen, Sercan Ö. Arik, Misha Sra, Tomas Pfister, Jinsung Yoon

TableZoomer: A Collaborative Agent Framework for Large-scale Table Question Answering

While large language models (LLMs) have shown promise in the table question answering (TQA) task through prompt engineering, they face challenges in industrial applications, including structural heterogeneity, difficulties in target data…

Computation and Language · Computer Science · 2025-09-03 · Sishi Xiong, Ziyang He, Zhongjiang He, Yu Zhao, Changzai Pan, Jie Zhang, Zhenhe Wu, Shuangyong Song, Yongxiang Li

Learning Composable Chains-of-Thought

A common approach for teaching large language models (LLMs) to reason is to train on chain-of-thought (CoT) traces of in-distribution reasoning problems, but such annotated data is costly to obtain for every problem of interest. We want…

Computation and Language · Computer Science · 2025-05-29 · Fangcong Yin, Zeyu Leo Liu, Liu Leqi, Xi Ye, Greg Durrett

The Art of Breaking Words: Rethinking Multilingual Tokenizer Design

While model architecture and training objectives are well-studied, tokenization, particularly in multilingual contexts, remains a relatively neglected aspect of Large Language Model (LLM) development. Existing tokenizers often exhibit high…

Computation and Language · Computer Science · 2025-08-12 · Aamod Thakur, Ajay Nagpal, Atharva Savarkar, Kundeshwar Pundalik, Siddhesh Dosi, Piyush Sawarkar, Viraj Thakur, Rohit Saluja, Maunendra Sankar Desarkar, Ganesh Ramakrishnan

Reasoning-Driven Design of Single Atom Catalysts via a Multi-Agent Large Language Model Framework

Large language models (LLMs) are becoming increasingly applied beyond natural language processing, demonstrating strong capabilities in complex scientific tasks that traditionally require human expertise. This progress has extended into…

Materials Science · Physics · 2026-02-26 · Dong Hyeon Mok, Seoin Back, Victor Fung, Guoxiang Hu