Related papers: Feedback-Driven Execution for LLM-Based Binary Ana…

FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios

The manufacturing sector is increasingly adopting Multimodal Large Language Models (MLLMs) to transition from simple perception to autonomous execution, yet current evaluations fail to reflect the rigorous demands of real-world…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-14 · Xiangru Jian, Hao Xu, Wei Pang, Xinjian Zhao, Chengyu Tao, Qixin Zhang, Xikun Zhang, Chao Zhang, Guanzhi Deng, Alex Xue, Juan Du, Tianshu Yu, Garth Tarr, Linqi Song, Qiuzhuang Sun, Dacheng Tao

LLMs as Firmware Experts: A Runtime-Grown Tree-of-Agents Framework

Large Language Models (LLMs) and their agent systems have recently demonstrated strong potential in automating code reasoning and vulnerability detection. However, when applied to large-scale firmware, their performance degrades due to the…

Cryptography and Security · Computer Science · 2025-11-25 · Xiangrui Zhang, Zeyu Chen, Haining Wang, Qiang Li

AgentForge: Execution-Grounded Multi-Agent LLM Framework for Autonomous Software Engineering

Large language models generate plausible code but cannot verify correctness. Existing multi-agent systems simulate execution or leave verification optional. We introduce execution-grounded verification as a first-class principle: every code…

Software Engineering · Computer Science · 2026-04-16 · Rajesh Kumar, Waqar Ali, Junaid Ahmed, Najma Imtiaz Ali, Shaban Usman

FORGE: An LLM-driven Framework for Large-Scale Smart Contract Vulnerability Dataset Construction

High-quality smart contract vulnerability datasets are critical for evaluating security tools and advancing smart contract security research. Two major limitations of current manual dataset construction are (1) labor-intensive and…

Cryptography and Security · Computer Science · 2025-06-24 · Jiachi Chen, Yiming Shen, Jiashuo Zhang, Zihao Li, John Grundy, Zhenzhe Shao, Yanlin Wang, Jiashui Wang, Ting Chen, Zibin Zheng

Enhancing LLMs for Power System Simulations: A Feedback-driven Multi-agent Framework

The integration of experimental technologies with large language models (LLMs) is transforming scientific research. It positions AI as a versatile research assistant rather than a mere problem-solving tool. In the field of power systems,…

Computation and Language · Computer Science · 2025-05-20 · Mengshuo Jia, Zeyu Cui, Gabriela Hug

RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to…

Computation and Language · Computer Science · 2025-02-19 · Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Quentin Carbonneaux, Taco Cohen, Gabriel Synnaeve

FORGE: Self-Evolving Agent Memory With No Weight Updates via Population Broadcast

Can LLM agents improve decision-making through self-generated memory without gradient updates? We propose FORGE (Failure-Optimized Reflective Graduation and Evolution), a staged, population-based protocol that evolves prompt-injected…

Artificial Intelligence · Computer Science · 2026-05-18 · Igor Bogdanov, Chung-Horng Lung, Thomas Kunz, Jie Gao, Adrian Taylor, Marzia Zaman

LockForge: Automating Paper-to-Code for Logic Locking with Multi-Agent Reasoning LLMs

Despite rapid progress in logic locking (LL), reproducibility remains a challenge as codes are rarely made public. We present LockForge, a first-of-its-kind, multi-agent large language model (LLM) framework that turns LL descriptions in…

Cryptography and Security · Computer Science · 2025-12-01 · Akashdeep Saha, Zeng Wang, Prithwish Basu Roy, Johann Knechtel, Ozgur Sinanoglu, Ramesh Karri

An LLM-based multi-agent framework for agile effort estimation

Effort estimation is a crucial activity in agile software development, where teams collaboratively review, discuss, and estimate the effort required to complete user stories in a product backlog. Current practices in agile effort estimation…

Software Engineering · Computer Science · 2025-09-19 · Thanh-Long Bui, Hoa Khanh Dam, Rashina Hoda

Collaboration Dynamics and Reliability Challenges of Multi-Agent LLM Systems in Finite Element Analysis

Large Language Model (LLM)-based multi-agent systems are increasingly applied to automate computational workflows in science and engineering. However, how inter-agent dynamics influence reasoning quality and verification reliability remains…

Artificial Intelligence · Computer Science · 2025-11-07 · Chuan Tian, Yilei Zhang

An Empirical Evaluation of LLM-Based Approaches for Code Vulnerability Detection: RAG, SFT, and Dual-Agent Systems

The rapid advancement of Large Language Models (LLMs) presents new opportunities for automated software vulnerability detection, a crucial task in securing modern codebases. This paper presents a comparative study on the effectiveness of…

Software Engineering · Computer Science · 2026-01-05 · Md Hasan Saju, Maher Muhtadi, Akramul Azim

REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction

The ability to detect and analyze failed executions automatically is crucial for an explainable and robust robotic system. Recently, Large Language Models (LLMs) have demonstrated strong reasoning abilities on textual inputs. To leverage…

Robotics · Computer Science · 2023-10-18 · Zeyi Liu, Arpit Bahety, Shuran Song

Enhancing Relation Extraction via Supervised Rationale Verification and Feedback

Despite the rapid progress that existing automated feedback methods have made in correcting the output of large language models (LLMs), these methods cannot be well applied to the relation extraction (RE) task due to their designated…

Computation and Language · Computer Science · 2024-12-12 · Yongqi Li, Xin Miao, Shen Zhou, Mayi Xu, Yuyang Ren, Tieyun Qian

TestForge: Feedback-Driven, Agentic Test Suite Generation

Automated test generation holds great promise for alleviating the burdens of manual test creation. However, existing search-based techniques compromise on test readability, while LLM-based approaches are prohibitively expensive in practice.…

Software Engineering · Computer Science · 2025-03-20 · Kush Jain, Claire Le Goues

FORGE: Force-Guided Exploration for Robust Contact-Rich Manipulation under Uncertainty

We present FORGE, a method for sim-to-real transfer of force-aware manipulation policies in the presence of significant pose uncertainty. During simulation-based policy learning, FORGE combines a force threshold mechanism with a dynamics…

Robotics · Computer Science · 2025-01-06 · Michael Noseworthy, Bingjie Tang, Bowen Wen, Ankur Handa, Chad Kessens, Nicholas Roy, Dieter Fox, Fabio Ramos, Yashraj Narang, Iretiayo Akinola

LLM Agents for Bargaining with Utility-based Feedback

Bargaining, a critical aspect of real-world interactions, presents challenges for large language models (LLMs) due to limitations in strategic depth and adaptation to complex human factors. Existing benchmarks often fail to capture this…

Machine Learning · Computer Science · 2025-07-15 · Jihwan Oh

FALCON: Feedback-driven Adaptive Long/short-term memory reinforced Coding Optimization system

Recently, large language models (LLMs) have achieved significant progress in automated code generation. Despite their strong instruction-following capabilities, these models frequently struggled to align with user intent in coding…

Machine Learning · Computer Science · 2025-06-23 · Zeyuan Li, Yangfan He, Lewei He, Jianhui Wang, Tianyu Shi, Bin Lei, Yuchen Li, Qiuwu Chen

SQLForge: Synthesizing Reliable and Diverse Data to Enhance Text-to-SQL Reasoning in LLMs

Large Language models (LLMs) have demonstrated significant potential in text-to-SQL reasoning tasks, yet a substantial performance gap persists between existing open-source models and their closed-source counterparts. In this paper, we…

Computation and Language · Computer Science · 2025-09-23 · Yu Guo, Dong Jin, Shenghao Ye, Shuangwu Chen, Jian Yang, Xiaobin Tan

VERGE: Formal Refinement and Guidance Engine for Verifiable LLM Reasoning

Despite the syntactic fluency of Large Language Models (LLMs), ensuring their logical correctness in high-stakes domains remains a fundamental challenge. We present a neurosymbolic framework that combines LLMs with SMT solvers to produce…

Computation and Language · Computer Science · 2026-05-05 · Vikash Singh, Darion Cassel, Nathaniel Weir, Nick Feng, Sam Bayless

FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments

Large Language Models are being increasingly deployed as the decision-making core of autonomous agents capable of effecting change in external environments. Yet, in conversational benchmarks, which simulate real-world customer-centric issue…

Computation and Language · Computer Science · 2026-04-29 · Amir Saeidi, Venkatesh Mishra, Souradeep Mukhopadhyay, Gaowen Liu, Ali Payani, Jayanth Srinivasa, Chitta Baral