Related papers: Multi-Agent Interactive Question Generation Framew…

MDocAgent: A Multi-Modal Multi-Agent Framework for Document Understanding

Document Question Answering (DocQA) is a very common task. Existing methods using Large Language Models (LLMs) or Large Vision Language Models (LVLMs) and Retrieval Augmented Generation (RAG) often prioritize information from a single…

Machine Learning · Computer Science 2025-03-19 Siwei Han, Peng Xia, Ruiyi Zhang, Tong Sun, Yun Li, Hongtu Zhu, Huaxiu Yao

A-SEA3L-QA: A Fully Automated Self-Evolving, Adversarial Workflow for Arabic Long-Context Question-Answer Generation

We present an end-to-end, self-evolving adversarial workflow for long-context Question-Answer (QA) Generation in Arabic. By orchestrating multiple specialized LVLMs: a question generator, an evaluator, and a swarm of answer generators, our…

Computation and Language · Computer Science 2025-09-04 Kesen Wang, Daulet Toibazar, Pedro J. Moreno

Multi-Agents Based on Large Language Models for Knowledge-based Visual Question Answering

Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still face challenges: the inability to use external tools autonomously, and the inability to work in…

Computation and Language · Computer Science 2025-08-08 Zhongjian Hu, Peng Yang, Bing Li, Zhenqi Wang

Rethinking Information Synthesis in Multimodal Question Answering: A Multi-Agent Perspective

Recent advances in multimodal question answering have primarily focused on combining heterogeneous modalities or fine-tuning multimodal large language models. While these approaches have shown strong performance, they often rely on a…

Computation and Language · Computer Science 2026-04-22 Krishna Singh Rajput, Tejas Anvekar, Chitta Baral, Vivek Gupta

MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document…

Computer Vision and Pattern Recognition · Computer Science 2024-11-13 Yubo Ma, Yuhang Zang, Liangyu Chen, Meiqi Chen, Yizhu Jiao, Xinze Li, Xinyuan Lu, Ziyu Liu, Yan Ma, Xiaoyi Dong, Pan Zhang, Liangming Pan, Yu-Gang Jiang, Jiaqi Wang, Yixin Cao, Aixin Sun

DocLens: A Tool-Augmented Multi-Agent Framework for Long Visual Document Understanding

Comprehending long visual documents, where information is distributed across extensive pages of text and visual elements, is a critical but challenging task for modern Vision-Language Models (VLMs). Existing approaches falter on a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Dawei Zhu, Rui Meng, Jiefeng Chen, Sujian Li, Tomas Pfister, Jinsung Yoon

Capturing Greater Context for Question Generation

Automatic question generation can benefit many applications ranging from dialogue systems to reading comprehension. While questions are often asked with respect to long documents, there are many challenges with modeling such long documents.…

Computation and Language · Computer Science 2019-10-24 Luu Anh Tuan, Darsh J Shah, Regina Barzilay

A Survey of Large Language Model Agents for Question Answering

This paper surveys the development of large language model (LLM)-based agents for question answering (QA). Traditional agents face significant limitations, including substantial data requirements and difficulty in generalizing to new…

Computation and Language · Computer Science 2025-03-26 Murong Yue

Resolving Evidence Sparsity: Agentic Context Engineering for Long-Document Understanding

Document understanding is a long-standing practical task. Vision Language Models (VLMs) have gradually become a primary approach in this domain, demonstrating effective performance on single-page tasks. However, their effectiveness…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Keliang Liu, Zizhi Chen, Mingcheng Li, Jingqun Tang, Dingkang Yang, Lihua Zhang

Towards Top-Down Reasoning: An Explainable Multi-Agent Approach for Visual Question Answering

Recently, to comprehensively improve Vision Language Models (VLMs) for Visual Question Answering (VQA), several methods have been proposed to further reinforce the inference capabilities of VLMs to independently tackle VQA tasks rather than…

Computer Vision and Pattern Recognition · Computer Science 2025-02-17 Zeqing Wang, Wentao Wan, Qiqing Lao, Runmeng Chen, Minjie Lang, Xiao Wang, Keze Wang, Liang Lin

Dynamic Multi-Agent Orchestration and Retrieval for Multi-Source Question-Answer Systems using Large Language Models

We propose a methodology that combines several advanced techniques in Large Language Model (LLM) retrieval to support the development of robust, multi-source question-answer systems. This methodology is designed to integrate information…

Artificial Intelligence · Computer Science 2024-12-25 Antony Seabra, Claudio Cavalcante, Joao Nepomuceno, Lucas Lago, Nicolaas Ruberg, Sergio Lifschitz

An Interactive Query Generation Assistant using LLM-based Prompt Modification and User Feedback

While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other…

Artificial Intelligence · Computer Science 2023-11-21 Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein

FactGuard: Leveraging Multi-Agent Systems to Generate Answerable and Unanswerable Questions for Enhanced Long-Context LLM Extraction

Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text. However, a persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably…

Computation and Language · Computer Science 2025-04-09 Qian-Wen Zhang, Fang Li, Jie Wang, Lingfeng Qiao, Yifei Yu, Di Yin, Xing Sun

Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning

We present a Collaborative Agent-Based Framework for Multi-Image Reasoning. Our approach tackles the challenge of interleaved multimodal reasoning across diverse datasets and task formats by employing a dual-agent system: a language-based…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Angelos Vlachos, Giorgos Filandrianos, Maria Lymperaiou, Nikolaos Spanos, Ilias Mitsouras, Vasileios Karampinis, Athanasios Voulodimos

Adaptive Video Understanding Agent: Enhancing efficiency with dynamic frame sampling and feedback-driven reasoning

Understanding long-form video content presents significant challenges due to its temporal complexity and the substantial computational resources required. In this work, we propose an agent-based approach to enhance both the efficiency and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Sullam Jeoung, Goeric Huybrechts, Bhavana Ganesh, Aram Galstyan, Sravan Bodapati

DocAgent: A Multi-Agent System for Automated Code Documentation Generation

High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete,…

Software Engineering · Computer Science 2025-05-27 Dayu Yang, Antoine Simoulin, Xin Qian, Xiaoyi Liu, Yuwei Cao, Zhaopu Teng, Grey Yang

Long-Span Question-Answering: Automatic Question Generation and QA-System Ranking via Side-by-Side Evaluation

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of…

Computation and Language · Computer Science 2024-06-04 Bernd Bohnet, Kevin Swersky, Rosanne Liu, Pranjal Awasthi, Azade Nova, Javier Snaider, Hanie Sedghi, Aaron T Parisi, Michael Collins, Angeliki Lazaridou, Orhan Firat, Noah Fiedel

Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications

Large Language Models (LLMs) have demonstrated impressive performance across diverse domains, yet they still encounter challenges such as insufficient domain-specific knowledge, biases, and hallucinations. This underscores the need for…

Computation and Language · Computer Science 2025-04-07 Hongliu Cao, Ilias Driouich, Robin Singh, Eoin Thomas

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Existing MLLMs encounter significant challenges in modeling the temporal context within long videos. Currently, mainstream Agent-based methods use external tools to assist a single MLLM in answering long video questions. Despite such…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Boyu Chen, Zhengrong Yue, Siran Chen, Zikang Wang, Yang Liu, Peng Li, Yali Wang

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up.…

Computation and Language · Computer Science 2024-10-04 Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, Fei Huang, Yongbin Li