Related papers: Multi-Agent Interactive Question Generation Framew…

200 papers

Document Question Answering (DocQA) is a very common task. Existing methods using Large Language Models (LLMs) or Large Vision Language Models (LVLMs) and Retrieval Augmented Generation (RAG) often prioritize information from a single…

Machine Learning · Computer Science 2025-03-19 Siwei Han , Peng Xia , Ruiyi Zhang , Tong Sun , Yun Li , Hongtu Zhu , Huaxiu Yao

We present an end-to-end, self-evolving adversarial workflow for long-context Question-Answer (QA) Generation in Arabic. By orchestrating multiple specialized LVLMs: a question generator, an evaluator, and a swarm of answer generators, our…

Computation and Language · Computer Science 2025-09-04 Kesen Wang , Daulet Toibazar , Pedro J. Moreno

Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still have challenges: the inability to use external tools autonomously, and the inability to work in…

Computation and Language · Computer Science 2025-08-08 Zhongjian Hu , Peng Yang , Bing Li , Zhenqi Wang

Recent advances in multimodal question answering have primarily focused on combining heterogeneous modalities or fine-tuning multimodal large language models. While these approaches have shown strong performance, they often rely on a…

Computation and Language · Computer Science 2026-04-22 Krishna Singh Rajput , Tejas Anvekar , Chitta Baral , Vivek Gupta

Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable strides in various tasks, particularly in single-page document…

Computer Vision and Pattern Recognition · Computer Science 2024-11-13 Yubo Ma , Yuhang Zang , Liangyu Chen , Meiqi Chen , Yizhu Jiao , Xinze Li , Xinyuan Lu , Ziyu Liu , Yan Ma , Xiaoyi Dong , Pan Zhang , Liangming Pan , Yu-Gang Jiang , Jiaqi Wang , Yixin Cao , Aixin Sun

Comprehending long visual documents, where information is distributed across extensive pages of text and visual elements, is a critical but challenging task for modern Vision-Language Models (VLMs). Existing approaches falter on a…

Computer Vision and Pattern Recognition · Computer Science 2025-11-17 Dawei Zhu , Rui Meng , Jiefeng Chen , Sujian Li , Tomas Pfister , Jinsung Yoon

Automatic question generation can benefit many applications ranging from dialogue systems to reading comprehension. While questions are often asked with respect to long documents, there are many challenges with modeling such long documents.…

Computation and Language · Computer Science 2019-10-24 Luu Anh Tuan , Darsh J Shah , Regina Barzilay

This paper surveys the development of large language model (LLM)-based agents for question answering (QA). Traditional agents face significant limitations, including substantial data requirements and difficulty in generalizing to new…

Computation and Language · Computer Science 2025-03-26 Murong Yue

Document understanding is a long-standing practical task. Vision Language Models (VLMs) have gradually become a primary approach in this domain, demonstrating effective performance on single-page tasks. However, their effectiveness…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Keliang Liu , Zizhi Chen , Mingcheng Li , Jingqun Tang , Dingkang Yang , Lihua Zhang

Recently, to comprehensively improve Vision Language Models (VLMs) for Visual Question Answering (VQA), several methods have been proposed to further reinforce the inference capabilities of VLMs to independently tackle VQA tasks rather than…

Computer Vision and Pattern Recognition · Computer Science 2025-02-17 Zeqing Wang , Wentao Wan , Qiqing Lao , Runmeng Chen , Minjie Lang , Xiao Wang , Keze Wang , Liang Lin

We propose a methodology that combines several advanced techniques in Large Language Model (LLM) retrieval to support the development of robust, multi-source question-answer systems. This methodology is designed to integrate information…

Artificial Intelligence · Computer Science 2024-12-25 Antony Seabra , Claudio Cavalcante , Joao Nepomuceno , Lucas Lago , Nicolaas Ruberg , Sergio Lifschitz

While search is the predominant method of accessing information, formulating effective queries remains a challenging task, especially for situations where the users are not familiar with a domain, or searching for documents in other…

Artificial Intelligence · Computer Science 2023-11-21 Kaustubh D. Dhole , Ramraj Chandradevan , Eugene Agichtein

Extractive reading comprehension systems are designed to locate the correct answer to a question within a given text. However, a persistent challenge lies in ensuring these models maintain high accuracy in answering questions while reliably…

Computation and Language · Computer Science 2025-04-09 Qian-Wen Zhang , Fang Li , Jie Wang , Lingfeng Qiao , Yifei Yu , Di Yin , Xing Sun

We present a Collaborative Agent-Based Framework for Multi-Image Reasoning. Our approach tackles the challenge of interleaved multimodal reasoning across diverse datasets and task formats by employing a dual-agent system: a language-based…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Angelos Vlachos , Giorgos Filandrianos , Maria Lymperaiou , Nikolaos Spanos , Ilias Mitsouras , Vasileios Karampinis , Athanasios Voulodimos

Understanding long-form video content presents significant challenges due to its temporal complexity and the substantial computational resources required. In this work, we propose an agent-based approach to enhance both the efficiency and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Sullam Jeoung , Goeric Huybrechts , Bhavana Ganesh , Aram Galstyan , Sravan Bodapati

High-quality code documentation is crucial for software development, especially in the era of AI. However, generating it automatically using Large Language Models (LLMs) remains challenging, as existing approaches often produce incomplete,…

Software Engineering · Computer Science 2025-05-27 Dayu Yang , Antoine Simoulin , Xin Qian , Xiaoyi Liu , Yuwei Cao , Zhaopu Teng , Grey Yang

We explore the use of long-context capabilities in large language models to create synthetic reading comprehension data from entire books. Previous efforts to construct such datasets relied on crowd-sourcing, but the emergence of…

Large Language Models (LLMs) have demonstrated impressive performance across diverse domains, yet they still encounter challenges such as insufficient domain-specific knowledge, biases, and hallucinations. This underscores the need for…

Computation and Language · Computer Science 2025-04-07 Hongliu Cao , Ilias Driouich , Robin Singh , Eoin Thomas

Existing MLLMs encounter significant challenges in modeling the temporal context within long videos. Currently, mainstream Agent-based methods use external tools to assist a single MLLM in answering long video questions. Despite such…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Boyu Chen , Zhengrong Yue , Siran Chen , Zikang Wang , Yang Liu , Peng Li , Yali Wang

Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up.…

Computation and Language · Computer Science 2024-10-04 Minzheng Wang , Longze Chen , Cheng Fu , Shengyi Liao , Xinghua Zhang , Bingli Wu , Haiyang Yu , Nan Xu , Lei Zhang , Run Luo , Yunshui Li , Min Yang , Fei Huang , Yongbin Li