Related papers: FastRM: An efficient and automatic explainability …

Efficient Inference for Large Reasoning Models: A Survey

Large Reasoning Models (LRMs) significantly improve the reasoning ability of Large Language Models (LLMs) by learning to reason, exhibiting promising performance in solving complex tasks. However, their deliberative reasoning process leads…

Computation and Language · Computer Science 2025-08-14 Yue Liu, Jiaying Wu, Yufei He, Ruihan Gong, Jun Xia, Liang Li, Hongcheng Gao, Hongyu Chen, Baolong Bi, Jiaheng Zhang, Zhiqi Huang, Bryan Hooi, Stan Z. Li, Keqin Li

Towards Efficient Large Vision-Language Models: A Comprehensive Survey on Inference Strategies

Although Large Vision Language Models (LVLMs) have demonstrated impressive multimodal reasoning capabilities, their scalability and deployment are constrained by massive computational requirements. In particular, the massive amount of…

Machine Learning · Computer Science 2026-04-14 Surendra Pathak, Bo Han

Improving Retrieval Augmented Language Model with Self-Reasoning

The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherited in large language models…

Computation and Language · Computer Science 2024-12-20 Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang

FastMem: Fast Memorization of Prompt Improves Context Awareness of Large Language Models

Large language models (LLMs) excel in generating coherent text, but they often struggle with context awareness, leading to inaccuracies in tasks requiring faithful adherence to provided information. We introduce FastMem, a novel method…

Computation and Language · Computer Science 2024-10-08 Junyi Zhu, Shuochen Liu, Yu Yu, Bo Tang, Yibo Yan, Zhiyu Li, Feiyu Xiong, Tong Xu, Matthew B. Blaschko

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation

Large Vision-Language Models (LVLMs) have made remarkable strides in multimodal tasks such as visual question answering, visual grounding, and complex reasoning. However, they remain limited by static training data, susceptibility to…

Artificial Intelligence · Computer Science 2025-08-27 Chan-Wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Suofei Feng, Ryan Rossi, Zhengzhong Tu

REALM: Recursive Relevance Modeling for LLM-based Document Re-Ranking

Large Language Models (LLMs) have shown strong capabilities in document re-ranking, a key component in modern Information Retrieval (IR) systems. However, existing LLM-based approaches face notable limitations, including ranking…

Information Retrieval · Computer Science 2025-10-03 Pinhuan Wang, Zhiqiu Xia, Chunhua Liao, Feiyi Wang, Hang Liu

Structured Relevance Assessment for Robust Retrieval-Augmented Language Models

Retrieval-Augmented Language Models (RALMs) face significant challenges in reducing factual errors, particularly in document relevance evaluation and knowledge integration. We introduce a framework for structured relevance assessment that…

Artificial Intelligence · Computer Science 2025-07-30 Aryan Raj, Astitva Veer Garg, Anitha D

Exploring and Exploiting the Inherent Efficiency within Large Reasoning Models for Self-Guided Efficiency Enhancement

Recent advancements in large reasoning models (LRMs) have significantly enhanced language models' capabilities in complex problem-solving by emulating human-like deliberative thinking. However, these models often exhibit overthinking (i.e.,…

Artificial Intelligence · Computer Science 2025-06-19 Weixiang Zhao, Jiahe Guo, Yang Deng, Xingyu Sui, Yulin Hu, Yanyan Zhao, Wanxiang Che, Bing Qin, Tat-Seng Chua, Ting Liu

Robust Diagram Reasoning: A Framework for Enhancing LVLM Performance on Visually Perturbed Scientific Diagrams

Large Language Models (LLMs) and their multimodal variants (LVLMs) hold immense promise for scientific and engineering applications, particularly in processing visual information like scientific diagrams. However, their practical deployment…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Minghao Zhou, Rafael Souza, Yaqian Hu, Luming Che

GRM: Generative Relevance Modeling Using Relevance-Aware Sample Estimation for Document Retrieval

Recent studies show that Generative Relevance Feedback (GRF), using text generated by Large Language Models (LLMs), can enhance the effectiveness of query expansion. However, LLMs can generate irrelevant information that harms retrieval…

Information Retrieval · Computer Science 2023-06-19 Iain Mackie, Ivan Sekulic, Shubham Chatterjee, Jeffrey Dalton, Fabio Crestani

Structured Labeling Enables Faster Vision-Language Models for End-to-End Autonomous Driving

Vision-Language Models (VLMs) offer a promising approach to end-to-end autonomous driving due to their human-like reasoning capabilities. However, troublesome gaps remain between current VLMs and real-world autonomous driving applications.…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Hao Jiang, Chuan Hu, Yukang Shi, Yuan He, Ke Wang, Xi Zhang, Zhipeng Zhang

Implicit Graph, Explicit Retrieval: Towards Efficient and Interpretable Long-horizon Memory for Large Language Models

Long-horizon applications increasingly require large language models (LLMs) to answer queries when relevant evidence is sparse and dispersed across very long contexts. Existing memory systems largely follow two paradigms: explicit…

Computation and Language · Computer Science 2026-01-08 Xin Zhang, Kailai Yang, Hao Li, Chenyue Li, Qiyu Wei, Sophia Ananiadou

Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation

Large Language Models (LLMs) have demonstrated significant performance improvements across various cognitive tasks. An emerging application is using LLMs to enhance retrieval-augmented generation (RAG) capabilities. These systems require…

Computation and Language · Computer Science 2025-01-28 Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui

ReasonDrive: Efficient Visual Question Answering for Autonomous Vehicles with Reasoning-Enhanced Small Vision-Language Models

Vision-language models (VLMs) show promise for autonomous driving but often lack transparent reasoning capabilities that are critical for safety. We investigate whether explicitly modeling reasoning during fine-tuning enhances VLM…

Computer Vision and Pattern Recognition · Computer Science 2025-04-16 Amirhosein Chahe, Lifeng Zhou

Fast Distributed Inference Serving for Large Language Models

Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of these applications demands low latency for LLM inference. Existing LLM serving systems use…

Machine Learning · Computer Science 2024-09-26 Bingyang Wu, Yinmin Zhong, Zili Zhang, Shengyu Liu, Fangyue Liu, Yuanhang Sun, Gang Huang, Xuanzhe Liu, Xin Jin

Coherency Improved Explainable Recommendation via Large Language Model

Explainable recommender systems are designed to elucidate the explanation behind each recommendation, enabling users to comprehend the underlying logic. Previous works perform rating prediction and explanation generation in a multi-task…

Information Retrieval · Computer Science 2025-04-09 Shijie Liu, Ruixing Ding, Weihai Lu, Jun Wang, Mo Yu, Xiaoming Shi, Wei Zhang

fastbmRAG: A Fast Graph-Based RAG Framework for Efficient Processing of Large-Scale Biomedical Literature

Large language models (LLMs) are rapidly transforming various domains, including biomedicine and healthcare, and demonstrate remarkable potential from scientific research to new drug discovery. Graph-based retrieval-augmented generation…

Quantitative Methods · Quantitative Biology 2025-11-14 Guofeng Meng, Li Shen, Qiuyan Zhong, Wei Wang, Haizhou Zhang, Xiaozhen Wang

An Image is Worth 1/2 Tokens After Layer 2: Plug-and-Play Inference Acceleration for Large Vision-Language Models

In this study, we identify the inefficient attention phenomena in Large Vision-Language Models (LVLMs), notably within prominent models like LLaVA-1.5, QwenVL-Chat and Video-LLaVA. We find that the attention computation over visual…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Liang Chen, Haozhe Zhao, Tianyu Liu, Shuai Bai, Junyang Lin, Chang Zhou, Baobao Chang

ReaLM: Reliable and Efficient Large Language Model Inference with Statistical Algorithm-Based Fault Tolerance

The demand for efficient large language model (LLM) inference has propelled the development of dedicated accelerators. As accelerators are vulnerable to hardware faults due to aging, variation, etc., existing accelerator designs often…

Hardware Architecture · Computer Science 2025-04-08 Tong Xie, Jiawang Zhao, Zishen Wan, Zuodong Zhang, Yuan Wang, Runsheng Wang, Ru Huang, Meng Li

Explainable and Interpretable Multimodal Large Language Models: A Comprehensive Survey

The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with large language models (LLMs) and computer vision (CV) systems driving advancements in natural language understanding and visual processing,…

Computation and Language · Computer Science 2024-12-04 Yunkai Dang, Kaichen Huang, Jiahao Huo, Yibo Yan, Sirui Huang, Dongrui Liu, Mengxi Gao, Jie Zhang, Chen Qian, Kun Wang, Yong Liu, Jing Shao, Hui Xiong, Xuming Hu