Related papers: DocReLM: Mastering Document Retrieval with Languag…

DeepRetrieval: Hacking Real Search Engines and Retrievers with Large Language Models via Reinforcement Learning

Information retrieval systems are crucial for enabling effective access to large document collections. Recent approaches have leveraged Large Language Models (LLMs) to enhance retrieval performance through query augmentation, but often rely…

Information Retrieval · Computer Science 2025-04-15 Pengcheng Jiang, Jiacheng Lin, Lang Cao, Runchu Tian, SeongKu Kang, Zifeng Wang, Jimeng Sun, Jiawei Han

Scientific Paper Retrieval with LLM-Guided Semantic-Based Ranking

Scientific paper retrieval is essential for supporting literature discovery and research. While dense retrieval methods demonstrate effectiveness in general-purpose tasks, they often fail to capture fine-grained scientific concepts that are…

Information Retrieval · Computer Science 2025-10-07 Yunyi Zhang, Ruozhen Yang, Siqi Jiao, SeongKu Kang, Jiawei Han

REALM: Recursive Relevance Modeling for LLM-based Document Re-Ranking

Large Language Models (LLMs) have shown strong capabilities in document re-ranking, a key component in modern Information Retrieval (IR) systems. However, existing LLM-based approaches face notable limitations, including ranking…

Information Retrieval · Computer Science 2025-10-03 Pinhuan Wang, Zhiqiu Xia, Chunhua Liao, Feiyi Wang, Hang Liu

Evaluating the Effectiveness of Retrieval-Augmented Large Language Models in Scientific Document Reasoning

Despite the dramatic progress in Large Language Model (LLM) development, LLMs often provide seemingly plausible but not factual information, often referred to as hallucinations. Retrieval-augmented LLMs provide a non-parametric approach to…

Computation and Language · Computer Science 2023-11-09 Sai Munikoti, Anurag Acharya, Sridevi Wagle, Sameera Horawalavithana

DocRefine: An Intelligent Framework for Scientific Document Understanding and Content Optimization based on Multimodal Large Model Agents

The exponential growth of scientific literature in PDF format necessitates advanced tools for efficient and accurate document understanding, summarization, and content optimization. Traditional methods fall short in handling complex layouts…

Computer Vision and Pattern Recognition · Computer Science 2025-08-12 Kun Qian, Wenjie Li, Tianyu Sun, Wenhong Wang, Wenhan Luo

CorpusLM: Towards a Unified Language Model on Corpus for Knowledge-Intensive Tasks

Large language models (LLMs) have gained significant attention in various fields but are prone to hallucination, especially in knowledge-intensive (KI) tasks. To address this, retrieval-augmented generation (RAG) has emerged as a popular…

Computation and Language · Computer Science 2024-04-23 Xiaoxi Li, Zhicheng Dou, Yujia Zhou, Fangchao Liu

MIRACL-VISION: A Large, multilingual, visual document retrieval benchmark

Document retrieval is an important task for search and Retrieval-Augmented Generation (RAG) applications. Large Language Models (LLMs) have contributed to improving the accuracy of text-based document retrieval. However, documents with…

Information Retrieval · Computer Science 2025-05-22 Radek Osmulski, Gabriel de Souza P. Moreira, Ronay Ak, Mengyao Xu, Benedikt Schifferer, Even Oldridge

DOCBENCH: A Benchmark for Evaluating LLM-based Document Reading Systems

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going…

Computation and Language · Computer Science 2024-07-16 Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

Exploiting LLMs' Reasoning Capability to Infer Implicit Concepts in Legal Information Retrieval

Statutory law retrieval is a typical problem in legal language processing that has various practical applications in law engineering. Modern deep learning-based retrieval methods have achieved significant results for this problem. However,…

Computation and Language · Computer Science 2024-10-17 Hai-Long Nguyen, Tan-Minh Nguyen, Duc-Minh Nguyen, Thi-Hai-Yen Vuong, Ha-Thanh Nguyen, Xuan-Hieu Phan

Improving Retrieval Augmented Language Model with Self-Reasoning

The Retrieval-Augmented Language Model (RALM) has shown remarkable performance on knowledge-intensive tasks by incorporating external knowledge during inference, which mitigates the factual hallucinations inherent in large language models…

Computation and Language · Computer Science 2024-12-20 Yuan Xia, Jingbo Zhou, Zhenhui Shi, Jun Chen, Haifeng Huang

Cognitive-Aligned Document Selection for Retrieval-augmented Generation

Large language models (LLMs) inherently display hallucinations since the precision of generated texts cannot be guaranteed purely by the parametric knowledge they include. Although retrieval-augmented generation (RAG) systems enhance the…

Artificial Intelligence · Computer Science 2025-02-18 Bingyu Wan, Fuxi Zhang, Zhongpeng Qi, Jiayi Ding, Jijun Li, Baoshi Fan, Yijia Zhang, Jun Zhang

Using Large Language Models to Enrich the Documentation of Datasets for Machine Learning

Recent regulatory initiatives like the European AI Act and relevant voices in the Machine Learning (ML) community stress the need to describe datasets along several key dimensions for trustworthy AI, such as the provenance processes and…

Digital Libraries · Computer Science 2024-05-27 Joan Giner-Miguelez, Abel Gómez, Jordi Cabot

OpenScholar: Synthesizing Scientific Literature with Retrieval-augmented LMs

Scientific progress depends on researchers' ability to synthesize the growing body of literature. Can large language models (LMs) assist scientists in this task? We introduce OpenScholar, a specialized retrieval-augmented LM that answers…

Computation and Language · Computer Science 2024-11-22 Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy, David Wadden, Matt Latzke, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Dan Weld, Doug Downey, Wen-tau Yih, Pang Wei Koh, Hannaneh Hajishirzi

Benchmarking Information Retrieval Models on Complex Retrieval Tasks

Large language models (LLMs) are incredible and versatile tools for text-based tasks that have enabled countless, previously unimaginable, applications. Retrieval models, in contrast, have not yet seen such capable general-purpose models…

Information Retrieval · Computer Science 2025-09-10 Julian Killingback, Hamed Zamani

Document Retrieval for Large Scale Content Analysis using Contextualized Dictionaries

This paper presents a procedure to retrieve subsets of relevant documents from large text collections for Content Analysis, e.g. in social sciences. Document retrieval for this purpose needs to take account of the fact that analysts often…

Information Retrieval · Computer Science 2017-07-12 Gregor Wiedemann, Andreas Niekler

On the Effectiveness of Large Language Models in Automating Categorization of Scientific Texts

The rapid advancement of Large Language Models (LLMs) has led to a multitude of application opportunities. One traditional task for Information Retrieval systems is the summarization and classification of texts, both of which are important…

Computation and Language · Computer Science 2025-02-25 Gautam Kishore Shahi, Oliver Hummel

LLM4Ranking: An Easy-to-use Framework of Utilizing Large Language Models for Document Reranking

Utilizing large language models (LLMs) for document reranking has been a popular and promising research direction in recent years, and many studies are dedicated to improving the performance and efficiency of using LLMs for reranking. Besides,…

Information Retrieval · Computer Science 2025-04-11 Qi Liu, Haozhe Duan, Yiqun Chen, Quanfeng Lu, Weiwei Sun, Jiaxin Mao

REALM: Retrieval-Augmented Language Model Pre-Training

Language model pre-training has been shown to capture a surprising amount of world knowledge, crucial for NLP tasks such as question answering. However, this knowledge is stored implicitly in the parameters of a neural network, requiring…

Computation and Language · Computer Science 2020-02-21 Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Large Language Model-guided Document Selection

Large Language Model (LLM) pre-training exhausts an ever-growing compute budget, yet recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs. Inspired by efforts…

Computation and Language · Computer Science 2024-06-10 Xiang Kong, Tom Gunter, Ruoming Pang

Text Retrieval with Multi-Stage Re-Ranking Models

Text retrieval is the task of retrieving documents similar to a search query, and it is important to improve retrieval accuracy while maintaining a certain level of retrieval speed. Existing studies have reported accuracy improvements…

Information Retrieval · Computer Science 2023-11-15 Yuichi Sasazawa, Kenichi Yokote, Osamu Imaichi, Yasuhiro Sogawa