Related papers: BTR: Binary Token Representations for Efficient Re…

200 papers

Multimodal Large Language Models (MLLMs) have demonstrated exceptional success in various multimodal tasks, yet their deployment is frequently limited by substantial computational demands and prolonged inference times. Given that the vision…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Zihui Zhao, Yingxin Li, Yang Li

Existing pre-trained language models (PLMs) are often computationally expensive in inference, making them impractical in various resource-limited real-world applications. To address this issue, we propose a dynamic token reduction approach…

Computation and Language · Computer Science 2021-05-26 Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO)…

This paper introduces a simple and scalable approach to improve the data efficiency of large language model (LLM) training by augmenting existing text data with thinking trajectories. The compute for pre-training LLMs has been growing at an…

Computation and Language · Computer Science 2025-10-20 Liang Wang, Nan Yang, Shaohan Huang, Li Dong, Furu Wei

Since ChatGPT released its API for public use, the number of applications built on top of commercial large language models (LLMs) has increased exponentially. One popular usage of such models is leveraging their in-context learning ability and…

Computation and Language · Computer Science 2023-10-26 Junyi Liu, Liangzhi Li, Tong Xiang, Bowen Wang, Yiming Qian

Retrieving and extracting knowledge from extensive research documents and large databases presents significant challenges for researchers, students, and professionals in today's information-rich era. Existing retrieval systems, which rely…

Information Retrieval · Computer Science 2025-02-06 Mohammed-Khalil Ghali, Abdelrahman Farrag, Daehan Won, Yu Jin

Large Language Models (LLMs) have swiftly emerged as vital resources for different applications in the biomedical and healthcare domains; however, these models encounter issues such as generating inaccurate information or hallucinations.…

Computation and Language · Computer Science 2024-05-06 Mingchen Li, Halil Kilicoglu, Hua Xu, Rui Zhang

Word embeddings are commonly used as a starting point in many NLP models to achieve state-of-the-art performances. However, with a large vocabulary and many dimensions, these floating-point representations are expensive both in terms of…

Computation and Language · Computer Science 2020-01-23 Julien Tissier, Christophe Gravier, Amaury Habrard

Pre-trained language models have shown stellar performance in various downstream tasks. But, this usually comes at the cost of high latency and computation, hindering their usage in resource-limited settings. In this work, we propose a…

Computation and Language · Computer Science 2022-03-18 Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

Retrieval augmentation is a powerful but expensive method to make language models more knowledgeable about the world. Memory-based methods like LUMEN pre-compute token representations for retrieved passages to drastically speed up…

Computation and Language · Computer Science 2023-08-30 Yury Zemlyanskiy, Michiel de Jong, Luke Vilnis, Santiago Ontañón, William W. Cohen, Sumit Sanghai, Joshua Ainslie

Transformers have a quadratic scaling of computational complexity with input size, which limits the input context window size of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG)…

Computation and Language · Computer Science 2024-10-18 Yimin Tang, Yurong Xu, Ning Yan, Masood Mortazavi

A major limitation for the broader scope of problems solvable by transformers is the quadratic scaling of computational complexity with input size. In this study, we investigate the recurrent memory augmentation of pre-trained transformer…

Computation and Language · Computer Science 2024-02-07 Aydar Bulatov, Yuri Kuratov, Yermek Kapushev, Mikhail S. Burtsev

Augmenting a language model (LM) with $k$-nearest neighbors ($k$NN) retrieval on its training data alone can decrease its perplexity, though the underlying reasons for this remain elusive. In this work, we rule out one previously posited…

Computation and Language · Computer Science 2024-04-03 Ting-Rui Chiang, Xinyan Velocity Yu, Joshua Robinson, Ollie Liu, Isabelle Lee, Dani Yogatama

Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added post-hoc to an already-pretrained LM, which limits the…

Computation and Language · Computer Science 2024-07-23 Ohad Rubin, Jonathan Berant

Massive parameters of LLMs have made inference latency a fundamental bottleneck. Speculative decoding represents a lossless approach to accelerate inference through a guess-and-verify paradigm. Some methods rely on additional architectures…

Computation and Language · Computer Science 2025-05-27 Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu

To drive progress in science and engineering, large language models (LLMs) must be able to process large amounts of numerical data and solve long calculations efficiently. This is currently only possible through the use of external tools or…

Machine Learning · Computer Science 2026-05-21 Linus Kreitner, Paul Hager, Jonathan Mengedoht, Georgios Kaissis, Daniel Rueckert, Martin J. Menten

The high inference cost of Large Language Models (LLMs) poses challenges, especially for tasks requiring lengthy outputs. However, natural language often contains redundancy, which presents an opportunity for optimization. We have observed…

Computation and Language · Computer Science 2025-11-25 Alfredo Garrachón Ruiz, Tomás de la Rosa, Daniel Borrajo

Although reward models have been successful in improving multimodal large language models, the reward models themselves remain brutal and contain minimal information. Notably, existing reward models only mimic human annotations by assigning…

Machine Learning · Computer Science 2025-02-26 Deqing Fu, Tong Xiao, Rui Wang, Wang Zhu, Pengchuan Zhang, Guan Pang, Robin Jia, Lawrence Chen

Retrieval-augmented generation (RAG) is a promising way to improve large language models (LLMs) for generating more factual, accurate, and up-to-date content. Existing methods either optimize prompts to guide LLMs in leveraging retrieved…

Computation and Language · Computer Science 2024-12-12 Yutao Zhu, Zhaoheng Huang, Zhicheng Dou, Ji-Rong Wen

Despite their tremendous success and versatility, Deep Neural Networks (DNNs) such as Large Language Models (LLMs) suffer from inference inefficiency and rely on advanced computational infrastructure. To address these challenges and make…

Machine Learning · Computer Science 2025-05-05 Mohsen Dehghankar, Mahdi Erfanian, Abolfazl Asudeh