Related papers: Latency Adjustable Transformer Encoder for Languag…

200 papers

Augmenting large language models (LLMs) with auxiliary tokens has emerged as a promising strategy for enhancing model performance. In this work, we introduce a lightweight method termed latent tokens; these are dummy tokens that may be…

Machine Learning · Computer Science · 2025-05-20 · Yuchang Sun, Yanxi Chen, Yaliang Li, Bolin Ding

Large Language Models (LLMs) based on autoregressive, decoder-only Transformers generate text one token at a time, where a token represents a discrete unit of text. As each newly produced token is appended to the partial output sequence,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-05-06 · Dimitrios Kafetzis, Ramin Khalili, Iordanis Koutsopoulos

Finetuning language models (LMs) is crucial for adapting the models to downstream data and tasks. However, full finetuning is usually costly. Existing work, such as parameter-efficient finetuning (PEFT), often focuses on how to…

Computation and Language · Computer Science · 2025-06-03 · Jian Gu, Aldeida Aleti, Chunyang Chen, Hongyu Zhang

Large language models (LLMs) power many state-of-the-art systems in natural language processing. However, these models are extremely computationally expensive, even at inference time, raising the natural question: when is the extra cost of…

Machine Learning · Computer Science · 2023-05-05 · Deepak Narayanan, Keshav Santhanam, Peter Henderson, Rishi Bommasani, Tony Lee, Percy Liang

Transformer encoders contextualize token representations by attending to all other tokens at each layer, leading to a quadratic increase in compute effort with the input length. In practice, however, the input text of many NLP tasks can be…

Computation and Language · Computer Science · 2023-06-01 · Jeremiah Milbauer, Annie Louis, Mohammad Javad Hosseini, Alex Fabrikant, Donald Metzler, Tal Schuster

Large Language Models are growing in size, and we expect them to continue to do so, as larger models train quicker. However, this increase in size will severely impact inference costs. Therefore model compression is important, to retain the…

Machine Learning · Computer Science · 2024-04-10 · Georgy Tyukin

Transformer-based language models utilize the attention mechanism for substantial performance improvements in almost all natural language processing (NLP) tasks. Similar attention structures are also extensively studied in several other…

Computation and Language · Computer Science · 2023-05-17 · Nurullah Sevim, Ege Ozan Özyedek, Furkan Şahinuç, Aykut Koç

Transformer architectures are the backbone of the modern AI revolution. However, they are based on simply stacking the same blocks in dozens of layers and processing information sequentially from one block to another. In this paper, we…

Computation and Language · Computer Science · 2024-12-24 · Prateek Verma, Mert Pilanci

Powerful foundation models, including large language models (LLMs), with Transformer architectures have ushered in a new era of Generative AI across various industries. Industry and research community have witnessed a large number of new…

This paper introduces an efficient strategy to transform Large Language Models (LLMs) into Multi-Modal Large Language Models (MLLMs). By conceptualizing this transformation as a domain adaptation process, i.e., transitioning from text…

Computation and Language · Computer Science · 2023-12-19 · Bingchen Zhao, Haoqin Tu, Chen Wei, Jieru Mei, Cihang Xie

Fine-tuning and inference with large Language Models (LM) are generally known to be expensive. Parameter-efficient fine-tuning over pretrained LMs reduces training memory by updating a small number of LM parameters but does not improve…

Computation and Language · Computer Science · 2024-06-05 · Bowen Zhao, Hannaneh Hajishirzi, Qingqing Cao

Large Language Models (LLMs) have pushed the frontier of artificial intelligence but comprise hundreds of billions of parameters and operations. For faster inference latency, LLMs are deployed on multiple hardware accelerators…

Machine Learning · Computer Science · 2026-01-07 · Jan Hansen-Palmus, Michael Truong Le, Oliver Hausdörfer, Alok Verma

Transformer models have revolutionized natural language processing, achieving state-of-the-art performance and demonstrating remarkable scalability. However, their memory demands, particularly due to maintaining full context in memory, pose…

Computation and Language · Computer Science · 2025-11-04 · Juan Gabriel Kostelec, Qinghai Guo

Large language models have transformed natural language processing, yet supervised fine-tuning (SFT) remains computationally intensive. This paper formally proves that capabilities acquired through SFT can be approximated by a base…

Machine Learning · Computer Science · 2025-06-11 · Asankhaya Sharma

Lattices are compact representations that encode multiple hypotheses, such as speech recognition results or different word segmentations. It is shown that encoding lattices as opposed to 1-best results generated by automatic speech…

Computation and Language · Computer Science · 2020-11-03 · Chao-Wei Huang, Yun-Nung Chen

We develop a novel approach for confidently accelerating inference in the large and expensive multilayer Transformers that are now ubiquitous in natural language processing (NLP). Amortized or approximate computational methods increase…

Computation and Language · Computer Science · 2021-09-10 · Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay

While transformer models have been highly successful, they are computationally inefficient. We observe that for each layer, the full width of the layer may be needed only for a small subset of tokens inside a batch and that the "effective"…

Machine Learning · Computer Science · 2024-12-19 · Bartosz Wójcik, Alessio Devoto, Karol Pustelnik, Pasquale Minervini, Simone Scardapane

Sequence transducers, such as the RNN-T and the Conformer-T, are one of the most promising models of end-to-end speech recognition, especially in streaming scenarios where both latency and accuracy are important. Although various methods,…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-11-07 · Yusuke Shinohara, Shinji Watanabe

Powerful generative artificial intelligence from large language models (LLMs) harnesses extensive computational resources for inference. In this work, we investigate the transformer architecture, a key component of these models, under the…

Scaling language models to handle longer contexts introduces substantial memory challenges due to the growing cost of key-value (KV) caches. Motivated by the efficiency gains of hybrid models and the broad availability of pretrained large…

Computation and Language · Computer Science · 2026-05-19 · Xuan Zhang, Fengzhuo Zhang, Cunxiao Du, Chao Du, Tianyu Pang, Wei Gao, Min Lin