Related papers: CoMeT: Collaborative Memory Transformer for Effici…

200 papers

Large Language Models (LLMs) struggle to handle long input sequences due to high memory and runtime costs. Memory-augmented models have emerged as a promising solution to this problem, but current methods are hindered by limited memory…

Computation and Language · Computer Science · 2024-02-22 · Zexue He, Leonid Karlinsky, Donghyun Kim, Julian McAuley, Dmitry Krotov, Rogerio Feris

Incorporating knowledge bases (KB) into end-to-end task-oriented dialogue systems is challenging, since it requires to properly represent the entity of KB, which is associated with its KB context and dialogue context. The existing works…

Computation and Language · Computer Science · 2021-09-30 · Yanjie Gou, Yinjie Lei, Lingqiao Liu, Yong Dai, Chunxu Shen

Long context inference scenarios have become increasingly important for large language models, yet they introduce significant computational latency. While prior research has optimized long-sequence inference through operators, model…

Computation and Language · Computer Science · 2025-11-10 · Wei Shao, Lingchao Zheng, Pengyu Wang, Peizhen Zheng, Jun Li, Yuwei Fan

Transformer-based large language models (LLM) have been widely used in language processing applications. However, due to the memory constraints of the devices, most of them restrict the context window. Even though recurrent models in…

Computation and Language · Computer Science · 2025-02-07 · Zifan He, Yingqi Cao, Zongyue Qin, Neha Prakriya, Yizhou Sun, Jason Cong

Recurrent LLM architectures have emerged as a promising approach for improving reasoning, as they enable multi-step computation in the embedding space without generating intermediate tokens. Models such as Ouro perform reasoning by…

The quadratic complexity of the attention module makes it gradually become the bulk of compute in Transformer-based LLMs during generation. Moreover, the excessive key-value cache that arises when dealing with long inputs also brings severe…

Computation and Language · Computer Science · 2023-10-17 · Siyu Ren, Qi Jia, Kenny Q. Zhu

Large Language Models (LLMs) face significant computational challenges when processing long contexts due to the quadratic complexity of self-attention. While soft context compression methods, which map input text to smaller latent…

Computation and Language · Computer Science · 2025-09-24 · Gabriele Berton, Jayakrishnan Unnikrishnan, Son Tran, Mubarak Shah

Transformer-based large language models (LLMs) encounter challenges in processing long sequences on edge devices due to the quadratic complexity of attention mechanisms and growing memory demands from Key-Value (KV) cache. Existing KV cache…

Computation and Language · Computer Science · 2025-03-31 · Jiyu Chen, Shuang Peng, Daxiong Luo, Fan Yang, Renshou Wu, Fangyuan Li, Xiaoxin Chen

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models…

Computation and Language · Computer Science · 2023-06-13 · Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, Jianfeng Gao, Furu Wei

Large Language Models (LLMs) still suffer from severe hallucinations and catastrophic forgetting during causal reasoning over massive, fragmented long contexts. Existing memory mechanisms typically treat retrieval as a static, single-step…

Multiagent Systems · Computer Science · 2026-05-19 · Haodong Lei, Junming Liu, Yirong Chen, Ding Wang, Hongsong Wang

Efficient long-context LLM deployment is stalled by a dichotomy between amortized compression, which struggles with out-of-distribution generalization, and Test-Time Training, which incurs prohibitive synthetic data costs and requires…

Machine Learning · Computer Science · 2026-02-26 · Zeju Li, Yizhou Zhou, Qiang Xu

Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a…

Computation and Language · Computer Science · 2024-10-04 · Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang

Transformer-based Large Language Models (LLMs) have exhibited remarkable success in extensive tasks primarily attributed to self-attention mechanism, which requires a token to consider all preceding tokens as its context to compute…

Computation and Language · Computer Science · 2025-08-05 · Yaofo Chen, Zeng You, Shuhai Zhang, Haokun Li, Yirui Li, Yaowei Wang, Mingkui Tan

Transformer-based Language Models' computation and memory overhead increase quadratically as a function of sequence length. The quadratic cost poses challenges when employing LLMs for processing long sequences. In this work, we introduce…

Computation and Language · Computer Science · 2025-10-23 · Kiarash Zahirnia, Zahra Golpayegani, Walid Ahmed, Yang Liu

Large language models (LLMs) face significant challenges in processing long contexts due to the linear growth of the key-value (KV) cache and quadratic complexity of self-attention. Existing approaches address these bottlenecks separately:…

Computation and Language · Computer Science · 2026-04-17 · Zeng You, Yaofo Chen, Qiuwu Chen, Ying Sun, Shuhai Zhang, Yingjian Li, Yaowei Wang, Mingkui Tan

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in…

Computation and Language · Computer Science · 2020-10-20 · Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie

Extending large language models (LLMs) to process longer inputs is crucial for a wide range of applications. However, the substantial computational cost of transformers and limited generalization of positional encoding restrict the size of…

Computation and Language · Computer Science · 2025-06-11 · Howard Yen, Tianyu Gao, Danqi Chen

Context-aware Machine Translation aims to improve translations of sentences by incorporating surrounding sentences as context. Towards this task, two main architectures have been applied, namely single-encoder (based on concatenation) and…

Computation and Language · Computer Science · 2024-02-05 · Paweł Mąka, Yusuf Can Semerci, Jan Scholtes, Gerasimos Spanakis

To support long-term interaction in complex environments, LLM agents require memory systems that manage historical experiences. Existing approaches either retain full interaction histories via passive context extension, leading to…

Artificial Intelligence · Computer Science · 2026-01-30 · Jiaqi Liu, Yaofeng Su, Peng Xia, Siwei Han, Zeyu Zheng, Cihang Xie, Mingyu Ding, Huaxiu Yao

Transformer-based models show their effectiveness across multiple domains and tasks. The self-attention allows to combine information from all sequence elements into context-aware representations. However, global and local information has…

Computation and Language · Computer Science · 2022-12-09 · Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev