Related papers: SPLAT: A framework for optimised GPU code-generati…

200 papers

Programming-based Pre-trained Language Models (PPLMs) such as CodeBERT have achieved great success in many downstream code-related tasks. Since the memory and computational complexity of self-attention in the Transformer grow quadratically…

Computation and Language · Computer Science 2022-05-30 Tingting Liu, Chengyu Wang, Cen Chen, Ming Gao, Aoying Zhou

The growing demand for long-context inference capabilities in Large Language Models (LLMs) has intensified the computational and memory bottlenecks inherent to the self-attention mechanism. To address this challenge, we introduce BLASST, a…

Sparse-Linear Attention (SLA) combines sparse and linear attention to accelerate diffusion models and has shown strong performance in video generation. However, (i) SLA relies on a heuristic split that assigns computations to the sparse or…

Machine Learning · Computer Science 2026-02-16 Jintao Zhang, Haoxu Wang, Kai Jiang, Kaiwen Zheng, Youhe Jiang, Ion Stoica, Jianfei Chen, Jun Zhu, Joseph E. Gonzalez

Attention serves as the fundamental mechanism for long-context modeling in large language models (LLMs), yet dense attention becomes structurally prohibitive for long sequences due to its quadratic complexity. Consequently, sparse attention…

Computation and Language · Computer Science 2026-01-07 Junxiang Qiu, Shuo Wang, Zhengsu Chen, Hengheng Zhang, Jinda Lu, Changcheng Li, Qi Tian

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving…

The quadratic cost of attention limits the scalability of long-context LLMs, especially under limited hardware memory budgets. While attention is often sparse, existing static sparse methods cannot adapt to task- or input-dependent…

Computation and Language · Computer Science 2026-05-29 Siheng Xiong, Joe Zou, Faramarz Fekri, Yae Jee Cho

The quadratic computational complexity of standard attention mechanisms presents a severe scalability bottleneck for LLMs in long-context scenarios. While hybrid attention mechanisms combining Full Attention (FA) and Sparse Attention (SA)…

Machine Learning · Computer Science 2026-04-10 Quantong Qiu, Zhiyi Hong, Yi Yang, Haitian Wang, Kebin Liu, Qingqing Dang, Juntao Li, Min Zhang

The design of Large Language Models (LLMs) has long been hampered by a fundamental conflict within their core attention mechanism: its remarkable expressivity is built upon a computational complexity of O(H N^2) that grows quadratically…

Machine Learning · Computer Science 2025-12-01 Mingkuan Zhao, Wentao Hu, Jiayin Wang, Xin Lai, Tianchen Huang, Yuheng Min, Rui Yan, Xiaoyan Zhu

The quadratic computational complexity of Multi-Head Self-Attention (MHSA) remains a fundamental bottleneck in scaling Large Language Models (LLMs) for long-context tasks. While sparse and linearized attention mechanisms attempt to mitigate…

Computation and Language · Computer Science 2025-12-19 Caner Erden

The computational demands of self-attention mechanisms pose a critical challenge for transformer-based video generation, particularly in synthesizing ultra-long sequences. Current approaches, such as factorized attention and fixed sparse…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Qirui Li, Guangcong Zheng, Qi Zhao, Jie Li, Bin Dong, Yiwu Yao, Xi Li

In Diffusion Transformer (DiT) models, particularly for video generation, attention latency is a major bottleneck due to the long sequence length and the quadratic complexity. We find that attention weights can be separated into two parts:…

Recent advances in sparse attention mechanisms have demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), one state-of-the-art…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-14 Ran Yan, Youhe Jiang, Zhuoming Chen, Haohui Mai, Beidi Chen, Binhang Yuan

Long-term memory is a cornerstone of human intelligence. Enabling AI to process lifetime-scale information remains a long-standing pursuit in the field. Due to the constraints of full-attention architectures, the effective context length of…

Computation and Language · Computer Science 2026-04-14 Yu Chen, Runkai Chen, Sheng Yi, Xinda Zhao, Xiaohong Li, Jianjin Zhang, Jun Sun, Chuanrui Hu, Yunyun Han, Lidong Bing, Yafeng Deng, Tianqiao Chen

As Large Language Models (LLMs) scale to longer context windows, the computational cost of attention mechanisms, which traditionally grows quadratically with input length, presents a critical challenge for real-time and memory-constrained…

Computation and Language · Computer Science 2024-12-10 James Vo

Code summarization aims to generate concise natural language descriptions for source code. The prevailing approaches adopt transformer-based encoder-decoder architectures, where the Abstract Syntax Tree (AST) of the source code is utilized…

Computation and Language · Computer Science 2023-08-11 Yeshwanth Nagaraj, Ujjwal Gupta

The Segment Anything Model (SAM) has advanced interactive segmentation but is limited by the high computational cost on high-resolution images. This requires downsampling to meet GPU constraints, sacrificing the fine-grained details needed…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 You Huang, Wenbin Lai, Jiayi Ji, Liujuan Cao, Shengchuan Zhang, Rongrong Ji

Attention-based architectures have achieved superior performance in multivariate time series forecasting but are computationally expensive. Techniques such as patching and adaptive masking have been developed to reduce their sizes and…

Machine Learning · Computer Science 2025-05-14 Suhan Guo, Jiahong Deng, Mengjun Yi, Furao Shen, Jian Zhao

Processing long contexts has become a critical capability for modern large language models (LLMs). However, serving long-context LLMs comes with significant inference costs due to the high memory overhead of the key-value (KV) cache.…

Machine Learning · Computer Science 2025-03-04 Qihui Zhou, Peiqi Yin, Pengfei Zuo, James Cheng

Dense large language models (LLMs) face critical efficiency bottlenecks as they rigidly activate all parameters regardless of input complexity. While existing sparsity methods (static pruning or dynamic activation) address this partially,…

Computation and Language · Computer Science 2025-02-27 Yiheng Yang, Yujie Wang, Chi Ma, Lei Yu, Emmanuele Chersoni, Chu-Ren Huang

Efficient long-context understanding and reasoning are increasingly vital for large language model (LLM) applications such as multi-turn dialogue and program analysis. However, the core self-attention mechanism scales quadratically with…

Computation and Language · Computer Science 2025-12-17 Siran Liu, Zane Cao, Yongchao He