Related papers: Time-aware Large Kernel Convolutions

Interpreting and Improving Attention From the Perspective of Large Kernel Convolution

Attention mechanisms have significantly advanced visual models by capturing global context effectively. However, their reliance on large-scale datasets and substantial computational resources poses challenges in data-scarce and…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-03 · Chenghao Li, Chaoning Zhang, Boheng Zeng, Yi Lu, Pengbo Shi, Qingzi Chen, Jirui Liu, Lingyun Zhu, Yang Yang, Heng Tao Shen

Continuous Autoregressive Language Models

The efficiency of large language models (LLMs) is fundamentally limited by their sequential, token-by-token generation process. We argue that overcoming this bottleneck requires a new design axis for LLM scaling: increasing the semantic…

Computation and Language · Computer Science · 2025-11-03 · Chenze Shao, Darren Li, Fandong Meng, Jie Zhou

A Convolutional Attention Network for Extreme Summarization of Source Code

Attention mechanisms in neural networks have proved useful for problems in which the input and output do not have fixed dimension. Often there exist features that are locally translation invariant and would be valuable for directing the…

Machine Learning · Computer Science · 2016-05-26 · Miltiadis Allamanis, Hao Peng, Charles Sutton

Beyond Self-Attention: Deformable Large Kernel Attention for Medical Image Segmentation

Medical image segmentation has seen significant improvements with transformer models, which excel in grasping far-reaching contexts and global contextual information. However, the increasing computational demands of these models,…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-04 · Reza Azad, Leon Niggemeier, Michael Huttemann, Amirhossein Kazerouni, Ehsan Khodapanah Aghdam, Yury Velichko, Ulas Bagci, Dorit Merhof

Learning Task Representations from In-Context Learning

Large language models (LLMs) have demonstrated remarkable proficiency in in-context learning (ICL), where models adapt to new tasks through example-based prompts without requiring parameter updates. However, understanding how tasks are…

Computation and Language · Computer Science · 2025-11-11 · Baturay Saglam, Xinyang Hu, Zhuoran Yang, Dionysis Kalogerias, Amin Karbasi

PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution

Recently, some large kernel convnets strike back with appealing performance and efficiency. However, given the square complexity of convolution, scaling up kernels can bring about an enormous amount of parameters and the proliferated…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-19 · Honghao Chen, Xiangxiang Chu, Yongjian Ren, Xin Zhao, Kaiqi Huang

Task-Aware LLM Council with Adaptive Decision Pathways for Decision Support

Large language models (LLMs) have shown strong capabilities across diverse decision-making tasks. However, existing approaches often overlook the specialization differences among available models, treating all LLMs as uniformly applicable…

Artificial Intelligence · Computer Science · 2026-02-02 · Wei Zhu, Lixing Yu, Hao-Ren Yao, Zhiwen Tang, Kun Yue

Pay Less Attention with Lightweight and Dynamic Convolutions

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight…

Computation and Language · Computer Science · 2019-02-26 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Tactic: Adaptive Sparse Attention with Clustering and Distribution Fitting for Long-Context LLMs

Long-context models are essential for many applications but face inefficiencies in loading large KV caches during decoding. Prior methods enforce fixed token budgets for sparse attention, assuming a set number of tokens can approximate full…

Machine Learning · Computer Science · 2025-02-19 · Kan Zhu, Tian Tang, Qinyu Xu, Yile Gu, Zhichen Zeng, Rohan Kadekodi, Liangyu Zhao, Ang Li, Arvind Krishnamurthy, Baris Kasikci

LUNA: Linear Universal Neural Attention with Generalization Guarantees

Scaling attention faces a critical bottleneck: the $\mathcal{O}(n^2)$ quadratic computational cost of softmax attention, which limits its application in long-sequence domains. While linear attention mechanisms reduce this cost to…

Machine Learning · Computer Science · 2025-12-10 · Ashkan Shahbazi, Ping He, Ali Abbasi, Yikun Bai, Xinran Liu, Elaheh Akbari, Darian Salehi, Navid NaderiAlizadeh, Soheil Kolouri

Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers

Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily reaching…

Hardware Architecture · Computer Science · 2025-01-15 · Rya Sanovar, Srikant Bharadwaj, Renee St. Amant, Victor Rühle, Saravan Rajmohan

Correlation-Aware Select and Merge Attention for Efficient Fine-Tuning and Context Length Extension

Modeling long sequences is crucial for various large-scale models; however, extending existing architectures to handle longer sequences presents significant technical and resource challenges. In this paper, we propose an efficient and…

Computation and Language · Computer Science · 2024-10-08 · Ning Wang, Zekun Li, Tongxin Bai, Guoqi Li

Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers

The self-attention mechanism is the key to the success of transformers in recent Large Language Models (LLMs). However, the quadratic computational cost $O(n^2)$ in the input sequence length $n$ is a notorious obstacle for further…

Machine Learning · Computer Science · 2024-10-17 · Yingyu Liang, Heshan Liu, Zhenmei Shi, Zhao Song, Zhuoyan Xu, Junze Yin

Adaptive Attention Span in Computer Vision

Recent developments in Transformers for language modeling have opened new areas of research in computer vision. Results from late 2019 showed vast performance increases in both object detection and recognition when convolutions are replaced…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-21 · Jerrod Parker, Shakti Kumar, Joe Roussy

Translution: Unifying Self-attention and Convolution for Adaptive and Relative Modeling

When modeling a given type of data, we consider it to involve two key aspects: 1) identifying relevant elements (e.g., image pixels or textual words) to a central element, as in a convolutional receptive field, or to a query element, as in…

Machine Learning · Computer Science · 2025-10-14 · Hehe Fan, Yi Yang, Mohan Kankanhalli, Fei Wu

Transformer Neural Processes - Kernel Regression

Neural Processes (NPs) are a rapidly evolving class of models designed to directly model the posterior predictive distribution of stochastic processes. Originally developed as a scalable alternative to Gaussian Processes (GPs), which are…

Machine Learning · Computer Science · 2026-04-20 · Daniel Jenson, Jhonathan Navott, Mengyan Zhang, Makkunda Sharma, Elizaveta Semenova, Seth Flaxman

A-VL: Adaptive Attention for Large Vision-Language Models

The Large Vision-Language Model (LVLM) integrates computer vision and natural language processing techniques, offering substantial application potential. However, these models demand extensive resources during inference. Adaptive attention…

Artificial Intelligence · Computer Science · 2025-02-10 · Junyang Zhang, Mu Yuan, Ruiguang Zhong, Puhan Luo, Huiyou Zhan, Ningkang Zhang, Chengchen Hu, Xiangyang Li

Adapting LLMs to Time Series Forecasting via Temporal Heterogeneity Modeling and Semantic Alignment

Large Language Models (LLMs) have recently demonstrated impressive capabilities in natural language processing due to their strong generalization and sequence modeling capabilities. However, their direct application to time series…

Computation and Language · Computer Science · 2025-08-12 · Yanru Sun, Emadeldeen Eldele, Zongxia Xie, Yucheng Wang, Wenzhe Niu, Qinghua Hu, Chee Keong Kwoh, Min Wu

Token-Efficient Leverage Learning in Large Language Models

Large Language Models (LLMs) have excelled in various tasks but perform better in high-resource scenarios, which presents challenges in low-resource scenarios. Data scarcity and the inherent difficulty of adapting LLMs to specific tasks…

Computation and Language · Computer Science · 2024-04-02 · Yuanhao Zeng, Min Wang, Yihang Wang, Yingxia Shao

More ConvNets in the 2020s: Scaling up Kernels Beyond 51x51 using Sparsity

Transformers have quickly shined in the computer vision world since the emergence of Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) seems to be challenged by increasingly effective transformer-based…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-07 · Shiwei Liu, Tianlong Chen, Xiaohan Chen, Xuxi Chen, Qiao Xiao, Boqian Wu, Tommi Kärkkäinen, Mykola Pechenizkiy, Decebal Mocanu, Zhangyang Wang