Related papers: An Iterative Algorithm for Rescaled Hyperbolic Fun…

In-Context Learning for Attention Scheme: from Single Softmax Regression to Multiple Softmax Regression via a Tensor Trick

Large language models (LLMs) have brought significant and transformative changes in human society. These models have demonstrated remarkable capabilities in natural language understanding and generation, leading to various advancements and…

Machine Learning · Computer Science 2023-07-06 Yeqi Gao, Zhao Song, Shenghao Xie

Attention Scheme Inspired Softmax Regression

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng, Zhihang Li, Zhao Song

The Closeness of In-Context Learning and Weight Shifting for Softmax Regression

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks. The attention mechanism in the Transformer…

Computation and Language · Computer Science 2023-04-27 Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou

Local Convergence of Approximate Newton Method for Two Layer Nonlinear Regression

There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation,…

Machine Learning · Computer Science 2023-11-28 Zhihang Li, Zhao Song, Zifan Wang, Junze Yin

Beyond Linear Approximations: A Novel Pruning Approach for Attention Matrix

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

A Unified Scheme of ResNet and Softmax

Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical…

Machine Learning · Computer Science 2023-09-26 Zhao Song, Weixin Wang, Junze Yin

LoLCATs: On Low-Rank Linearizing of Large Language Models

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However,…

Machine Learning · Computer Science 2025-03-07 Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited…

Computation and Language · Computer Science 2024-07-26 Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin

The Expressibility of Polynomial based Attention Scheme

Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. As…

Machine Learning · Computer Science 2023-11-01 Zhao Song, Guangyi Xu, Junze Yin

TransNormerLLM: A Faster and Better Large Language Model with Improved TransNormer

We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear…

Computation and Language · Computer Science 2024-01-22 Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

How to Protect Copyright Data in Optimization of Large Language Models?

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications. Controversy has arisen as to whether these models output copyrighted data, which can occur if the data the models are…

Machine Learning · Computer Science 2023-08-24 Timothy Chu, Zhao Song, Chiwun Yang

Softplus Attention with Re-weighting Boosts Length Extrapolation in Large Language Models

Large language models have achieved remarkable success in recent years, primarily due to self-attention. However, traditional Softmax attention suffers from numerical instability and reduced performance as the number of inference tokens…

Computation and Language · Computer Science 2026-02-02 Bo Gao, Michael W. Spratling, Letizia Gionfrida

Lizard: An Efficient Linearization Framework for Large Language Models

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks with long sequences due…

Computation and Language · Computer Science 2026-04-21 Chien Van Nguyen, Huy Nguyen, Ruiyi Zhang, Hanieh Deilamsalehy, Puneet Mathur, Viet Dac Lai, Haoliang Wang, Jayakumar Subramanian, Ryan A. Rossi, Trung Bui, Nikos Vlassis, Franck Dernoncourt, Thien Huu Nguyen

Efficient Inference Using Large Language Models with Limited Human Data: Fine-Tuning then Rectification

Driven by recent advances in artificial intelligence (AI), a growing literature has demonstrated the potential for using large language models (LLMs) as scalable surrogates to generate human-like responses in many business applications. Two…

Machine Learning · Computer Science 2025-12-30 Lei Wang, Zikun Ye, Jinglong Zhao

LASER: Attention with Exponential Transformation

Transformers have had tremendous impact for several sequence related tasks, largely due to their ability to retrieve from any part of the sequence via softmax based dot-product attention. This mechanism plays a crucial role in Transformer's…

Machine Learning · Computer Science 2025-07-15 Sai Surya Duvvuri, Inderjit S. Dhillon

LLMs as High-Dimensional Nonlinear Autoregressive Models with Attention: Training, Alignment and Inference

Large language models (LLMs) based on transformer architectures are typically described through collections of architectural components and training procedures, obscuring their underlying computational structure. This review article…

Machine Learning · Computer Science 2026-02-03 Vikram Krishnamurthy

Self-Attention as a Covariance Readout: A Unified View of In-Context Learning and Repetition

Large language models (LLMs) exhibit two striking and ostensibly unrelated behaviours: in-context learning (ICL) and repetitive generation. In both, the model behaves as though it had summarised the context into a population-level statistic…

Machine Learning · Computer Science 2026-05-12 Haoren Xu, Guanhua Fang

High-Layer Attention Pruning with Rescaling

Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that…

Computation and Language · Computer Science 2026-01-28 Songtao Liu, Peng Liu

ReGLA: Refining Gated Linear Attention

Recent advancements in Large Language Models (LLMs) have set themselves apart with their exceptional performance in complex language modelling tasks. However, these models are also known for their significant computational and storage…

Computation and Language · Computer Science 2025-08-12 Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais

Refining Answer Distributions for Improved Large Language Model Reasoning

Large Language Models (LLMs) have exhibited an impressive capability to perform reasoning tasks, especially if they are encouraged to generate a sequence of intermediate steps. Reasoning performance can be improved by suitably combining…

Computation and Language · Computer Science 2025-04-11 Soumyasundar Pal, Didier Chételat, Yingxue Zhang, Mark Coates