Related papers: An Iterative Algorithm for Rescaled Hyperbolic Fun…

200 papers

Large language models (LLMs) have brought significant and transformative changes in human society. These models have demonstrated remarkable capabilities in natural language understanding and generation, leading to various advancements and…

Machine Learning · Computer Science 2023-07-06 Yeqi Gao, Zhao Song, Shenghao Xie

Large language models (LLMs) have brought transformative changes to human society. One of the key computations in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng, Zhihang Li, Zhao Song

Large language models (LLMs) are known for their exceptional performance in natural language processing, making them highly effective in many human life-related or even job-related tasks. The attention mechanism in the Transformer…

Computation and Language · Computer Science 2023-04-27 Shuai Li, Zhao Song, Yu Xia, Tong Yu, Tianyi Zhou

There have been significant advancements made by large language models (LLMs) in various aspects of our daily lives. LLMs serve as a transformative force in natural language processing, finding applications in text generation, translation,…

Machine Learning · Computer Science 2023-11-28 Zhihang Li, Zhao Song, Zifan Wang, Junze Yin

Large Language Models (LLMs) have shown immense potential in enhancing various aspects of our daily lives, from conversational AI to search and AI assistants. However, their growing capabilities come at the cost of extremely large model…

Machine Learning · Computer Science 2025-02-27 Yingyu Liang, Jiangxuan Long, Zhenmei Shi, Zhao Song, Yufa Zhou

Large language models (LLMs) have brought significant changes to human society. Softmax regression and residual neural networks (ResNet) are two important techniques in deep learning: they not only serve as significant theoretical…

Machine Learning · Computer Science 2023-09-26 Zhao Song, Weixin Wang, Junze Yin

Recent works show we can linearize large language models (LLMs) -- swapping the quadratic attentions of popular Transformer-based LLMs with subquadratic analogs, such as linear attention -- avoiding the expensive pretraining costs. However,…

Machine Learning · Computer Science 2025-03-07 Michael Zhang, Simran Arora, Rahul Chalamala, Alan Wu, Benjamin Spector, Aaryan Singhal, Krithik Ramesh, Christopher Ré

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited…

Computation and Language · Computer Science 2024-07-26 Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Celine Lin

Large language models (LLMs) have significantly improved various aspects of our daily lives. These models have impacted numerous domains, from healthcare to education, enhancing productivity, decision-making processes, and accessibility. As…

Machine Learning · Computer Science 2023-11-01 Zhao Song, Guangyi Xu, Junze Yin

We present TransNormerLLM, the first linear attention-based Large Language Model (LLM) that outperforms conventional softmax attention-based models in terms of both accuracy and efficiency. TransNormerLLM evolves from the previous linear…

Computation and Language · Computer Science 2024-01-22 Zhen Qin, Dong Li, Weigao Sun, Weixuan Sun, Xuyang Shen, Xiaodong Han, Yunshen Wei, Baohong Lv, Xiao Luo, Yu Qiao, Yiran Zhong

Large language models (LLMs) and generative AI have played a transformative role in computer research and applications. Controversy has arisen as to whether these models output copyrighted data, which can occur if the data the models are…

Machine Learning · Computer Science 2023-08-24 Timothy Chu, Zhao Song, Chiwun Yang

Large language models have achieved remarkable success in recent years, primarily due to self-attention. However, traditional Softmax attention suffers from numerical instability and reduced performance as the number of inference tokens…

Computation and Language · Computer Science 2026-02-02 Bo Gao, Michael W. Spratling, Letizia Gionfrida

We propose Lizard, a linearization framework that transforms pretrained Transformer-based Large Language Models (LLMs) into subquadratic architectures. Transformers face severe computational and memory bottlenecks with long sequences due…

Driven by recent advances in artificial intelligence (AI), a growing literature has demonstrated the potential for using large language models (LLMs) as scalable surrogates to generate human-like responses in many business applications. Two…

Machine Learning · Computer Science 2025-12-30 Lei Wang, Zikun Ye, Jinglong Zhao

Transformers have had a tremendous impact on several sequence-related tasks, largely due to their ability to retrieve from any part of the sequence via softmax-based dot-product attention. This mechanism plays a crucial role in Transformer's…

Machine Learning · Computer Science 2025-07-15 Sai Surya Duvvuri, Inderjit S. Dhillon

Large language models (LLMs) based on transformer architectures are typically described through collections of architectural components and training procedures, obscuring their underlying computational structure. This review article…

Machine Learning · Computer Science 2026-02-03 Vikram Krishnamurthy

Large language models (LLMs) exhibit two striking and ostensibly unrelated behaviours: in-context learning (ICL) and repetitive generation. In both, the model behaves as though it had summarised the context into a population-level statistic…

Machine Learning · Computer Science 2026-05-12 Haoren Xu, Guanhua Fang

Pruning is a highly effective approach for compressing large language models (LLMs), significantly reducing inference latency. However, conventional training-free structured pruning methods often employ a heuristic metric that…

Computation and Language · Computer Science 2026-01-28 Songtao Liu, Peng Liu

Recent advancements in Large Language Models (LLMs) have set themselves apart with their exceptional performance in complex language modelling tasks. However, these models are also known for their significant computational and storage…

Computation and Language · Computer Science 2025-08-12 Peng Lu, Ivan Kobyzev, Mehdi Rezagholizadeh, Boxing Chen, Philippe Langlais

Large Language Models (LLMs) have exhibited an impressive capability to perform reasoning tasks, especially if they are encouraged to generate a sequence of intermediate steps. Reasoning performance can be improved by suitably combining…

Computation and Language · Computer Science 2025-04-11 Soumyasundar Pal, Didier Chételat, Yingxue Zhang, Mark Coates