Related papers: Conformer LLMs -- Convolution Augmented Large Lang…

Branchformer: Parallel MLP-Attention Architectures to Capture Local and Global Context for Speech Recognition and Understanding

Conformer has proven to be effective in many speech processing tasks. It combines the benefits of extracting local dependencies using convolutions and global dependencies using self-attention. Inspired by this, we propose a more flexible,…

Computation and Language · Computer Science 2022-07-08 Yifan Peng, Siddharth Dalmia, Ian Lane, Shinji Watanabe

Conformer: Convolution-augmented Transformer for Speech Recognition

Recently Transformer and Convolution neural network (CNN) based models have shown promising results in Automatic Speech Recognition (ASR), outperforming Recurrent neural networks (RNNs). Transformer models are good at capturing…

Audio and Speech Processing · Electrical Eng. & Systems 2020-05-19 Anmol Gulati, James Qin, Chung-Cheng Chiu, Niki Parmar, Yu Zhang, Jiahui Yu, Wei Han, Shibo Wang, Zhengdong Zhang, Yonghui Wu, Ruoming Pang

Multi-Convformer: Extending Conformer with Multiple Convolution Kernels

Convolutions have become essential in state-of-the-art end-to-end Automatic Speech Recognition (ASR) systems due to their efficient modelling of local context. Notably, its use in Conformers has led to superior performance compared to…

Computation and Language · Computer Science 2024-07-25 Darshan Prabhu, Yifan Peng, Preethi Jyothi, Shinji Watanabe

Advancing Transformer Architecture in Long-Context Large Language Models: A Comprehensive Survey

Transformer-based Large Language Models (LLMs) have been applied in diverse areas such as knowledge bases, human interfaces, and dynamic agents, marking a stride towards achieving Artificial General Intelligence (AGI). However, current…

Computation and Language · Computer Science 2024-02-27 Yunpeng Huang, Jingwei Xu, Junyu Lai, Zixu Jiang, Taolue Chen, Zenan Li, Yuan Yao, Xiaoxing Ma, Lijuan Yang, Hao Chen, Shupeng Li, Penghao Zhao

To Transformers and Beyond: Large Language Models for the Genome

In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based…

Genomics · Quantitative Biology 2023-11-15 Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

Speed Always Wins: A Survey on Efficient Architectures for Large Language Models

Large Language Models (LLMs) have delivered impressive results in language understanding, generation, and reasoning, and have pushed the ability boundary of multimodal models. Transformer models, as the foundation of modern LLMs, offer a strong…

Computation and Language · Computer Science 2025-08-14 Weigao Sun, Jiaxi Hu, Yucheng Zhou, Jusen Du, Disen Lan, Kexin Wang, Tong Zhu, Xiaoye Qu, Yu Zhang, Xiaoyu Mo, Daizong Liu, Yuxuan Liang, Wenliang Chen, Guoqi Li, Yu Cheng

Moving Beyond Next-Token Prediction: Transformers are Context-Sensitive Language Generators

Large Language Models (LLMs), powered by Transformers, have demonstrated human-like intelligence capabilities, yet their underlying mechanisms remain poorly understood. This paper presents a novel framework for interpreting LLMs as…

Computation and Language · Computer Science 2025-04-16 Phill Kyu Rhee

A Meta-Learning Perspective on Transformers for Causal Language Modeling

The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view…

Machine Learning · Computer Science 2024-03-26 Xinbo Wu, Lav R. Varshney

A Survey on Large Language Models from Concept to Implementation

Recent advancements in Large Language Models (LLMs), particularly those built on Transformer architectures, have significantly broadened the scope of natural language processing (NLP) applications, transcending their initial use in chatbot…

Computation and Language · Computer Science 2024-05-29 Chen Wang, Jin Zhao, Jiaqi Gong

Survey of different Large Language Model Architectures: Trends, Benchmarks, and Challenges

Large Language Models (LLMs) represent a class of deep learning models adept at understanding natural language and generating coherent responses to various prompts or queries. These models far exceed the complexity of conventional neural…

Machine Learning · Computer Science 2024-12-05 Minghao Shao, Abdul Basit, Ramesh Karri, Muhammad Shafique

Adaptive Large Language Models By Layerwise Attention Shortcuts

Transformer architectures are the backbone of the modern AI revolution. However, they are based on simply stacking the same blocks in dozens of layers and processing information sequentially from one block to another. In this paper, we…

Computation and Language · Computer Science 2024-12-24 Prateek Verma, Mert Pilanci

Condenser: a Pre-training Architecture for Dense Retrieval

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao, Jamie Callan

Transformer-based Causal Language Models Perform Clustering

Even though large language models (LLMs) have demonstrated remarkable capability in solving various natural language tasks, the capability of an LLM to follow human instructions is still a concern. Recent works have shown great improvements…

Computation and Language · Computer Science 2024-03-05 Xinbo Wu, Lav R. Varshney

DEFORMER: Coupling Deformed Localized Patterns with Global Context for Robust End-to-end Speech Recognition

Convolutional neural networks (CNN) have improved speech recognition performance greatly by exploiting localized time-frequency patterns. But these patterns are assumed to appear in symmetric and rigid kernels by the conventional CNN…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-19 Jiamin Xie, John H. L. Hansen

KunlunBaize: LLM with Multi-Scale Convolution and Multi-Token Prediction Under TransformerX Framework

Large language models have demonstrated remarkable performance across various tasks, yet they face challenges such as low computational efficiency, gradient vanishing, and difficulties in capturing complex feature interactions. To address…

Computation and Language · Computer Science 2025-03-21 Cheng Li, Jiexiong Liu, Yixuan Chen, Yanqin Jia, Zhepeng Li

A review on the use of large language models as virtual tutors

Transformer architectures contribute to managing long-term dependencies for Natural Language Processing, representing one of the most recent changes in the field. These architectures are the basis of the innovative, cutting-edge Large…

Computation and Language · Computer Science 2024-09-06 Silvia García-Méndez, Francisco de Arriba-Pérez, María del Carmen Somoza-López

Transformers for Supervised Online Continual Learning

Transformers have become the dominant architecture for sequence modeling tasks such as natural language processing or audio processing, and they are now even considered for tasks that are not naturally sequential such as image…

Machine Learning · Computer Science 2024-03-05 Jorg Bornschein, Yazhe Li, Amal Rannen-Triki

A Language Model With Million Context Length For Raw Audio

Modeling long-term dependencies for audio signals is a particularly challenging problem, as even small-time scales yield on the order of a hundred thousand samples. With the recent advent of Transformers, neural architectures became good at…

Sound · Computer Science 2024-12-24 Prateek Verma

A Survey on Transformer Context Extension: Approaches and Evaluation

Large language models (LLMs) based on Transformer have been widely applied in the field of natural language processing (NLP), demonstrating strong performance, particularly in handling short text tasks. However, when it comes to long…

Computation and Language · Computer Science 2025-07-09 Yijun Liu, Jinzheng Yu, Yang Xu, Zhongyang Li, Qingfu Zhu

Tracing Multilingual Representations in LLMs with Cross-Layer Transcoders

Multilingual Large Language Models (LLMs) can process many languages, yet how they internally represent this diversity remains unclear. Do they form shared multilingual representations with language-specific decoding, and if so, why does…

Computation and Language · Computer Science 2026-02-10 Abir Harrasse, Florent Draye, Punya Syon Pandey, Zhijing Jin, Bernhard Schölkopf