Related papers: FastSeq: Make Sequence Generation Faster

LightSeq: A High Performance Inference Library for Transformers

Transformer, BERT and their variants have achieved great success in natural language processing. Since Transformer models are huge in size, serving these models is a challenge for real industrial applications. In this paper, we propose…

Mathematical Software · Computer Science 2021-04-23 Xiaohui Wang, Ying Xiong, Yang Wei, Mingxuan Wang, Lei Li

LightSeq2: Accelerated Training for Transformer-based Models on GPUs

Transformer-based neural models are used in many AI applications. Training these models is expensive, as it takes huge GPU resources and long duration. It is challenging because typical data like sentences have variable lengths, and…

Computation and Language · Computer Science 2022-06-17 Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li

Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing

While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token…

Computation and Language · Computer Science 2019-07-26 Chunyang Xiao, Christoph Teichmann, Konstantine Arkoudas

FastSpeech: Fast, Robust and Controllable Text to Speech

Neural network based end-to-end text to speech (TTS) has significantly improved the quality of synthesized speech. Prominent methods (e.g., Tacotron 2) usually first generate mel-spectrogram from text, and then synthesize speech from the…

Computation and Language · Computer Science 2019-11-21 Yi Ren, Yangjun Ruan, Xu Tan, Tao Qin, Sheng Zhao, Zhou Zhao, Tie-Yan Liu

RecycleGPT: An Autoregressive Language Model with Recyclable Module

Existing large language models have to run K times to generate a sequence of K tokens. In this paper, we present RecycleGPT, a generative language model with fast decoding speed by recycling pre-generated model states without running the…

Computation and Language · Computer Science 2024-05-24 Yufan Jiang, Qiaozhi He, Xiaomin Zhuang, Zhihua Wu, Kunpeng Wang, Wenlai Zhao, Guangwen Yang

SeqDiffuSeq: Text Diffusion with Encoder-Decoder Transformers

Diffusion model, a new generative modelling paradigm, has achieved great success in image, audio, and video generation. However, considering the discrete categorical nature of text, it is not trivial to extend continuous diffusion models to…

Computation and Language · Computer Science 2023-05-23 Hongyi Yuan, Zheng Yuan, Chuanqi Tan, Fei Huang, Songfang Huang

SPEED: Speculative Pipelined Execution for Efficient Decoding

Generative Large Language Models (LLMs) based on the Transformer architecture have recently emerged as a dominant foundation model for a wide range of Natural Language Processing tasks. Nevertheless, their application in real-time scenarios…

Computation and Language · Computer Science 2024-01-04 Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genc, Kurt Keutzer, Amir Gholami, Sophia Shao

Towards better decoding and language model integration in sequence to sequence models

The recently proposed Sequence-to-Sequence (seq2seq) framework advocates replacing complex data processing pipelines, such as an entire automatic speech recognition system, with a single neural network trained in an end-to-end fashion. In…

Neural and Evolutionary Computing · Computer Science 2016-12-09 Jan Chorowski, Navdeep Jaitly

FlowSeq: Non-Autoregressive Conditional Sequence Generation with Generative Flow

Most sequence-to-sequence (seq2seq) models are autoregressive; they generate each token by conditioning on previously generated tokens. In contrast, non-autoregressive seq2seq models generate all tokens in one pass, which leads to increased…

Computation and Language · Computer Science 2019-10-10 Xuezhe Ma, Chunting Zhou, Xian Li, Graham Neubig, Eduard Hovy

DiffuSeq: Sequence to Sequence Text Generation with Diffusion Models

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the…

Computation and Language · Computer Science 2023-02-15 Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

XSpecMesh: Quality-Preserving Auto-Regressive Mesh Generation Acceleration via Multi-Head Speculative Decoding

Current auto-regressive models can generate high-quality, topologically precise meshes; however, they necessitate thousands, or even tens of thousands, of next-token predictions during inference, resulting in substantial latency. We introduce…

Graphics · Computer Science 2025-08-07 Dian Chen, Yansong Qu, Xinyang Li, Ming Li, Shengchuan Zhang

SepSeq: A Training-Free Framework for Long Numerical Sequence Processing in LLMs

While transformer-based Large Language Models (LLMs) theoretically support massive context windows, they suffer from severe performance degradation when processing long numerical sequences. We attribute this failure to the attention…

Computation and Language · Computer Science 2026-04-10 Jie Sun, Yu Liu, Lu Han, Qiwen Deng, Xiang Shu, Yang Xiao, Xingyu Lu, Jun Zhou, Pengfei Liu, Lintao Ma, Jiancan Wu, Xiang Wang

FastTrees: Parallel Latent Tree-Induction for Faster Sequence Encoding

Inducing latent tree structures from sequential data is an emerging trend in the NLP research landscape today, largely popularized by recent methods such as Gumbel LSTM and Ordered Neurons (ON-LSTM). This paper proposes FASTTREES, a new…

Computation and Language · Computer Science 2021-11-30 Bill Tuck Weng Pung, Alvin Chan

Unifying Autoregressive and Diffusion-Based Sequence Generation

We present significant extensions to diffusion-based sequence generation models, blurring the line with autoregressive language models. We introduce hyperschedules, which assign distinct noise schedules to individual token positions,…

Machine Learning · Computer Science 2025-10-08 Nima Fathi, Torsten Scholak, Pierre-André Noël

Lane2Seq: Towards Unified Lane Detection via Sequence Generation

In this paper, we present a novel sequence generation-based framework for lane detection, called Lane2Seq. It unifies various lane detection formats by casting lane detection as a sequence generation task. This is different from previous…

Computer Vision and Pattern Recognition · Computer Science 2024-02-28 Kunyang Zhou

Fastformer: Additive Attention Can Be All You Need

Transformer is a powerful model for text understanding. However, it is inefficient due to its quadratic complexity to input sequence length. Although there are many methods on Transformer acceleration, they are still either inefficient on…

Computation and Language · Computer Science 2021-09-07 Chuhan Wu, Fangzhao Wu, Tao Qi, Yongfeng Huang, Xing Xie

Automatic Label Sequence Generation for Prompting Sequence-to-sequence Models

Prompting, which casts downstream applications as language modeling tasks, has shown to be sample efficient compared to standard fine-tuning with pre-trained models. However, one pitfall of prompting is the need of manually-designed…

Computation and Language · Computer Science 2022-09-21 Zichun Yu, Tianyu Gao, Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Maosong Sun, Jie Zhou

Fast Inference from Transformers via Speculative Decoding

Inference from large autoregressive models like Transformers is slow - decoding K tokens takes K serial runs of the model. In this work we introduce speculative decoding - an algorithm to sample from autoregressive models faster without any…

Machine Learning · Computer Science 2023-05-22 Yaniv Leviathan, Matan Kalman, Yossi Matias

Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation

Recently, simultaneous translation has gathered a lot of attention since it enables compelling applications such as subtitle translation for a live event or real-time video-call translation. Some of these translation applications allow…

Computation and Language · Computer Science 2021-06-03 Hyojung Han, Sathish Indurthi, Mohd Abbas Zaidi, Nikhil Kumar Lakumarapu, Beomseok Lee, Sangha Kim, Chanwoo Kim, Inchul Hwang

FastFormers: Highly Efficient Transformer Models for Natural Language Understanding

Transformer-based models are the state-of-the-art for Natural Language Understanding (NLU) applications. Models are getting bigger and better on various tasks. However, Transformer models remain computationally challenging since they are…

Computation and Language · Computer Science 2020-10-27 Young Jin Kim, Hany Hassan Awadalla