Related papers: Hierarchical Attention Encoder Decoder

Hierarchical Skip Decoding for Efficient Autoregressive Text Generation

Autoregressive decoding strategy is a commonly used method for text generation tasks with pre-trained language models, while early-exiting is an effective approach to speedup the inference stage. In this work, we propose a novel decoding…

Computation and Language · Computer Science 2024-03-25 Yunqi Zhu , Xuebing Yang , Yuanyuan Wu , Wensheng Zhang

A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification

Recent approaches in hierarchical text classification (HTC) rely on the capabilities of a pre-trained transformer model and exploit the label semantics and a graph encoder for the label hierarchy. In this paper, we introduce an effective…

Machine Learning · Computer Science 2025-01-24 Younes Yousef , Lukas Galke , Ansgar Scherp

Learning deep autoregressive models for hierarchical data

We propose a model for hierarchical structured data as an extension to the stochastic temporal convolutional network. The proposed model combines an autoregressive model with a hierarchical variational autoencoder and downsampling to…

Machine Learning · Computer Science 2021-07-02 Carl R. Andersson , Niklas Wahlström , Thomas B. Schön

Deep Recurrent Encoder: A scalable end-to-end network to model brain signals

Understanding how the brain responds to sensory inputs is challenging: brain recordings are partial, noisy, and high dimensional; they vary across sessions and subjects and they capture highly nonlinear dynamics. These challenges have led…

Neurons and Cognition · Quantitative Biology 2022-10-03 Omar Chehab , Alexandre Defossez , Jean-Christophe Loiseau , Alexandre Gramfort , Jean-Remi King

Hierarchical Autoregressive Image Models with Auxiliary Decoders

Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn…

Computer Vision and Pattern Recognition · Computer Science 2019-10-09 Jeffrey De Fauw , Sander Dieleman , Karen Simonyan

Auxiliary Guided Autoregressive Variational Autoencoders

Generative modeling of high-dimensional data is a key problem in machine learning. Successful approaches include latent variable models and autoregressive models. The complementary strengths of these approaches, to model global and local…

Computer Vision and Pattern Recognition · Computer Science 2019-04-19 Thomas Lucas , Jakob Verbeek

Blockwise Parallel Decoding for Deep Autoregressive Models

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science 2018-11-09 Mitchell Stern , Noam Shazeer , Jakob Uszkoreit

Encoding-based Memory Modules for Recurrent Neural Networks

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design…

Machine Learning · Computer Science 2020-02-03 Antonio Carta , Alessandro Sperduti , Davide Bacciu

Hybrid Autoregressive Transducer (hat)

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoderdecoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to…

Audio and Speech Processing · Electrical Eng. & Systems 2020-03-18 Ehsan Variani , David Rybach , Cyril Allauzen , Michael Riley

Hierarchical Autoregressive Transformers: Combining Byte- and Word-Level Processing for Robust, Adaptable Language Models

Tokenization is a fundamental step in natural language processing, breaking text into units that computational models can process. While learned subword tokenizers have become the de-facto standard, they present challenges such as large…

Computation and Language · Computer Science 2025-01-22 Pit Neitemeier , Björn Deiseroth , Constantin Eichenberg , Lukas Balles

Fast Inference via Hierarchical Speculative Decoding

Transformer language models generate text autoregressively, making inference latency proportional to the number of tokens generated. Speculative decoding reduces this latency without sacrificing output quality, by leveraging a small draft…

Machine Learning · Computer Science 2025-10-24 Clara Mohri , Haim Kaplan , Tal Schuster , Yishay Mansour , Amir Globerson

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim , Jihwan Park , Hyung Yong Kim , Hanbin Lee , Byeong-Yeol Kim

Deep Encoder, Shallow Decoder: Reevaluating Non-autoregressive Machine Translation

Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where…

Computation and Language · Computer Science 2021-06-28 Jungo Kasai , Nikolaos Pappas , Hao Peng , James Cross , Noah A. Smith

Language Modeling with Deep Transformers

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured…

Computation and Language · Computer Science 2019-09-25 Kazuki Irie , Albert Zeyer , Ralf Schlüter , Hermann Ney

Less is More: Pre-train a Strong Text Encoder for Dense Retrieval Using a Weak Decoder

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality…

Machine Learning · Computer Science 2021-09-17 Shuqi Lu , Di He , Chenyan Xiong , Guolin Ke , Waleed Malik , Zhicheng Dou , Paul Bennett , Tieyan Liu , Arnold Overwijk

Masked Non-Autoregressive Image Captioning

Existing captioning models often adopt the encoder-decoder architecture, where the decoder uses autoregressive decoding to generate captions, such that each token is generated sequentially given the preceding generated tokens. However,…

Computer Vision and Pattern Recognition · Computer Science 2019-06-04 Junlong Gao , Xi Meng , Shiqi Wang , Xia Li , Shanshe Wang , Siwei Ma , Wen Gao

Fast Structured Decoding for Sequence Models

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to the autoregressive factorization nature, these models suffer from heavy latency during inference. Recently,…

Machine Learning · Computer Science 2020-01-10 Zhiqing Sun , Zhuohan Li , Haoqing Wang , Zi Lin , Di He , Zhi-Hong Deng

End-to-End Non-Autoregressive Neural Machine Translation with Connectionist Temporal Classification

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in…

Computation and Language · Computer Science 2018-11-13 Jindřich Libovický , Jindřich Helcl

Fast Decoding in Sequence Models using Discrete Latent Variables

Autoregressive sequence models based on deep neural networks, such as RNNs, Wavenet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long…

Machine Learning · Computer Science 2018-06-11 Łukasz Kaiser , Aurko Roy , Ashish Vaswani , Niki Parmar , Samy Bengio , Jakob Uszkoreit , Noam Shazeer

Evaluating Sequence-to-Sequence Models for Handwritten Text Recognition

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science 2019-07-16 Johannes Michael , Roger Labahn , Tobias Grüning , Jochen Zöllner