Related papers: Hierarchical Attention Encoder Decoder

200 papers

Autoregressive decoding strategy is a commonly used method for text generation tasks with pre-trained language models, while early-exiting is an effective approach to speed up the inference stage. In this work, we propose a novel decoding…

Computation and Language · Computer Science · 2024-03-25 · Yunqi Zhu, Xuebing Yang, Yuanyuan Wu, Wensheng Zhang

Recent approaches in hierarchical text classification (HTC) rely on the capabilities of a pre-trained transformer model and exploit the label semantics and a graph encoder for the label hierarchy. In this paper, we introduce an effective…

Machine Learning · Computer Science · 2025-01-24 · Younes Yousef, Lukas Galke, Ansgar Scherp

We propose a model for hierarchical structured data as an extension to the stochastic temporal convolutional network. The proposed model combines an autoregressive model with a hierarchical variational autoencoder and downsampling to…

Machine Learning · Computer Science · 2021-07-02 · Carl R. Andersson, Niklas Wahlström, Thomas B. Schön

Understanding how the brain responds to sensory inputs is challenging: brain recordings are partial, noisy, and high dimensional; they vary across sessions and subjects and they capture highly nonlinear dynamics. These challenges have led…

Neurons and Cognition · Quantitative Biology · 2022-10-03 · Omar Chehab, Alexandre Defossez, Jean-Christophe Loiseau, Alexandre Gramfort, Jean-Remi King

Autoregressive generative models of images tend to be biased towards capturing local structure, and as a result they often produce samples which are lacking in terms of large-scale coherence. To address this, we propose two methods to learn…

Computer Vision and Pattern Recognition · Computer Science · 2019-10-09 · Jeffrey De Fauw, Sander Dieleman, Karen Simonyan

Generative modeling of high-dimensional data is a key problem in machine learning. Successful approaches include latent variable models and autoregressive models. The complementary strengths of these approaches, to model global and local…

Computer Vision and Pattern Recognition · Computer Science · 2019-04-19 · Thomas Lucas, Jakob Verbeek

Deep autoregressive sequence-to-sequence models have demonstrated impressive performance across a wide variety of tasks in recent years. While common architecture classes such as recurrent, convolutional, and self-attention networks make…

Machine Learning · Computer Science · 2018-11-09 · Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

Learning to solve sequential tasks with recurrent models requires the ability to memorize long sequences and to extract task-relevant features from them. In this paper, we study the memorization subtask from the point of view of the design…

Machine Learning · Computer Science · 2020-02-03 · Antonio Carta, Alessandro Sperduti, Davide Bacciu

This paper proposes and evaluates the hybrid autoregressive transducer (HAT) model, a time-synchronous encoder-decoder model that preserves the modularity of conventional automatic speech recognition systems. The HAT model provides a way to…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-03-18 · Ehsan Variani, David Rybach, Cyril Allauzen, Michael Riley

Tokenization is a fundamental step in natural language processing, breaking text into units that computational models can process. While learned subword tokenizers have become the de-facto standard, they present challenges such as large…

Computation and Language · Computer Science · 2025-01-22 · Pit Neitemeier, Björn Deiseroth, Constantin Eichenberg, Lukas Balles

Transformer language models generate text autoregressively, making inference latency proportional to the number of tokens generated. Speculative decoding reduces this latency without sacrificing output quality, by leveraging a small draft…

Machine Learning · Computer Science · 2025-10-24 · Clara Mohri, Haim Kaplan, Tal Schuster, Yishay Mansour, Amir Globerson

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-08-28 · Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Much recent effort has been invested in non-autoregressive neural machine translation, which appears to be an efficient alternative to state-of-the-art autoregressive machine translation on modern GPUs. In contrast to the latter, where…

Computation and Language · Computer Science · 2021-06-28 · Jungo Kasai, Nikolaos Pappas, Hao Peng, James Cross, Noah A. Smith

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured…

Computation and Language · Computer Science · 2019-09-25 · Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality…

Machine Learning · Computer Science · 2021-09-17 · Shuqi Lu, Di He, Chenyan Xiong, Guolin Ke, Waleed Malik, Zhicheng Dou, Paul Bennett, Tieyan Liu, Arnold Overwijk

Existing captioning models often adopt the encoder-decoder architecture, where the decoder uses autoregressive decoding to generate captions, such that each token is generated sequentially given the preceding generated tokens. However,…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-04 · Junlong Gao, Xi Meng, Shiqi Wang, Xia Li, Shanshe Wang, Siwei Ma, Wen Gao

Autoregressive sequence models achieve state-of-the-art performance in domains like machine translation. However, due to the autoregressive factorization nature, these models suffer from heavy latency during inference. Recently,…

Machine Learning · Computer Science · 2020-01-10 · Zhiqing Sun, Zhuohan Li, Haoqing Wang, Zi Lin, Di He, Zhi-Hong Deng

Autoregressive decoding is the only part of sequence-to-sequence models that prevents them from massive parallelization at inference time. Non-autoregressive models enable the decoder to generate all output symbols independently in…

Computation and Language · Computer Science · 2018-11-13 · Jindřich Libovický, Jindřich Helcl

Autoregressive sequence models based on deep neural networks, such as RNNs, Wavenet and the Transformer attain state-of-the-art results on many tasks. However, they are difficult to parallelize and are thus slow at processing long…

Machine Learning · Computer Science · 2018-06-11 · Łukasz Kaiser, Aurko Roy, Ashish Vaswani, Niki Parmar, Samy Bengio, Jakob Uszkoreit, Noam Shazeer

Encoder-decoder models have become an effective approach for sequence learning tasks like machine translation, image captioning and speech recognition, but have yet to show competitive results for handwritten text recognition. To this end,…

Computer Vision and Pattern Recognition · Computer Science · 2019-07-16 · Johannes Michael, Roger Labahn, Tobias Grüning, Jochen Zöllner