Related papers: Efficient Sample-Specific Encoder Perturbations

200 papers

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network…

Machine Learning · Computer Science 2025-10-28 Marianne Arriola, Yair Schiff, Hao Phung, Aaron Gokaslan, Volodymyr Kuleshov

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Pre-trained encoder-decoder transformer architectures have become increasingly popular with the advent of T5 models. T5 has also become favored over other architectures like BERT due to the amount of data that it is…

Computation and Language · Computer Science 2022-10-25 Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Predictive monitoring is a subfield of process mining that aims to predict how a running case will unfold in the future. One of its main challenges is forecasting the sequence of activities that will occur from a given point in time --…

Machine Learning · Computer Science 2022-11-30 Efrén Rama-Maneiro, Pablo Monteagudo-Lago, Juan C. Vidal, Manuel Lama

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational…

Computation and Language · Computer Science 2024-09-20 Sajjad Kachuee, Mohammad Sharifkhani

The large attention-based encoder-decoder network (Transformer) has recently become prevalent due to its effectiveness. But the high computational complexity of its decoder makes inference inefficient. By examining the mathematic…

Computation and Language · Computer Science 2023-05-12 Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an…

Computation and Language · Computer Science 2024-02-28 Alessio Miaschi, Felice Dell'Orletta, Giulia Venturi

Modern language models are trained almost exclusively on token sequences produced by a fixed tokenizer, an external lossless compressor often over UTF-8 byte sequences, thereby coupling the model to that compressor. This work introduces…

Computation and Language · Computer Science 2026-05-15 Lin Zheng, Xinyu Li, Qian Liu, Xiachong Feng, Lingpeng Kong

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET…

This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. We aim to improve the recognition accuracy of the fast-slow encoder based transducer while keeping its…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-16 Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem…

Machine Learning · Computer Science 2025-06-11 María Martínez-García, Grace Villacrés, David Mitchell, Pablo M. Olmos

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-09 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

Recurrent models for sequences have recently been successful at many tasks, especially language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even…

Machine Learning · Computer Science 2018-01-31 Łukasz Kaiser, Samy Bengio

Automatic Speech Recognition (ASR) has seen remarkable progress, with models like OpenAI Whisper and NVIDIA Canary achieving state-of-the-art (SOTA) performance in offline transcription. However, these models are not designed for streaming…

Computation and Language · Computer Science 2026-04-07 Tomer Krichli, Bhiksha Raj, Joseph Keshet

We construct a new kind of encoder, leveraging the expressive power of diffusion models. In a traditional variational autoencoder, the encoder and decoder jointly negotiate a latent representation of the input. This is made possible by the…

Machine Learning · Computer Science 2026-05-14 Akhil Premkumar, Sarah Lucioni

One of the main drawbacks of diffusion models is the slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods. However, these methods require considerable…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Senmao Li, Taihang Hu, Joost van de Weijer, Fahad Shahbaz Khan, Tao Liu, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of…

Machine Learning · Computer Science 2024-08-26 Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, Vaneet Aggarwal

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu