Related papers: Efficient Sample-Specific Encoder Perturbations

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network…

Machine Learning · Computer Science 2025-10-28 Marianne Arriola, Yair Schiff, Hao Phung, Aaron Gokaslan, Volodymyr Kuleshov

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models

Pre-trained encoder-decoder transformer architectures have become increasingly popular recently with the advent of T5 models. T5 has also become more favorable over other architectures like BERT due to the amount of data that it is…

Computation and Language · Computer Science 2022-10-25 Frederick Liu, Terry Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Encoder-Decoder Model for Suffix Prediction in Predictive Monitoring

Predictive monitoring is a subfield of process mining that aims to predict how a running case will unfold in the future. One of its main challenges is forecasting the sequence of activities that will occur from a given point in time --…

Machine Learning · Computer Science 2022-11-30 Efrén Rama-Maneiro, Pablo Monteagudo-Lago, Juan C. Vidal, Manuel Lama

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Latency Adjustable Transformer Encoder for Language Understanding

Adjusting the latency, power, and accuracy of natural language understanding models is a desirable objective of an efficient architecture. This paper proposes an efficient Transformer architecture that adjusts the inference computational…

Computation and Language · Computer Science 2024-09-20 Sajjad Kachuee, Mohammad Sharifkhani

An Efficient Transformer Decoder with Compressed Sub-layers

The large attention-based encoder-decoder network (Transformer) has become prevailing recently due to its effectiveness. But the high computation complexity of its decoder raises the inefficiency issue. By examining the mathematic…

Computation and Language · Computer Science 2023-05-12 Yanyang Li, Ye Lin, Tong Xiao, Jingbo Zhu

Linguistic Knowledge Can Enhance Encoder-Decoder Models (If You Let It)

In this paper, we explore the impact of augmenting pre-trained Encoder-Decoder models, specifically T5, with linguistic knowledge for the prediction of a target task. In particular, we investigate whether fine-tuning a T5 model on an…

Computation and Language · Computer Science 2024-02-28 Alessio Miaschi, Felice Dell'Orletta, Giulia Venturi

Proxy Compression for Language Modeling

Modern language models are trained almost exclusively on token sequences produced by a fixed tokenizer, an external lossless compressor often over UTF-8 byte sequences, thereby coupling the model to that compressor. This work introduces…

Computation and Language · Computer Science 2026-05-15 Lin Zheng, Xinyu Li, Qian Liu, Xiachong Feng, Lingpeng Kong

Dynamic Encoder Transducer: A Flexible Solution For Trading Off Accuracy For Latency

We propose a dynamic encoder transducer (DET) for on-device speech recognition. One DET model scales to multiple devices with different computation capacities without retraining or finetuning. To trade off accuracy and latency, DET…

Computation and Language · Computer Science 2021-04-07 Yangyang Shi, Varun Nagaraja, Chunyang Wu, Jay Mahadeokar, Duc Le, Rohit Prabhavalkar, Alex Xiao, Ching-Feng Yeh, Julian Chan, Christian Fuegen, Ozlem Kalinli, Michael L. Seltzer

Improving Fast-slow Encoder based Transducer with Streaming Deliberation

This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. We aim to improve the recognition accuracy of the fast-slow encoder based transducer while keeping its…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-16 Ke Li, Jay Mahadeokar, Jinxi Guo, Yangyang Shi, Gil Keren, Ozlem Kalinli, Michael L. Seltzer, Duc Le

Improved Variational Inference in Discrete VAEs using Error Correcting Codes

Despite advances in deep probabilistic models, learning discrete latent representations remains challenging. This work introduces a novel method to improve inference in discrete Variational Autoencoders by reframing the inference problem…

Machine Learning · Computer Science 2025-06-11 María Martínez-García, Grace Villacrés, David Mitchell, Pablo M. Olmos

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places. Rare words often have non-trivial pronunciations,…

Audio and Speech Processing · Electrical Eng. & Systems 2024-02-09 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

Discrete Autoencoders for Sequence Models

Recurrent models for sequences have been recently successful at many tasks, especially for language modeling and machine translation. Nevertheless, it remains challenging to extract good representations from these models. For instance, even…

Machine Learning · Computer Science 2018-01-31 Łukasz Kaiser, Samy Bengio

WhisperRT -- Turning Whisper into a Causal Streaming Model

Automatic Speech Recognition (ASR) has seen remarkable progress, with models like OpenAI Whisper and NVIDIA Canary achieving state-of-the-art (SOTA) performance in offline transcription. However, these models are not designed for streaming…

Computation and Language · Computer Science 2026-04-07 Tomer Krichli, Bhiksha Raj, Joseph Keshet

The Diffusion Encoder

We construct a new kind of encoder, leveraging the expressive power of diffusion models. In a traditional variational autoencoder, the encoder and decoder jointly negotiate a latent representation of the input. This is made possible by the…

Machine Learning · Computer Science 2026-05-14 Akhil Premkumar, Sarah Lucioni

Faster Diffusion: Rethinking the Role of the Encoder for Diffusion Model Inference

One of the main drawbacks of diffusion models is the slow inference time for image generation. Among the most successful approaches to addressing this problem are distillation methods. However, these methods require considerable…

Computer Vision and Pattern Recognition · Computer Science 2024-10-16 Senmao Li, Taihang Hu, Joost van de Weijer, Fahad Shahbaz Khan, Tao Liu, Linxuan Li, Shiqi Yang, Yaxing Wang, Ming-Ming Cheng, Jian Yang

Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision

This paper presents the Ensemble Nucleotide Byte-level Encoder-Decoder (ENBED) foundation model, analyzing DNA sequences at byte-level precision with an encoder-decoder Transformer architecture. ENBED uses a sub-quadratic implementation of…

Machine Learning · Computer Science 2024-08-26 Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, Vaneet Aggarwal

A Comparison of Techniques for Language Model Integration in Encoder-Decoder Speech Recognition

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Shubham Toshniwal, Anjuli Kannan, Chung-Cheng Chiu, Yonghui Wu, Tara N Sainath, Karen Livescu