Related papers: Block-Based Double Decoders

Encoder-Decoder Diffusion Language Models for Efficient Training and Inference

Discrete diffusion models enable parallel token sampling for faster inference than autoregressive approaches. However, prior diffusion models use a decoder-only architecture, which requires sampling algorithms that invoke the full network…

Machine Learning · Computer Science 2025-10-28 Marianne Arriola, Yair Schiff, Hao Phung, Aaron Gokaslan, Volodymyr Kuleshov

Learning Linear Block Error Correction Codes

Error correction codes are a crucial part of the physical communication layer, ensuring the reliable transfer of data over noisy channels. The design of optimal linear block codes capable of being efficiently decoded is of major concern,…

Information Theory · Computer Science 2024-05-08 Yoni Choukroun, Lior Wolf

Poly-encoders: Transformer Architectures and Pre-training Strategies for Fast and Accurate Multi-sentence Scoring

The use of deep pre-trained bidirectional transformers has led to remarkable progress in a number of applications (Devlin et al., 2018). For tasks that make pairwise comparisons between sequences, matching a given input with a corresponding…

Computation and Language · Computer Science 2020-03-27 Samuel Humeau, Kurt Shuster, Marie-Anne Lachaux, Jason Weston

Block Transformer: Global-to-Local Language Modeling for Fast Inference

We introduce the Block Transformer which adopts hierarchical global-to-local modeling to autoregressive transformers to mitigate the inference bottlenecks associated with self-attention. Self-attention requires the key-value (KV) cache of…

Computation and Language · Computer Science 2024-11-04 Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun

Attention or Convolution: Transformer Encoders in Audio Language Models for Inference Efficiency

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon, Ching-Feng Yeh, Hakan Inan, Wei-Ning Hsu, Rashi Rungta, Yashar Mehdad, Daniel Bikel

Decoding quantum low density parity check codes with diffusion

An efficient decoder is essential for quantum error correction, and data-driven neural decoders have emerged as promising, flexible solutions. Here, we introduce a diffusion model framework to infer logical errors from syndrome measurements…

Quantum Physics · Physics 2025-09-29 Zejun Liu, Anqi Gong, Bryan K. Clark

Pervasive Attention: 2D Convolutional Neural Networks for Sequence-to-Sequence Prediction

Current state-of-the-art machine translation systems are based on encoder-decoder architectures, that first encode the input sequence, and then generate an output sequence based on the input encoding. Both are interfaced with an attention…

Computation and Language · Computer Science 2018-11-02 Maha Elbayad, Laurent Besacier, Jakob Verbeek

Distilled Dual-Encoder Model for Vision-Language Understanding

We propose a cross-modal attention distillation framework to train a dual-encoder model for vision-language understanding tasks, such as visual reasoning and visual question answering. Dual-encoder models have a faster inference speed than…

Computation and Language · Computer Science 2022-10-18 Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei

Learning Linear Block Codes with Gradient Quantization

This study investigates the problem of learning linear block codes optimized for Belief-Propagation decoders significantly improving performance compared to the state-of-the-art. Our previous research is extended with an enhanced system…

Signal Processing · Electrical Eng. & Systems 2025-10-02 Louis-Adrien Dufrène, Quentin Lampin, Guillaume Larue

Efficient Attention using a Fixed-Size Memory Representation

The standard content-based attention mechanism typically used in sequence-to-sequence models is computationally expensive as it requires the comparison of large encoder and decoder states at each time step. In this work, we propose an…

Computation and Language · Computer Science 2017-07-04 Denny Britz, Melody Y. Guan, Minh-Thang Luong

Understanding How Encoder-Decoder Architectures Attend

Encoder-decoder networks with attention have proven to be a powerful way to solve many sequence-to-sequence tasks. In these networks, attention aligns encoder and decoder states and is often used for visualizing network behavior. However,…

Machine Learning · Computer Science 2021-10-29 Kyle Aitken, Vinay V Ramasesh, Yuan Cao, Niru Maheswaranathan

Dual Learning-based Video Coding with Inception Dense Blocks

In this paper, a dual learning-based method in intra coding is introduced for PCS Grand Challenge. This method is mainly composed of two parts: intra prediction and reconstruction filtering. They use different network structures, the neural…

Image and Video Processing · Electrical Eng. & Systems 2019-11-25 Chao Liu, Heming Sun, Junan Chen, Zhengxue Cheng, Masaru Takeuchi, Jiro Katto, Xiaoyang Zeng, Yibo Fan

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam

How to Mask in Error Correction Code Transformer: Systematic and Double Masking

In communication and storage systems, error correction codes (ECCs) are pivotal in ensuring data reliability. As deep learning's applicability has broadened across diverse domains, there is a growing research focus on neural network-based…

Machine Learning · Computer Science 2023-08-28 Seong-Joon Park, Hee-Youl Kwak, Sang-Hyo Kim, Sunghwan Kim, Yongjune Kim, Jong-Seon No

Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement

Current speech enhancement (SE) research has largely neglected channel attention and spatial attention, and encoder-decoder architecture-based networks have not adequately considered how to provide efficient inputs to the intermediate…

Audio and Speech Processing · Electrical Eng. & Systems 2023-06-12 Junyu Wang

Hybrid Decoding: Rapid Pass and Selective Detailed Correction for Sequence Models

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim, Jihwan Park, Hyung Yong Kim, Hanbin Lee, Byeong-Yeol Kim

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical…

Logic in Computer Science · Computer Science 2026-05-11 Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Miguel Moreno, Matias Selin

Training and Inference Efficiency of Encoder-Decoder Speech Models

Attention encoder-decoder model architecture is the backbone of several recent top performing foundation speech models: Whisper, Seamless, OWSM, and Canary-1B. However, the reported data and compute requirements for their training are…

Computation and Language · Computer Science 2025-03-21 Piotr Żelasko, Kunal Dhawan, Daniel Galvez, Krishna C. Puvvada, Ankita Pasad, Nithin Rao Koluguri, Ke Hu, Vitaly Lavrukhin, Jagadeesh Balam, Boris Ginsburg

CATFA-Net: A Trans-Convolutional Approach for Accurate Medical Image Segmentation

Convolutional blocks have played a crucial role in advancing medical image segmentation by excelling in dense prediction tasks. However, their inability to effectively capture long-range dependencies has limited their performance.…

Image and Video Processing · Electrical Eng. & Systems 2026-03-17 Siddhartha Mallick, Aayushman Ghosh, Jayanta Paul, Jaya Sil