Related papers: Adapting Decoder-Based Language Models for Diverse…

Decoding Partial Differential Equations: Cross-Modal Adaptation of Decoder-only Models to PDEs

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros, Philipp Slusallek, Dietrich Klakow, Vagrant Gautam

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation

While decoder-only large language models (LLMs) have shown impressive results, encoder-decoder models are still widely adopted in real-world applications for their inference efficiency and richer encoder representation. In this paper, we…

Computation and Language · Computer Science 2025-04-09 Biao Zhang, Fedor Moiseev, Joshua Ainslie, Paul Suganthan, Min Ma, Surya Bhupatiraju, Fede Lebron, Orhan Firat, Armand Joulin, Zhe Dong

Scaling Laws of Decoder-Only Models on the Multilingual Machine Translation Task

Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer…

Computation and Language · Computer Science 2024-09-24 Gaëtan Caillaut, Raheel Qader, Mariam Nakhlé, Jingshu Liu, Jean-Gabriel Barthélemy

Is Encoder-Decoder Redundant for Neural Machine Translation?

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of…

Computation and Language · Computer Science 2022-10-24 Yingbo Gao, Christian Herold, Zijian Yang, Hermann Ney

On decoder-only architecture for speech-to-text and large language model integration

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Encoder vs Decoder: Comparative Analysis of Encoder and Decoder Language Models on Multilingual NLU Tasks

This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages. Building upon the ScandEval benchmark, initially restricted to…

Computation and Language · Computer Science 2025-01-14 Dan Saattrup Nielsen, Kenneth Enevoldsen, Peter Schneider-Kamp

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems…

Computation and Language · Computer Science 2025-06-03 Yingfeng Luo, Tong Zheng, Yongyu Mu, Bei Li, Qinghong Zhang, Yongqi Gao, Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu

Return of the Encoder: Maximizing Parameter Efficiency for SLMs

The dominance of large decoder-only language models has overshadowed encoder-decoder architectures, despite their fundamental efficiency advantages in sequence processing. For small language models (SLMs) - those with 1 billion parameters…

Computation and Language · Computer Science 2025-01-31 Mohamed Elfeki, Rui Liu, Chad Voegele

On the Sub-Layer Functionalities of Transformer Decoder

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During…

Computation and Language · Computer Science 2020-10-07 Yilin Yang, Longyue Wang, Shuming Shi, Prasad Tadepalli, Stefan Lee, Zhaopeng Tu

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Multilingual Machine Translation: Closing the Gap between Shared and Language-specific Encoder-Decoders

State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific…

Computation and Language · Computer Science 2020-04-15 Carlos Escolano, Marta R. Costa-jussà, José A. R. Fonollosa, Mikel Artetxe

Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

Recent large language model (LLM) research has undergone an architectural shift from encoder-decoder modeling to nowadays the dominant decoder-only modeling. This rapid transition, however, comes without a rigorous comparative analysis…

Computation and Language · Computer Science 2025-10-31 Biao Zhang, Yong Cheng, Siamak Shakeri, Xinyi Wang, Min Ma, Orhan Firat

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to…

Computation and Language · Computer Science 2023-04-11 Zihao Fu, Wai Lam, Qian Yu, Anthony Man-Cho So, Shengding Hu, Zhiyuan Liu, Nigel Collier

Language Ranker: A Lightweight Ranking framework for LLM Decoding

Conventional research on large language models (LLMs) has primarily focused on refining output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances,…

Computation and Language · Computer Science 2025-10-28 Chenheng Zhang, Tianqi Du, Jizhe Zhang, Mingqing Xiao, Yifei Wang, Yisen Wang, Zhouchen Lin

The Landscape and Challenges of HPC Research and LLMs

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing…

Machine Learning · Computer Science 2024-02-08 Le Chen, Nesreen K. Ahmed, Akash Dutta, Arijit Bhattacharjee, Sixing Yu, Quazi Ishtiaque Mahmud, Waqwoya Abebe, Hung Phan, Aishwarya Sarkar, Branden Butler, Niranjan Hasabnis, Gal Oren, Vy A. Vo, Juan Pablo Munoz, Theodore L. Willke, Tim Mattson, Ali Jannesari

Decoder-Only LLMs are Better Controllers for Diffusion Models

Groundbreaking advancements in text-to-image generation have recently been achieved with the emergence of diffusion models. These models exhibit a remarkable ability to generate highly artistic and intricately detailed images based on…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Ziyi Dong, Yao Xiao, Pengxu Wei, Liang Lin

Making Language Model a Hierarchical Classifier

Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by human's hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built with different layers…

Computation and Language · Computer Science 2025-09-30 Yihong Wang, Zhonglin Jiang, Ningyuan Xi, Yue Zhao, Qingqing Gu, Xiyuan Chen, Hao Wu, Sheng Xu, Hange Zhou, Yong Chen, Luo Ji

Condenser: a Pre-training Architecture for Dense Retrieval

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao, Jamie Callan

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized…

Computation and Language · Computer Science 2024-08-23 Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

A Thorough Examination of Decoding Methods in the Era of LLMs

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam