Related papers: Encoder-Decoder or Decoder-Only? Revisiting Encode…

Are Decoder-Only Large Language Models the Silver Bullet for Code Search?

Code search is essential for code reuse, allowing developers to efficiently locate relevant code snippets. The advent of powerful decoder-only Large Language Models (LLMs) has revolutionized many code intelligence tasks. However, their…

Software Engineering · Computer Science 2026-04-23 Yuxuan Chen, Mingwei Liu, Guangsheng Ou, Anji Li, Dekun Dai, Yanlin Wang, Zibin Zheng

Evaluating Small Decoder-Only Language Models for Grammar Correction and Text Simplification

Large language models have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size and computation cost make them difficult to access,…

Computation and Language · Computer Science 2026-01-08 Anthony Lamelas

Investigating Decoder-only Large Language Models for Speech-to-text Translation

Large language models (LLMs), known for their exceptional reasoning capabilities, generalizability, and fluency across diverse domains, present a promising avenue for enhancing speech-related tasks. In this paper, we focus on integrating…

Computation and Language · Computer Science 2024-07-04 Chao-Wei Huang, Hui Lu, Hongyu Gong, Hirofumi Inaguma, Ilia Kulikov, Ruslan Mavlyutov, Sravya Popuri

Looking Right is Sometimes Right: Investigating the Capabilities of Decoder-only LLMs for Sequence Labeling

Pre-trained language models based on masked language modeling (MLM) excel in natural language understanding (NLU) tasks. While fine-tuned MLM-based encoders consistently outperform causal language modeling decoders of comparable size,…

Computation and Language · Computer Science 2024-06-07 David Dukić, Jan Šnajder

Encoder-Decoder Gemma: Improving the Quality-Efficiency Trade-Off via Adaptation

While decoder-only large language models (LLMs) have shown impressive results, encoder-decoder models are still widely adopted in real-world applications for their inference efficiency and richer encoder representation. In this paper, we…

Computation and Language · Computer Science 2025-04-09 Biao Zhang, Fedor Moiseev, Joshua Ainslie, Paul Suganthan, Min Ma, Surya Bhupatiraju, Fede Lebron, Orhan Firat, Armand Joulin, Zhe Dong

Decoder-Only LLMs are Better Controllers for Diffusion Models

Groundbreaking advancements in text-to-image generation have recently been achieved with the emergence of diffusion models. These models exhibit a remarkable ability to generate highly artistic and intricately detailed images based on…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Ziyi Dong, Yao Xiao, Pengxu Wei, Liang Lin

On decoder-only architecture for speech-to-text and large language model integration

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Jian Wu, Yashesh Gaur, Zhuo Chen, Long Zhou, Yimeng Zhu, Tianrui Wang, Jinyu Li, Shujie Liu, Bo Ren, Linquan Liu, Yu Wu

Should We Still Pretrain Encoders with Masked Language Modeling?

Learning high-quality text representations is fundamental to a wide range of NLP tasks. While encoder pretraining has traditionally relied on Masked Language Modeling (MLM), recent evidence suggests that decoder models pretrained with…

Computation and Language · Computer Science 2026-05-06 Hippolyte Gisserot-Boukhlef, Nicolas Boizard, Manuel Faysse, Duarte M. Alves, Emmanuel Malherbe, André F. T. Martins, Céline Hudelot, Pierre Colombo

Beyond Decoder-only: Large Language Models Can be Good Encoders for Machine Translation

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems…

Computation and Language · Computer Science 2025-06-03 Yingfeng Luo, Tong Zheng, Yongyu Mu, Bei Li, Qinghong Zhang, Yongqi Gao, Ziqiang Xu, Peinan Feng, Xiaoqian Liu, Tong Xiao, Jingbo Zhu

Return of the Encoder: Maximizing Parameter Efficiency for SLMs

The dominance of large decoder-only language models has overshadowed encoder-decoder architectures, despite their fundamental efficiency advantages in sequence processing. For small language models (SLMs) - those with 1 billion parameters…

Computation and Language · Computer Science 2025-01-31 Mohamed Elfeki, Rui Liu, Chad Voegele

ED2LM: Encoder-Decoder to Language Model for Faster Document Re-ranking Inference

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui, Honglei Zhuang, Tao Chen, Zhen Qin, Jing Lu, Dara Bahri, Ji Ma, Jai Prakash Gupta, Cicero Nogueira dos Santos, Yi Tay, Don Metzler

Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Bingqi Ma, Zhuofan Zong, Guanglu Song, Hongsheng Li, Yu Liu

Can we obtain significant success in RST discourse parsing by using Large Language Models?

Recently, decoder-only pre-trained large language models (LLMs), with several tens of billion parameters, have significantly impacted a wide range of natural language processing (NLP) tasks. While encoder-only or encoder-decoder pre-trained…

Computation and Language · Computer Science 2024-03-11 Aru Maekawa, Tsutomu Hirao, Hidetaka Kamigaito, Manabu Okumura

Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning

Large Language Models (LLMs) prompted to generate chain-of-thought (CoT) exhibit impressive reasoning capabilities. Recent attempts at prompt decomposition toward solving complex, multi-step reasoning problems depend on the ability of the…

Computation and Language · Computer Science 2024-02-28 Gurusha Juneja, Subhabrata Dutta, Soumen Chakrabarti, Sunny Manchanda, Tanmoy Chakraborty

RelayLLM: Efficient Reasoning via Collaborative Decoding

Using Large Language Models (LLMs) for complex reasoning is often hindered by high computational costs and latency, while resource-efficient Small Language Models (SLMs) typically lack the necessary reasoning capacity. Existing collaborative…

Computation and Language · Computer Science 2026-01-09 Chengsong Huang, Tong Zheng, Langlin Huang, Jinyuan Li, Haolin Liu, Jiaxin Huang

Language Ranker: A Lightweight Ranking framework for LLM Decoding

Conventional research on large language models (LLMs) has primarily focused on refining output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances,…

Computation and Language · Computer Science 2025-10-28 Chenheng Zhang, Tianqi Du, Jizhe Zhang, Mingqing Xiao, Yifei Wang, Yisen Wang, Zhouchen Lin

Causal Reasoning Favors Encoders: On The Limits of Decoder-Only Models

In-context learning (ICL) underpins recent advances in large language models (LLMs), although its role and performance in causal reasoning remain unclear. Causal reasoning demands multihop composition and strict conjunctive control, and…

Computation and Language · Computer Science 2025-12-12 Amartya Roy, Elamparithy M, Kripabandhu Ghosh, Ponnurangam Kumaraguru, Adrian de Wynter

DiffuCoder: Understanding and Improving Masked Diffusion Models for Code Generation

Diffusion large language models (dLLMs) are compelling alternatives to autoregressive (AR) models because their denoising models operate over the entire sequence. The global planning and iterative refinement features of dLLMs are…

Computation and Language · Computer Science 2025-06-27 Shansan Gong, Ruixiang Zhang, Huangjie Zheng, Jiatao Gu, Navdeep Jaitly, Lingpeng Kong, Yizhe Zhang

Measuring the Redundancy of Decoder Layers in SpeechLLMs

Speech Large Language Models route speech encoder representations into an LLM decoder that typically accounts for over 90% of total parameters. We study how much of this decoder capacity is actually needed for speech tasks. Across two LLM…

Computation and Language · Computer Science 2026-03-06 Adel Moumen, Guangzhi Sun, Philip C Woodland

Adapting Decoder-Based Language Models for Diverse Encoder Downstream Tasks

Decoder-based transformers, while revolutionizing language modeling and scaling to immense sizes, have not completely overtaken encoder-heavy architectures in natural language processing. Specifically, encoder-only models remain dominant in…

Computation and Language · Computer Science 2025-03-05 Paul Suganthan, Fedor Moiseev, Le Yan, Junru Wu, Jianmo Ni, Jay Han, Imed Zitouni, Enrique Alfonseca, Xuanhui Wang, Zhe Dong