English
Related papers

Related papers: Adapting Decoder-Based Language Models for Diverse…

200 papers

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros , Philipp Slusallek , Dietrich Klakow , Vagrant Gautam

While decoder-only large language models (LLMs) have shown impressive results, encoder-decoder models are still widely adopted in real-world applications for their inference efficiency and richer encoder representation. In this paper, we…

Computation and Language · Computer Science 2025-04-09 Biao Zhang , Fedor Moiseev , Joshua Ainslie , Paul Suganthan , Min Ma , Surya Bhupatiraju , Fede Lebron , Orhan Firat , Armand Joulin , Zhe Dong

Recent studies have showcased remarkable capabilities of decoder-only models in many NLP tasks, including translation. Yet, the machine translation field has been largely dominated by encoder-decoder models based on the Transformer…

Computation and Language · Computer Science 2024-09-24 Gaëtan Caillaut , Raheel Qader , Mariam Nakhlé , Jingshu Liu , Jean-Gabriel Barthélemy

Encoder-decoder architecture is widely adopted for sequence-to-sequence modeling tasks. For machine translation, despite the evolution from long short-term memory networks to Transformer networks, plus the introduction and development of…

Computation and Language · Computer Science 2022-10-24 Yingbo Gao , Christian Herold , Zijian Yang , Hermann Ney

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Jian Wu , Yashesh Gaur , Zhuo Chen , Long Zhou , Yimeng Zhu , Tianrui Wang , Jinyu Li , Shujie Liu , Bo Ren , Linquan Liu , Yu Wu

This paper explores the performance of encoder and decoder language models on multilingual Natural Language Understanding (NLU) tasks, with a broad focus on Germanic languages. Building upon the ScandEval benchmark, initially restricted to…

Computation and Language · Computer Science 2025-01-14 Dan Saattrup Nielsen , Kenneth Enevoldsen , Peter Schneider-Kamp

The field of neural machine translation (NMT) has changed with the advent of large language models (LLMs). Much of the recent emphasis in natural language processing (NLP) has been on modeling machine translation and many other problems…

Computation and Language · Computer Science 2025-06-03 Yingfeng Luo , Tong Zheng , Yongyu Mu , Bei Li , Qinghong Zhang , Yongqi Gao , Ziqiang Xu , Peinan Feng , Xiaoqian Liu , Tong Xiao , Jingbo Zhu

The dominance of large decoder-only language models has overshadowed encoder-decoder architectures, despite their fundamental efficiency advantages in sequence processing. For small language models (SLMs) - those with 1 billion parameters…

Computation and Language · Computer Science 2025-01-31 Mohamed Elfeki , Rui Liu , Chad Voegele

There have been significant efforts to interpret the encoder of Transformer-based encoder-decoder architectures for neural machine translation (NMT); meanwhile, the decoder remains largely unexamined despite its critical role. During…

Computation and Language · Computer Science 2020-10-07 Yilin Yang , Longyue Wang , Shuming Shi , Prasad Tadepalli , Stefan Lee , Zhaopeng Tu

State-of-the-art neural models typically encode document-query pairs using cross-attention for re-ranking. To this end, models generally utilize an encoder-only (like BERT) paradigm or an encoder-decoder (like T5) approach. These paradigms,…

Computation and Language · Computer Science 2022-04-26 Kai Hui , Honglei Zhuang , Tao Chen , Zhen Qin , Jing Lu , Dara Bahri , Ji Ma , Jai Prakash Gupta , Cicero Nogueira dos Santos , Yi Tay , Don Metzler

State-of-the-art multilingual machine translation relies on a universal encoder-decoder, which requires retraining the entire system to add new languages. In this paper, we propose an alternative approach that is based on language-specific…

Computation and Language · Computer Science 2020-04-15 Carlos Escolano , Marta R. Costa-jussà , José A. R. Fonollosa , Mikel Artetxe

Recent large language model (LLM) research has undergone an architectural shift from encoder-decoder modeling to nowadays the dominant decoder-only modeling. This rapid transition, however, comes without a rigorous comparative analysis…

Computation and Language · Computer Science 2025-10-31 Biao Zhang , Yong Cheng , Siamak Shakeri , Xinyi Wang , Min Ma , Orhan Firat

The sequence-to-sequence (seq2seq) task aims at generating the target sequence based on the given input source sequence. Traditionally, most of the seq2seq task is resolved by the Encoder-Decoder framework which requires an encoder to…

Computation and Language · Computer Science 2023-04-11 Zihao Fu , Wai Lam , Qian Yu , Anthony Man-Cho So , Shengding Hu , Zhiyuan Liu , Nigel Collier

Conventional research on large language models (LLMs) has primarily focused on refining output distributions, while paying less attention to the decoding process that transforms these distributions into final responses. Recent advances,…

Computation and Language · Computer Science 2025-10-28 Chenheng Zhang , Tianqi Du , Jizhe Zhang , Mingqing Xiao , Yifei Wang , Yisen Wang , Zhouchen Lin

Recently, language models (LMs), especially large language models (LLMs), have revolutionized the field of deep learning. Both encoder-decoder models and prompt-based techniques have shown immense potential for natural language processing…

Groundbreaking advancements in text-to-image generation have recently been achieved with the emergence of diffusion models. These models exhibit a remarkable ability to generate highly artistic and intricately detailed images based on…

Computer Vision and Pattern Recognition · Computer Science 2025-02-10 Ziyi Dong , Yao Xiao , Pengxu Wei , Liang Lin

Decoder-only language models, such as GPT and LLaMA, generally decode on the last layer. Motivated by human's hierarchical thinking capability, we propose that a hierarchical decoder architecture could be built with different layers…

Computation and Language · Computer Science 2025-09-30 Yihong Wang , Zhonglin Jiang , Ningyuan Xi , Yue Zhao , Qingqing Gu , Xiyuan Chen , Hao Wu , Sheng Xu , Hange Zhou , Yong Chen , Luo Ji

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao , Jamie Callan

Large decoder-only language models (LLMs) are the state-of-the-art models on most of today's NLP tasks and benchmarks. Yet, the community is only slowly adopting these models for text embedding tasks, which require rich contextualized…

Computation and Language · Computer Science 2024-08-23 Parishad BehnamGhader , Vaibhav Adlakha , Marius Mosbach , Dzmitry Bahdanau , Nicolas Chapados , Siva Reddy

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi , Haoran Yang , Deng Cai , Zhisong Zhang , Yifan Wang , Yujiu Yang , Wai Lam
‹ Prev 1 2 3 10 Next ›