Related papers: Hyperdecoders: Instance-specific decoders for mult…

200 papers

We present HyperLoader, a simple approach that combines different parameter-efficient fine-tuning methods in a multi-task setting. To achieve this goal, our model uses a hypernetwork to generate the weights of these modules based on the…

Computation and Language · Computer Science 2024-08-27 Jesus-German Ortiz-Barajas, Helena Gomez-Adorno, Thamar Solorio

Decoding methods play an indispensable role in converting language models from next-token predictors into practical task solvers. Prior research on decoding methods, primarily focusing on task-specific models, may not extend to the current…

Computation and Language · Computer Science 2024-10-10 Chufan Shi, Haoran Yang, Deng Cai, Zhisong Zhang, Yifan Wang, Yujiu Yang, Wai Lam

Transformer-based NLP models are powerful but have high computational costs that limit deployment. Finetuned encoder-decoder models are popular in specialized domains and can outperform larger more generalized decoder-only models, such as…

Computation and Language · Computer Science 2024-11-19 Bo-Ru Lu, Nikita Haduong, Chien-Yu Lin, Hao Cheng, Noah A. Smith, Mari Ostendorf

Starting from NMT, encoder-decoder neural networks have been used for many NLP problems. Graph-based models and transition-based models borrowing the encoder components achieve state-of-the-art performance on dependency parsing and…

Computation and Language · Computer Science 2017-06-27 Jiangming Liu, Yue Zhang

Deep learning based decoding networks have shown significant improvement in decoding LDPC codes, but the neural decoders are limited by rate-matching operations such as puncturing or extending, thus needing to train multiple decoders with…

Signal Processing · Electrical Eng. & Systems 2023-10-11 Yukun Cheng, Wei Chen, Lun Li, Bo Ai

As Natural Language Processing (NLP) algorithms continually achieve new milestones, out-of-distribution generalization remains a significant challenge. This paper addresses the issue of multi-source adaptation for unfamiliar domains: We…

Computation and Language · Computer Science 2023-10-20 Tomer Volk, Eyal Ben-David, Ohad Amosy, Gal Chechik, Roi Reichart

Pre-trained Transformer models have achieved successes in a wide range of NLP tasks, but are inefficient when dealing with long input sequences. Existing studies try to overcome this challenge via segmenting the long sequence followed by…

Computation and Language · Computer Science 2022-03-16 Xiangyang Mou, Mo Yu, Bingsheng Yao, Lifu Huang

Deploying natural language processing (NLP) models on mobile platforms requires models that can adapt across diverse applications while remaining efficient in memory and computation. We investigate pre-finetuning strategies to enhance the…

Computation and Language · Computer Science 2025-10-10 Junyi Zhu, Savas Ozkan, Andrea Maracani, Sinan Mutlu, Cho Jung Min, Mete Ozay

Adapting large-scale pretrained models to various downstream tasks via fine-tuning is a standard method in machine learning. Recently, parameter-efficient fine-tuning methods show promise in adapting a pretrained model to different tasks…

Computer Vision and Pattern Recognition · Computer Science 2022-10-10 Yen-Cheng Liu, Chih-Yao Ma, Junjiao Tian, Zijian He, Zsolt Kira

Constrained sequence codes have been widely used in modern communication and data storage systems. Sequences encoded with constrained sequence codes satisfy constraints imposed by the physical channel, hence enabling efficient and reliable…

Information Theory · Computer Science 2018-09-07 Congzhe Cao, Duanshun Li, Ivan Fair

Intermediate-task transfer can benefit a wide range of NLP tasks with properly selected source datasets. However, it is computationally infeasible to experiment with all intermediate transfer combinations, making choosing a useful source…

Computation and Language · Computer Science 2022-10-24 Wangchunshu Zhou, Canwen Xu, Julian McAuley

Recurrent Neural Networks (RNNs) with attention mechanisms have obtained state-of-the-art results for many sequence processing tasks. Most of these models use a simple form of encoder with attention that looks over the entire sequence and…

The workflow of pretraining and fine-tuning has emerged as a popular paradigm for solving various NLP and V&L (Vision-and-Language) downstream tasks. With the capacity of pretrained models growing rapidly, how to perform parameter-efficient…

Computation and Language · Computer Science 2022-03-09 Zhengkun Zhang, Wenya Guo, Xiaojun Meng, Yasheng Wang, Yadao Wang, Xin Jiang, Qun Liu, Zhenglu Yang

Prompt-Tuning is a new paradigm for finetuning pre-trained language models in a parameter-efficient way. Here, we explore the use of HyperNetworks to generate hyper-prompts: we propose HyperPrompt, a novel architecture for prompt-based…

Computation and Language · Computer Science 2022-06-16 Yun He, Huaixiu Steven Zheng, Yi Tay, Jai Gupta, Yu Du, Vamsi Aribandi, Zhe Zhao, YaGuang Li, Zhao Chen, Donald Metzler, Heng-Tze Cheng, Ed H. Chi

Labeled sequence transduction is a task of transforming one sequence into another sequence that satisfies desiderata specified by a set of labels. In this paper we propose multi-space variational encoder-decoders, a new model for labeled…

Computation and Language · Computer Science 2019-10-08 Chunting Zhou, Graham Neubig

Generally, the decoder-only large language models (LLMs) are adapted to context-aware neural machine translation (NMT) in a concatenating way, where LLMs take the concatenation of the source sentence (i.e., intra-sentence context) and the…

Computation and Language · Computer Science 2024-09-24 Xinglin Lyu, Junhui Li, Yanqing Zhao, Min Zhang, Daimeng Wei, Shimin Tao, Hao Yang, Min Zhang

Neural networks have been proven to work as decoders in telecommunications, so this thesis investigates ways of making them efficient. The different parameters to maximize the Neural Network Decoder's efficiency will be…

Information Theory · Computer Science 2022-04-27 Joshua Tshifhiwa Maumela

Functional data clustering is concerned with grouping functions that share similar structure, yet most existing methods implicitly operate on sampled grids, causing cluster assignments to depend on resolution, sampling density, or…

Machine Learning · Computer Science 2026-02-27 Anirudh Thatipelli, Ali Siahkoohi

Neural Encoders are frequently used in the NLP domain to perform dense retrieval tasks, for instance, to generate the candidate documents for a given query in question-answering tasks. However, sparse annotation and label noise in the…

Machine Learning · Computer Science 2025-12-16 Arnab Sharma

Decoding from large language models (LLMs) typically relies on fixed sampling hyperparameters (e.g., temperature, top-p), despite substantial variation in task difficulty and uncertainty across prompts and individual decoding steps. We…

Machine Learning · Computer Science 2026-03-17 Chloe H. Su, Zhe Ye, Samuel Tenka, Aidan Yang, Soonho Kong, Udaya Ghai