Related papers: Breaking the Activation Function Bottleneck throug…

Adaptable Adapters

State-of-the-art pretrained NLP models contain a hundred million to trillion parameters. Adapters provide a parameter-efficient alternative for the full finetuning in which we can only finetune lightweight neural network layers on top of…

Computation and Language · Computer Science · 2022-05-04 · Nafise Sadat Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych

Regularizing and Optimizing LSTM Language Models

Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks, including machine translation, language modeling, and question answering. In this…

Computation and Language · Computer Science · 2017-08-09 · Stephen Merity, Nitish Shirish Keskar, Richard Socher

On the State of the Art of Evaluation in Neural Language Models

Ongoing innovations in recurrent neural network architectures have provided a steady influx of apparently state-of-the-art results on language modelling benchmarks. However, these have been evaluated using differing code bases and limited…

Computation and Language · Computer Science · 2017-11-21 · Gábor Melis, Chris Dyer, Phil Blunsom

Investigating Low-Rank Training in Transformer Language Models: Efficiency and Scaling Analysis

State-of-the-art LLMs often rely on scale with high computational costs, which has sparked a research agenda to reduce parameter counts and costs without significantly impacting performance. Our study focuses on Transformer-based LLMs,…

Computation and Language · Computer Science · 2024-07-25 · Xiuying Wei, Skander Moalla, Razvan Pascanu, Caglar Gulcehre

Growing Tiny Networks: Spotting Expressivity Bottlenecks and Fixing Them Optimally

Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, in order to be able to…

Artificial Intelligence · Computer Science · 2024-12-13 · Manon Verbockhaven, Sylvain Chevallier, Guillaume Charpiat, Théo Rudkiewicz

Neural Transition-based Syntactic Linearization

The task of linearization is to find a grammatical order given a set of words. Traditional models use statistical methods. Syntactic linearization systems, which generate a sentence along with its syntactic tree, have shown state-of-the-art…

Computation and Language · Computer Science · 2018-10-24 · Linfeng Song, Yue Zhang, Daniel Gildea

A Simple LSTM model for Transition-based Dependency Parsing

We present a simple LSTM-based transition-based dependency parser. Our model is composed of a single LSTM hidden layer replacing the hidden layer in the usual feed-forward network architecture. We also propose a new initialization method…

Computation and Language · Computer Science · 2017-09-12 · Mohab Elkaref, Bernd Bohnet

Parameter Efficient Transfer Learning for Various Speech Processing Tasks

Fine-tuning of self-supervised models is a powerful transfer learning method in a variety of fields, including speech processing, since it can utilize generic feature representations obtained from large amounts of unlabeled data.…

Multimedia · Computer Science · 2022-12-07 · Shinta Otake, Rei Kawakami, Nakamasa Inoue

ByteFlow: Language Modeling through Adaptive Byte Compression without a Tokenizer

Modern language models still rely on fixed, pre-defined subword tokenizations. Once a tokenizer is trained, the LM can only operate at this fixed level of granularity, which often leads to brittle and counterintuitive behaviors even in…

Computation and Language · Computer Science · 2026-03-05 · Chunyuan Deng, Sanket Lokegaonkar, Colin Lockard, Besnik Fetahu, Nasser Zalmout, Xian Li

Conditionally Adaptive Multi-Task Learning: Improving Transfer Learning in NLP Using Fewer Parameters & Less Data

Multi-Task Learning (MTL) networks have emerged as a promising method for transferring learned knowledge across different tasks. However, MTL must deal with challenges such as: overfitting to low resource tasks, catastrophic forgetting, and…

Machine Learning · Computer Science · 2022-04-22 · Jonathan Pilault, Amine Elhattami, Christopher Pal

Parameter-Efficient Transformer Embeddings

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science · 2025-05-06 · Henry Ndubuaku, Mouad Talhi

Learning Activation Functions: A new paradigm for understanding Neural Networks

The scope of research in the domain of activation functions remains limited and centered around improving the ease of optimization or generalization quality of neural networks (NNs). However, to develop a deeper understanding of deep…

Machine Learning · Computer Science · 2020-12-10 · Mohit Goyal, Rajan Goyal, Brejesh Lall

Adaptive Large Language Models By Layerwise Attention Shortcuts

Transformer architectures are the backbone of the modern AI revolution. However, they are based on simply stacking the same blocks in dozens of layers and processing information sequentially from one block to another. In this paper, we…

Computation and Language · Computer Science · 2024-12-24 · Prateek Verma, Mert Pilanci

Incremental Parsing with Minimal Features Using Bi-Directional LSTM

Recently, neural network approaches for parsing have largely automated the combination of individual features, but still rely on (often a larger number of) atomic features created from human linguistic intuition, and potentially omitting…

Computation and Language · Computer Science · 2016-06-22 · James Cross, Liang Huang

Plug-and-Play Spiking Operators: Breaking the Nonlinearity Bottleneck in Spiking Transformers

ANN-to-SNN conversion offers a practical, training-free route to spiking large language models. However, current pipelines primarily focus on spike-driven realizations for Transformer linear-algebra operations, while providing limited…

Machine Learning · Computer Science · 2026-05-21 · Xinzhe Yuan, Xiang Peng, Bin Gu, Huan Xiong

Structure-Learnable Adapter Fine-Tuning for Parameter-Efficient Large Language Models

This paper addresses the issues of parameter redundancy, rigid structure, and limited task adaptability in the fine-tuning of large language models. It proposes an adapter-based fine-tuning method built on a structure-learnable mechanism.…

Computation and Language · Computer Science · 2025-09-04 · Ming Gong, Yingnan Deng, Nia Qi, Yujun Zou, Zhihao Xue, Yun Zi

Learning Neural Networks with Adaptive Regularization

Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing…

Machine Learning · Computer Science · 2019-10-24 · Han Zhao, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Geoffrey J. Gordon

AdapterBias: Parameter-efficient Token-dependent Representation Shift for Adapters in NLP Tasks

Transformer-based pre-trained models with millions of parameters require large storage. Recent approaches tackle this shortcoming by training adapters, but these approaches still require a relatively large number of parameters. In this…

Computation and Language · Computer Science · 2023-01-31 · Chin-Lun Fu, Zih-Ching Chen, Yun-Ru Lee, Hung-yi Lee

Learning Hierarchical Structures with Differentiable Nondeterministic Stacks

Learning hierarchical structures in sequential data -- from simple algorithmic patterns to natural language -- in a reliable, generalizable way remains a challenging problem for neural language models. Past work has shown that recurrent…

Computation and Language · Computer Science · 2022-12-01 · Brian DuSell, David Chiang

Learning Features with Parameter-Free Layers

Trainable layers such as convolutional building blocks are the standard network design choices by learning parameters to capture the global context through successive spatial operations. When designing an efficient network, trainable layers…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-22 · Dongyoon Han, YoungJoon Yoo, Beomyoung Kim, Byeongho Heo