Related papers: Language Models with Transformers

A Quantitative Review on Language Model Efficiency Research

Language models (LMs) are being scaled and becoming powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient…

Machine Learning · Computer Science 2023-06-06 Meng Jiang, Hy Dang, Lingbo Tong

LiteTransformerSearch: Training-free Neural Architecture Search for Efficient Language Models

The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints…

Machine Learning · Computer Science 2022-10-19 Mojan Javaheripi, Gustavo H. de Rosa, Subhabrata Mukherjee, Shital Shah, Tomasz L. Religa, Caio C. T. Mendes, Sebastien Bubeck, Farinaz Koushanfar, Debadeepta Dey

Efficient Language Modeling for Low-Resource Settings with Hybrid RNN-Transformer Architectures

Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute…

Computation and Language · Computer Science 2025-02-04 Gabriel Lindenmaier, Sean Papay, Sebastian Padó

On the Ability and Limitations of Transformers to Recognize Formal Languages

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on…

Computation and Language · Computer Science 2020-10-09 Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Advancements in Natural Language Processing: Exploring Transformer-Based Architectures for Text Understanding

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu, Yu Wang, Ngoc Quach

CoreLM: Coreference-aware Language Model Fine-Tuning

Language Models are the underpinning of all modern Natural Language Processing (NLP) tasks. The introduction of the Transformers architecture has contributed significantly to making Language Modeling very effective across many NLP tasks,…

Computation and Language · Computer Science 2021-11-05 Nikolaos Stylianou, Ioannis Vlahavas

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model…

Computation and Language · Computer Science 2020-03-17 Zhiheng Huang, Peng Xu, Davis Liang, Ajay Mishra, Bing Xiang

On the Effectiveness of Transfer Learning for Code Search

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza, Christoph Schwizer, Jian Gu, Harald C. Gall

LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training

Large Transformer-based language models such as BERT have led to broad performance improvements on many NLP tasks. Domain-specific variants of these models have demonstrated excellent performance on a variety of specialised tasks. In legal…

Computation and Language · Computer Science 2021-09-16 Benjamin Clavié, Akshita Gheewala, Paul Briton, Marc Alphonsus, Rym Laabiyad, Francesco Piccoli

Long-span language modeling for speech recognition

We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the…

Computation and Language · Computer Science 2019-11-13 Sarangarajan Parthasarathy, William Gale, Xie Chen, George Polovets, Shuangyu Chang

Analyzing Architectures for Neural Machine Translation Using Low Computational Resources

With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy,…

Computation and Language · Computer Science 2021-11-30 Aditya Mandke, Onkar Litake, Dipali Kadam

Learning Bounded Context-Free-Grammar via LSTM and the Transformer:Difference and Explanations

Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks. Theoretical results show that both are Turing-complete and can represent any context-free language (CFL). In…

Computation and Language · Computer Science 2022-03-24 Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, Jishen Zhao

Transition-based Parsing with Stack-Transformers

Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local…

Computation and Language · Computer Science 2020-10-22 Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures

In recent years, Natural Language Processing (NLP) models have achieved phenomenal success in linguistic and semantic tasks like text classification, machine translation, cognitive dialogue systems, information retrieval via Natural…

Computation and Language · Computer Science 2021-05-18 Sushant Singh, Ausif Mahmood

Transformer Grammars: Augmenting Transformer Language Models with Syntactic Inductive Biases at Scale

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are…

Computation and Language · Computer Science 2022-12-07 Laurent Sartran, Samuel Barrett, Adhiguna Kuncoro, Miloš Stanojević, Phil Blunsom, Chris Dyer

Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building

In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised…

Computation and Language · Computer Science 2024-03-12 Omar Momen, David Arps, Laura Kallmeyer

Fusing Sentence Embeddings Into LSTM-based Autoregressive Language Models

Although masked language models are highly performant and widely adopted by NLP practitioners, they cannot be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an…

Computation and Language · Computer Science 2022-08-08 Vilém Zouhar, Marius Mosbach, Dietrich Klakow

Language Modeling with Deep Transformers

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured…

Computation and Language · Computer Science 2019-09-25 Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney

Finnish Language Modeling with Deep Transformer Models

Transformers have recently taken center stage in language modeling after LSTMs were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures, BERT and…

Computation and Language · Computer Science 2020-03-30 Abhilash Jain, Aku Ruohe, Stig-Arne Grönroos, Mikko Kurimo

Bringing Emerging Architectures to Sequence Labeling in NLP

Pretrained Transformer encoders are the dominant approach to sequence labeling. While some alternative architectures, such as xLSTMs, structured state-space models, diffusion models, and adversarial learning, have shown promise in language…

Computation and Language · Computer Science 2026-03-19 Ana Ezquerro, Carlos Gómez-Rodríguez, David Vilares