English
Related papers

Related papers: Language Models with Transformers

200 papers

Language models (LMs) are being scaled and becoming powerful. Improving their efficiency is one of the core research topics in neural information processing systems. Tay et al. (2022) provided a comprehensive overview of efficient…

Machine Learning · Computer Science 2023-06-06 Meng Jiang , Hy Dang , Lingbo Tong

The Transformer architecture is ubiquitously used as the building block of large-scale autoregressive language models. However, finding architectures with the optimal trade-off between task performance (perplexity) and hardware constraints…

Transformer-based language models have recently been at the forefront of active research in text generation. However, these models' advances come at the price of prohibitive training costs, with parameter counts in the billions and compute…

Computation and Language · Computer Science 2025-02-04 Gabriel Lindenmaier , Sean Papay , Sebastian Padó

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on…

Computation and Language · Computer Science 2020-10-09 Satwik Bhattamishra , Kabir Ahuja , Navin Goyal

Natural Language Processing (NLP) has witnessed a transformative leap with the advent of transformer-based architectures, which have significantly enhanced the ability of machines to understand and generate human-like text. This paper…

Computation and Language · Computer Science 2025-03-27 Tianhao Wu , Yu Wang , Ngoc Quach

Language Models are the underpin of all modern Natural Language Processing (NLP) tasks. The introduction of the Transformers architecture has contributed significantly into making Language Modeling very effective across many NLP task,…

Computation and Language · Computer Science 2021-11-05 Nikolaos Stylianou , Ioannis Vlahavas

Bidirectional Encoder Representations from Transformers (BERT) has recently achieved state-of-the-art performance on a broad range of NLP tasks including sentence classification, machine translation, and question answering. The BERT model…

Computation and Language · Computer Science 2020-03-17 Zhiheng Huang , Peng Xu , Davis Liang , Ajay Mishra , Bing Xiang

The Transformer architecture and transfer learning have marked a quantum leap in natural language processing, improving the state of the art across a range of text-based tasks. This paper examines how these advancements can be applied to…

Software Engineering · Computer Science 2022-08-29 Pasquale Salza , Christoph Schwizer , Jian Gu , Harald C. Gall

Large Transformer-based language models such as BERT have led to broad performance improvements on many NLP tasks. Domain-specific variants of these models have demonstrated excellent performance on a variety of specialised tasks. In legal…

Computation and Language · Computer Science 2021-09-16 Benjamin Clavié , Akshita Gheewala , Paul Briton , Marc Alphonsus , Rym Laabiyad , Francesco Piccoli

We explore neural language modeling for speech recognition where the context spans multiple sentences. Rather than encode history beyond the current sentence using a cache of words or document-level features, we focus our study on the…

Computation and Language · Computer Science 2019-11-13 Sarangarajan Parthasarathy , William Gale , Xie Chen , George Polovets , Shuangyu Chang

With the recent developments in the field of Natural Language Processing, there has been a rise in the use of different architectures for Neural Machine Translation. Transformer architectures are used to achieve state-of-the-art accuracy,…

Computation and Language · Computer Science 2021-11-30 Aditya Mandke , Onkar Litake , Dipali Kadam

Long Short-Term Memory (LSTM) and Transformers are two popular neural architectures used for natural language processing tasks. Theoretical results show that both are Turing-complete and can represent any context-free language (CFL).In…

Computation and Language · Computer Science 2022-03-24 Hui Shi , Sicun Gao , Yuandong Tian , Xinyun Chen , Jishen Zhao

Modeling the parser state is key to good performance in transition-based parsing. Recurrent Neural Networks considerably improved the performance of transition-based systems by modelling the global state, e.g. stack-LSTM parsers, or local…

Computation and Language · Computer Science 2020-10-22 Ramon Fernandez Astudillo , Miguel Ballesteros , Tahira Naseem , Austin Blodgett , Radu Florian

In recent years, Natural Language Processing (NLP) models have achieved phenomenal success in linguistic and semantic tasks like text classification, machine translation, cognitive dialogue systems, information retrieval via Natural…

Computation and Language · Computer Science 2021-05-18 Sushant Singh , Ausif Mahmood

We introduce Transformer Grammars (TGs), a novel class of Transformer language models that combine (i) the expressive power, scalability, and strong performance of Transformers and (ii) recursive syntactic compositions, which here are…

Computation and Language · Computer Science 2022-12-07 Laurent Sartran , Samuel Barrett , Adhiguna Kuncoro , Miloš Stanojević , Phil Blunsom , Chris Dyer

In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised…

Computation and Language · Computer Science 2024-03-12 Omar Momen , David Arps , Laura Kallmeyer

Although masked language models are highly performant and widely adopted by NLP practitioners, they can not be easily used for autoregressive language modelling (next word prediction and sequence probability estimation). We present an…

Computation and Language · Computer Science 2022-08-08 Vilém Zouhar , Marius Mosbach , Dietrich Klakow

We explore deep autoregressive Transformer models in language modeling for speech recognition. We focus on two aspects. First, we revisit Transformer model configurations specifically for language modeling. We show that well configured…

Computation and Language · Computer Science 2019-09-25 Kazuki Irie , Albert Zeyer , Ralf Schlüter , Hermann Ney

Transformers have recently taken the center stage in language modeling after LSTM's were considered the dominant model architecture for a long time. In this project, we investigate the performance of the Transformer architectures-BERT and…

Computation and Language · Computer Science 2020-03-30 Abhilash Jain , Aku Ruohe , Stig-Arne Grönroos , Mikko Kurimo

Pretrained Transformer encoders are the dominant approach to sequence labeling. While some alternative architectures-such as xLSTMs, structured state-space models, diffusion models, and adversarial learning-have shown promise in language…

Computation and Language · Computer Science 2026-03-19 Ana Ezquerro , Carlos Gómez-Rodríguez , David Vilares
‹ Prev 1 2 3 10 Next ›