Related papers: Transformers are Efficient Compilers, Provably

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong connection between these LLMs' impressive emergence abilities and their…

Machine Learning · Computer Science 2025-08-14 Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Taiji Suzuki, Qingfu Zhang, Hau-San Wong

Moving Beyond Next-Token Prediction: Transformers are Context-Sensitive Language Generators

Large Language Models (LLMs), powered by Transformers, have demonstrated human-like intelligence capabilities, yet their underlying mechanisms remain poorly understood. This paper presents a novel framework for interpreting LLMs as…

Computation and Language · Computer Science 2025-04-16 Phill Kyu Rhee

Faith and Fate: Limits of Transformers on Compositionality

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This…

Computation and Language · Computer Science 2023-11-01 Nouha Dziri, Ximing Lu, Melanie Sclar, Xiang Lorraine Li, Liwei Jiang, Bill Yuchen Lin, Peter West, Chandra Bhagavatula, Ronan Le Bras, Jena D. Hwang, Soumya Sanyal, Sean Welleck, Xiang Ren, Allyson Ettinger, Zaid Harchaoui, Yejin Choi

Transformer-Aided Semantic Communications

The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-03 Matin Mortaheb, Erciyes Karakaya, Mohammad A. Amir Khojastepour, Sennur Ulukus

LLM-Aided Compilation for Tensor Accelerators

Hardware accelerators, in particular accelerators for tensor processing, have many potential application domains. However, they currently lack the software infrastructure to support the majority of domains outside of deep learning.…

Hardware Architecture · Computer Science 2024-08-08 Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao

Parameter-Efficient Transformer Embeddings

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

On the Ability and Limitations of Transformers to Recognize Formal Languages

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on…

Computation and Language · Computer Science 2020-10-09 Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Understanding Transformers via N-gram Statistics

Transformer based large-language models (LLMs) display extreme proficiency with language yet a precise understanding of how they work remains elusive. One way of demystifying transformer predictions would be to describe how they depend on…

Computation and Language · Computer Science 2024-11-06 Timothy Nguyen

To Transformers and Beyond: Large Language Models for the Genome

In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based…

Genomics · Quantitative Biology 2023-11-15 Micaela E. Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J. Theis, Alan Moses, Bo Wang

REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving

While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models continues to be a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven…

Machine Learning · Computer Science 2026-02-05 Annabelle Sujun Tang, Christopher Priebe, Rohan Mahapatra, Lianhui Qin, Hadi Esmaeilzadeh

On Optimal Transformer Depth for Low-Resource Language Translation

Transformers have shown great promise as an approach to Neural Machine Translation (NMT) for low-resource languages. However, at the same time, transformer models remain difficult to optimize and require careful tuning of hyper-parameters…

Computation and Language · Computer Science 2020-04-16 Elan van Biljon, Arnu Pretorius, Julia Kreutzer

NLD-LLM: A systematic framework for evaluating small language transformer models on natural language description

Natural Language Description (NLD) is a Natural Language Processing (NLP) task that requires models to generate structured and meaningful outputs from natural language inputs. In this work, we propose NLD-LLM, a systematic NLP framework to…

Computation and Language · Computer Science 2025-10-08 Hamed Jelodar, Mohammad Meymani, Parisa Hamedi, Tochukwu Emmanuel Nwankwo, Samita Bai, Roozbeh Razavi-Far, Ali A. Ghorbani

Transformers are Inherently Succinct

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that…

Formal Languages and Automata Theory · Computer Science 2026-05-18 Pascal Bergsträßer, Ryan Cotterell, Anthony W. Lin

Dissecting Multiplication in Transformers: Insights into LLMs

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science 2024-07-23 Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion…

Computation and Language · Computer Science 2026-04-16 Corentin Kervadec, Iuliia Lysova, Marco Baroni, Gemma Boleda

Large Language Models for Compiler Optimization

We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and…

Programming Languages · Computer Science 2023-09-14 Chris Cummins, Volker Seeker, Dejan Grubisic, Mostafa Elhoushi, Youwei Liang, Baptiste Roziere, Jonas Gehring, Fabian Gloeckle, Kim Hazelwood, Gabriel Synnaeve, Hugh Leather

How Powerful are Decoder-Only Transformer Neural Models?

In this article we prove that the general transformer neural model undergirding modern large language models (LLMs) is Turing complete under reasonable assumptions. This is the first work to directly address the Turing completeness of the…

Computation and Language · Computer Science 2024-10-11 Jesse Roberts

Ask Language Model to Clean Your Noisy Translation Data

Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy…

Computation and Language · Computer Science 2023-10-25 Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz

Emergence of a High-Dimensional Abstraction Phase in Language Transformers

A language model (LM) is a mapping from a linguistic context to an output token. However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to…

Computation and Language · Computer Science 2025-05-01 Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni

Transformer-Based Language Models for Software Vulnerability Detection

The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the…

Cryptography and Security · Computer Science 2022-09-07 Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, Surya Nepal