Related papers: Transformers are Efficient Compilers, Provably

200 papers

Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong connection between these LLMs' impressive emergence abilities and their…

Machine Learning · Computer Science 2025-08-14 Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Taiji Suzuki, Qingfu Zhang, Hau-San Wong

Large Language Models (LLMs), powered by Transformers, have demonstrated human-like intelligence capabilities, yet their underlying mechanisms remain poorly understood. This paper presents a novel framework for interpreting LLMs as…

Computation and Language · Computer Science 2025-04-16 Phill Kyu Rhee

Transformer large language models (LLMs) have sparked admiration for their exceptional performance on tasks that demand intricate multi-step reasoning. Yet, these models simultaneously show failures on surprisingly trivial problems. This…

The transformer structure employed in large language models (LLMs), as a specialized category of deep neural networks (DNNs) featuring attention mechanisms, stands out for its ability to identify and highlight the most relevant aspects of…

Computer Vision and Pattern Recognition · Computer Science 2024-05-03 Matin Mortaheb, Erciyes Karakaya, Mohammad A. Amir Khojastepour, Sennur Ulukus

Hardware accelerators, in particular accelerators for tensor processing, have many potential application domains. However, they currently lack the software infrastructure to support the majority of domains outside of deep learning.…

Hardware Architecture · Computer Science 2024-08-08 Charles Hong, Sahil Bhatia, Altan Haan, Shengjun Kris Dong, Dima Nikiforov, Alvin Cheung, Yakun Sophia Shao

Embedding layers in transformer-based NLP models typically account for the largest share of model parameters, scaling with vocabulary size but not yielding performance gains proportional to scale. We propose an alternative approach in which…

Computation and Language · Computer Science 2025-05-06 Henry Ndubuaku, Mouad Talhi

Transformers have supplanted recurrent models in a large number of NLP tasks. However, the differences in their abilities to model different syntactic properties remain largely unknown. Past works suggest that LSTMs generalize very well on…

Computation and Language · Computer Science 2020-10-09 Satwik Bhattamishra, Kabir Ahuja, Navin Goyal

Transformer based large-language models (LLMs) display extreme proficiency with language yet a precise understanding of how they work remains elusive. One way of demystifying transformer predictions would be to describe how they depend on…

Computation and Language · Computer Science 2024-11-06 Timothy Nguyen

In the rapidly evolving landscape of genomics, deep learning has emerged as a useful tool for tackling complex computational challenges. This review focuses on the transformative role of Large Language Models (LLMs), which are mostly based…

While model serving has unlocked unprecedented capabilities, the high cost of serving large-scale models continues to be a significant barrier to widespread accessibility and rapid innovation. Compiler optimizations have long driven…

Machine Learning · Computer Science 2026-02-05 Annabelle Sujun Tang, Christopher Priebe, Rohan Mahapatra, Lianhui Qin, Hadi Esmaeilzadeh

Transformers have shown great promise as an approach to Neural Machine Translation (NMT) for low-resource languages. However, at the same time, transformer models remain difficult to optimize and require careful tuning of hyper-parameters…

Computation and Language · Computer Science 2020-04-16 Elan van Biljon, Arnu Pretorius, Julia Kreutzer

Natural Language Description (NLD) is a Natural Language Processing (NLP) task that requires models to generate structured and meaningful outputs from natural language inputs. In this work, we propose NLD-LLM, a systematic NLP framework to…

Computation and Language · Computer Science 2025-10-08 Hamed Jelodar, Mohammad Meymani, Parisa Hamedi, Tochukwu Emmanuel Nwankwo, Samita Bai, Roozbeh Razavi-Far, Ali A. Ghorbani

We study succinctness as a measure of the expressive power of transformers. Succinctness -- how compactly a formalism can describe a language relative to other formalisms -- is a classical notion in logic and automata theory. We prove that…

Formal Languages and Automata Theory · Computer Science 2026-05-18 Pascal Bergsträßer, Ryan Cotterell, Anthony W. Lin

Transformer-based large language models have achieved remarkable performance across various natural language processing tasks. However, they often struggle with seemingly easy tasks like arithmetic despite their vast capabilities. This…

Computation and Language · Computer Science 2024-07-23 Luyu Qiu, Jianing Li, Chi Su, Chen Jason Zhang, Lei Chen

Transformer-based large language models (LLMs) are comprised of billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is possible to prune a significant portion…

Computation and Language · Computer Science 2026-04-16 Corentin Kervadec, Iuliia Lysova, Marco Baroni, Gemma Boleda

We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and…

In this article we prove that the general transformer neural model undergirding modern large language models (LLMs) is Turing complete under reasonable assumptions. This is the first work to directly address the Turing completeness of the…

Computation and Language · Computer Science 2024-10-11 Jesse Roberts

Transformer models have demonstrated remarkable performance in neural machine translation (NMT). However, their vulnerability to noisy input poses a significant challenge in practical implementation, where generating clean output from noisy…

Computation and Language · Computer Science 2023-10-25 Quinten Bolding, Baohao Liao, Brandon James Denis, Jun Luo, Christof Monz

A language model (LM) is a mapping from a linguistic context to an output token. However, much remains to be known about this mapping, including how its geometric properties relate to its function. We take a high-level geometric approach to…

Computation and Language · Computer Science 2025-05-01 Emily Cheng, Diego Doimo, Corentin Kervadec, Iuri Macocco, Jade Yu, Alessandro Laio, Marco Baroni

The large transformer-based language models demonstrate excellent performance in natural language processing. By considering the transferability of the knowledge gained by these models in one domain to other related domains, and the…

Cryptography and Security · Computer Science 2022-09-07 Chandra Thapa, Seung Ick Jang, Muhammad Ejaz Ahmed, Seyit Camtepe, Josef Pieprzyk, Surya Nepal