Related papers: Transducing Language Models

Language Models over Canonical Byte-Pair Encodings

Modern language models represent probability distributions over character strings as distributions over (shorter) token strings derived via a deterministic tokenizer, such as byte-pair encoding. While this approach is highly effective at…

Computation and Language · Computer Science 2025-06-10 Tim Vieira, Tianyu Liu, Clemente Pasti, Yahya Emara, Brian DuSell, Benjamin LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Timothy J. O'Donnell, Ryan Cotterell

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages

What can large language models learn? By definition, language models (LM) are distributions over strings. Therefore, an intuitive way of addressing the above question is to formalize it as a matter of learnability of classes of…

Computation and Language · Computer Science 2025-01-14 Nadav Borenstein, Anej Svete, Robin Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell

Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation

Syntactic structures used to play a vital role in natural language processing (NLP), but since the deep learning revolution, NLP has been gradually dominated by neural models that do not consider syntactic structures in their design. One…

Computation and Language · Computer Science 2023-11-28 Haoyi Wu, Kewei Tu

From Language Models over Tokens to Language Models over Characters

Modern language models are internally -- and mathematically -- distributions over token strings rather than character strings, posing numerous challenges for programmers building user applications on top of them. For example,…

Computation and Language · Computer Science 2025-06-11 Tim Vieira, Ben LeBrun, Mario Giulianelli, Juan Luis Gastaldi, Brian DuSell, John Terilla, Timothy J. O'Donnell, Ryan Cotterell

Transduce: learning transduction grammars for string transformation

The synthesis of string transformation programs from input-output examples utilizes various techniques, all based on an inductive bias that comprises a restricted set of basic operators to be combined. A new algorithm, Transduce, is…

Machine Learning · Computer Science 2024-01-19 Francis Frydman, Philippe Mangion

What You Must Remember When Transforming Datawords

Streaming Data String Transducers (SDSTs) were introduced to model a class of imperative and a class of functional programs, manipulating lists of data items. These can be used to write commonly used routines such as insert, delete and…

Formal Languages and Automata Theory · Computer Science 2020-12-15 M. Praveen

Express Your Doubts -- Probabilistic World Modeling Should not be Based on Token logprobs

Language modeling has shifted in recent years from a distribution over strings to prediction models with textual inputs and outputs for general-purpose tasks. This position paper highlights the often overlooked implications of this shift…

Computation and Language · Computer Science 2026-05-13 Eitan Wagner, Omri Abend

Deterministic or probabilistic? The psychology of LLMs as random number generators

Large Language Models (LLMs) have transformed text generation through inherently probabilistic context-aware mechanisms, mimicking human natural language. In this paper, we systematically investigate the performance of various LLMs when…

Computation and Language · Computer Science 2025-02-28 Javier Coronado-Blázquez

Space-Efficient Representation of Entity-centric Query Language Models

Virtual assistants make use of automatic speech recognition (ASR) to help users answer entity-centric queries. However, spoken entity recognition is a difficult problem, due to the large number of frequently-changing named entities. In…

Computation and Language · Computer Science 2022-07-01 Christophe Van Gysel, Mirko Hannemann, Ernest Pusateri, Youssef Oualil, Ilya Oparin

Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs

We present extensions to a continuous-state dependency parsing method that makes it applicable to morphologically rich languages. Starting with a high-performance transition-based parser that uses long short-term memory (LSTM) recurrent…

Computation and Language · Computer Science 2015-08-12 Miguel Ballesteros, Chris Dyer, Noah A. Smith

Transformers Can Represent $n$-gram Language Models

Existing work has analyzed the representational capacity of the transformer architecture by means of formal models of computation. However, the focus so far has been on analyzing the architecture in terms of language acceptance. We…

Computation and Language · Computer Science 2024-06-21 Anej Svete, Ryan Cotterell

Learning Neural Models for Natural Language Processing in the Face of Distributional Shift

The dominating NLP paradigm of training a strong neural predictor to perform one task on a specific dataset has led to state-of-the-art performance in a variety of applications (e.g., sentiment classification, span-prediction based question…

Computation and Language · Computer Science 2021-09-06 Paul Michel

An Overview on Language Models: Recent Developments and Outlook

Language modeling studies the probability distributions over strings of texts. It is one of the most fundamental tasks in natural language processing (NLP). It has been widely used in text generation, speech recognition, machine…

Computation and Language · Computer Science 2024-07-18 Chengwei Wei, Yun-Cheng Wang, Bin Wang, C.-C. Jay Kuo

Evaluating Distributional Distortion in Neural Language Modeling

A fundamental characteristic of natural language is the high rate at which speakers produce novel expressions. Because of this novelty, a heavy-tail of rare events accounts for a significant amount of the total probability mass of…

Computation and Language · Computer Science 2022-03-25 Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell

Probability Distributions Computed by Autoregressive Transformers

Most expressivity results for transformers treat them as language recognizers -- devices that accept or reject strings -- rather than as they are used in practice: as language models that generate strings autoregressively and…

Computation and Language · Computer Science 2026-05-26 Andy Yang, Anej Svete, Jiaoda Li, Anthony Widjaja Lin, Jonathan Rawski, Ryan Cotterell, David Chiang

Polynomial-Time Proactive Synthesis of Tree-to-String Functions from Examples

Synthesis from examples enables non-expert users to generate programs by specifying examples of their behavior. A domain-specific form of such synthesis has been recently deployed in a widely used spreadsheet software product. In this paper…

Formal Languages and Automata Theory · Computer Science 2017-05-25 Mikaël Mayer, Jad Hamza, Viktor Kuncak

Time complexity for deterministic string machines

Algorithms that learn environments represented by automata have, in the past, had complexity scaling with the number of states in the automaton, which can be exponentially large even for automata recognizing regular expressions with a small…

Formal Languages and Automata Theory · Computer Science 2024-05-13 Ali Cataltepe, Vanessa Kosoy

Locally Typical Sampling

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…

Computation and Language · Computer Science 2025-06-06 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Beyond Word-based Language Model in Statistical Machine Translation

The language model is one of the most important modules in statistical machine translation, and the word-based language model currently dominates this community. However, many translation models (e.g. phrase-based models) generate the target…

Computation and Language · Computer Science 2015-02-06 Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Chengqing Zong

An Efficient Compiler for Weighted Rewrite Rules

Context-dependent rewrite rules are used in many areas of natural language and speech processing. Work in computational phonology has demonstrated that, given certain conditions, such rewrite rules can be represented as finite-state…

cmp-lg · Computer Science 2008-02-03 Mehryar Mohri, Richard Sproat