Related papers: Random Language Model

Emergence of order in random languages

We consider languages generated by weighted context-free grammars. It is shown that the behaviour of large texts is controlled by saddle-point equations for an appropriate generating function. We then consider ensembles of grammars, in…

Disordered Systems and Neural Networks · Physics 2022-10-03 Eric De Giuli

Absence of Phase Transition in Random Language Model

The Random Language Model, proposed as a simple model of human languages, is defined by the averaged model of a probabilistic context-free grammar. This grammar expresses the process of sentence generation as a tree graph with nodes having…

Disordered Systems and Neural Networks · Physics 2022-07-07 Kai Nakaishi, Koji Hukushima
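The entries above describe sentence generation by a probabilistic context-free grammar as the growth of a tree. The following is a minimal sketch of that process. It assumes a hypothetical toy grammar in which each nonterminal rewrites either to two nonterminals (binary branching) or to a single terminal. The grammar, its probabilities, and the names `GRAMMAR`, `sample_tree` and `leaves` are illustrative assumptions. This is not the Random Language Model's ensemble of random rule weights itself.

```python
import random

# Hypothetical toy PCFG: each nonterminal maps to a list of
# (right-hand side, probability) pairs. A 2-tuple of nonterminals is a
# binary branching rule; a plain string is a terminal (leaf) symbol.
GRAMMAR = {
    "S": [(("A", "B"), 0.6), ("x", 0.4)],
    "A": [(("S", "B"), 0.3), ("y", 0.7)],
    "B": [(("A", "A"), 0.2), ("z", 0.8)],
}


def sample_tree(symbol, rng, depth=0, max_depth=50):
    """Expand `symbol` top-down, returning a nested (label, children) tree."""
    if depth > max_depth:
        raise RecursionError("derivation too deep; grammar may be supercritical")
    rules = GRAMMAR[symbol]
    rhs = rng.choices([r for r, _ in rules], weights=[p for _, p in rules])[0]
    if isinstance(rhs, str):           # terminal: this branch stops here
        return (symbol, [rhs])
    left, right = rhs                  # binary rule: grow two subtrees
    return (symbol, [sample_tree(left, rng, depth + 1, max_depth),
                     sample_tree(right, rng, depth + 1, max_depth)])


def leaves(tree):
    """Read the generated sentence off the tree's leaves, left to right."""
    _, children = tree
    out = []
    for child in children:
        out.extend([child] if isinstance(child, str) else leaves(child))
    return out


rng = random.Random(0)
tree = sample_tree("S", rng)
print(" ".join(leaves(tree)))
```

The probabilities were chosen so that the expected number of nonterminal children per node stays below one, which makes the tree finite with probability one. The `max_depth` guard is only a safety net for grammars where that does not hold.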

Large language models are not about natural language

Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational…

Computation and Language · Computer Science 2025-12-17 Johan J. Bolhuis, Andrea Moro, Stephen Crain, Sandiway Fong

Robustness of the Random Language Model

The Random Language Model (De Giuli 2019) is an ensemble of stochastic context-free grammars, quantifying the syntax of human and computer languages. The model suggests a simple picture of first language learning as a type of annealing in…

Disordered Systems and Neural Networks · Physics 2024-12-10 Fatemeh Lalegani, Eric De Giuli

Weighted random generation of context-free languages: Analysis of collisions in random urn occupancy models

The present work analyzes the redundancy of sets of combinatorial objects produced by a weighted random generation algorithm proposed by Denise et al. This scheme associates weights with the terminal symbols of a weighted context-free…

Data Structures and Algorithms · Computer Science 2010-12-07 Danièle Gardy, Yann Ponty
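The generation scheme referred to in the entry above weights the terminal symbols of a context-free grammar. The following is a minimal sketch of that idea, assuming that a word's weight is the product of its letters' weights and that words of a fixed length are drawn with probability proportional to that weight. It uses the recursive method over weighted counts on a hypothetical toy grammar, S → ε | c S | a S b S. The grammar, the weights in `WEIGHTS`, and the helpers `weighted_count` and `generate` are illustrative assumptions. This is not the published algorithm of Denise et al., nor the collision analysis of the paper.

```python
import random
from functools import lru_cache

# Hypothetical terminal weights; the weight of a word is the product of the
# weights of its letters, so w_c > 1 favours words containing many c's.
WEIGHTS = {"a": 1.0, "b": 1.0, "c": 2.0}


@lru_cache(maxsize=None)
def weighted_count(n):
    """Total weight of all length-n words of the toy grammar S -> '' | c S | a S b S."""
    if n == 0:
        return 1.0                      # only the empty word, weight 1
    total = WEIGHTS["c"] * weighted_count(n - 1)
    ab = WEIGHTS["a"] * WEIGHTS["b"]
    for k in range(n - 1):              # k = length of the inner S in a S b S
        total += ab * weighted_count(k) * weighted_count(n - 2 - k)
    return total


def generate(n, rng):
    """Draw a length-n word with probability proportional to its weight."""
    if n == 0:
        return ""
    ab = WEIGHTS["a"] * WEIGHTS["b"]
    # Each possible first expansion of S, weighted by the total weight of the
    # words it can still produce (the recursive method with weighted counts).
    options = [("c", None)] + [("ab", k) for k in range(n - 1)]
    weights = [WEIGHTS["c"] * weighted_count(n - 1)] + [
        ab * weighted_count(k) * weighted_count(n - 2 - k) for k in range(n - 1)
    ]
    rule, k = rng.choices(options, weights=weights)[0]
    if rule == "c":
        return "c" + generate(n - 1, rng)
    return "a" + generate(k, rng) + "b" + generate(n - 2 - k, rng)


rng = random.Random(42)
word = generate(10, rng)
print(word, weighted_count(10))
```

Because the toy grammar is unambiguous, the ratio of weighted counts at each step multiplies out to weight(word) / `weighted_count(n)`. With non-uniform weights, heavy words are drawn more often, so repeated draws of the same length can collide. That redundancy is the quantity the entry above studies.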

Random Words in a (Weighted) Regular Language: a Free Energy Approach

We study random words in a weighted regular language that achieve the maximal free energy using the thermodynamic formalism. In particular, we algorithmically generate typical words in the language, which have applications in computer…

Formal Languages and Automata Theory · Computer Science 2017-11-27 Cewei Cui, Zhe Dang

Searching for Structure: Investigating Emergent Communication with Large Language Models

Human languages have evolved to be structured through repeated language learning and use. These processes introduce biases that operate during language acquisition and shape linguistic systems toward communicative efficiency. In this paper,…

Computation and Language · Computer Science 2024-12-16 Tom Kouwenhoven, Max Peeperkorn, Tessa Verhoef

Randomness of formal languages via automatic martingales

We define a notion of randomness for individual formal languages and for collections of formal languages, based on automatic martingales acting on sequences of words from some underlying domain. An automatic martingale bets if the incoming word belongs to the…

Formal Languages and Automata Theory · Computer Science 2018-02-20 Birzhan Moldagaliyev

Locally Typical Sampling

Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…

Computation and Language · Computer Science 2025-06-06 Clara Meister, Tiago Pimentel, Gian Wiher, Ryan Cotterell

Linguistic Structure from a Bottleneck on Sequential Information Processing

Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systematicity arises in codes that are constrained by a…

Computation and Language · Computer Science 2025-11-19 Richard Futrell, Michael Hahn

On some representations of context-free languages

Context-free languages are widely used to describe the syntax of programming languages and natural languages. Usually, we describe a context-free language mathematically with the help of a context-free grammar (for generation) or a pushdown…

Formal Languages and Automata Theory · Computer Science 2020-10-13 Krasimir Yordzhev

Finding Structure in Language Models

When we speak, write or listen, we continuously make predictions based on our knowledge of a language's grammar. Remarkably, children acquire this grammatical knowledge within just a few years, enabling them to understand and generalise to…

Computation and Language · Computer Science 2024-11-26 Jaap Jumelet

Rule-weighted and terminal-weighted context-free grammars have identical expressivity

Two formalisms, both based on context-free grammars, have recently been proposed as a basis for the non-uniform random generation of combinatorial objects. The former, introduced by Denise et al., associates weights with letters, while the…

Computation and Language · Computer Science 2012-05-04 Yann Ponty

From Words to Waves: Analyzing Concept Formation in Speech and Text-Based Foundation Models

The emergence of large language models (LLMs) has demonstrated that systems trained solely on text can acquire extensive world knowledge, develop reasoning capabilities, and internalize abstract semantic concepts, showcasing properties that…

Computation and Language · Computer Science 2025-06-03 Asım Ersoy, Basel Mousi, Shammur Chowdhury, Firoj Alam, Fahim Dalvi, Nadir Durrani

Multilingual AMR-to-Text Generation

Generating text from structured data is challenging because it requires bridging the gap between (i) structure and natural language (NL) and (ii) semantically underspecified input and fully specified NL output. Multilingual generation…

Computation and Language · Computer Science 2020-11-12 Angela Fan, Claire Gardent

Language Models Are Implicitly Continuous

Language is typically modelled with discrete sequences. However, the most successful approaches to language modelling, namely neural networks, are continuous and smooth function approximators. In this work, we show that Transformer-based…

Computation and Language · Computer Science 2025-04-08 Samuele Marro, Davide Evangelista, X. Angelo Huang, Emanuele La Malfa, Michele Lombardi, Michael Wooldridge

Language Model Cascades

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are…

Computation and Language · Computer Science 2022-07-29 David Dohan, Winnie Xu, Aitor Lewkowycz, Jacob Austin, David Bieber, Raphael Gontijo Lopes, Yuhuai Wu, Henryk Michalewski, Rif A. Saurous, Jascha Sohl-Dickstein, Kevin Murphy, Charles Sutton

The network of concepts in written texts

Complex network theory is used to investigate the structure of meaningful concepts in written texts of individual authors. Networks have been constructed after a two-phase filtering, where words with less meaningful content are eliminated,…

Data Analysis, Statistics and Probability · Physics 2009-11-11 Silvia M. G. Caldeira, Thierry C. Petit Lobao, R. F. S. Andrade, Alexis Neme, J. G. V. Miranda

A Latent Space Theory for Emergent Abilities in Large Language Models

Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their…

Computation and Language · Computer Science 2023-09-15 Hui Jiang

Unsupervised Recurrent Neural Network Grammars

Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve…

Computation and Language · Computer Science 2019-08-06 Yoon Kim, Alexander M. Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis