Related papers: Language-based Examples in the Statistics Classroo…

In many applications of natural language processing (NLP) it is necessary to determine the likelihood of a given word combination. For example, a speech recognizer may need to determine which of the two word combinations ``eat a peach'' and…

Computation and Language · Computer Science 2007-05-23 Ido Dagan , Lillian Lee , Fernando C. N. Pereira

Language Model Cascades

Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are…

Computation and Language · Computer Science 2022-07-29 David Dohan , Winnie Xu , Aitor Lewkowycz , Jacob Austin , David Bieber , Raphael Gontijo Lopes , Yuhuai Wu , Henryk Michalewski , Rif A. Saurous , Jascha Sohl-Dickstein , Kevin Murphy , Charles Sutton

Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies

An automatic word classification system has been designed which processes word unigram and bigram frequency statistics extracted from a corpus of natural language utterances. The system implements a binary top-down form of word clustering…

cmp-lg · Computer Science 2016-08-31 John McMahon , F. J. Smith

A Probability–Quality Trade-off in Aligned Language Models and its Relation to Sampling Adaptors

The relationship between the quality of a string, as judged by a human reader, and its probability, $p(\boldsymbol{y})$, under a language model undergirds the development of better language models. For example, many popular algorithms for…

Computation and Language · Computer Science 2024-10-29 Naaman Tan , Josef Valvoda , Tianyu Liu , Anej Svete , Yanxia Qin , Kan Min-Yen , Ryan Cotterell

Statistical patterns of word frequency suggesting the probabilistic nature of human languages

Traditional linguistic theories have largely regarded language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings…

Computation and Language · Computer Science 2020-12-02 Shuiyuan Yu , Chunshan Xu , Haitao Liu

Invitación al estudio estadístico del lenguaje

Invitation to the statistical study of language: The topic of this presentation is the interdisciplinary nexus between linguistics and statistics. It targets linguists, for whom it may have a theoretical interest, or professionals that work…

Applications · Statistics 2018-04-23 Rogelio Nazar

Formation of Languages; Equality, Hierarchy and Teachers

A quantitative method is suggested in which the meanings of the words in a vocabulary, and the grammatical rules about them, are represented by real numbers. People meet randomly, and average their vocabularies if they are equal; otherwise they either…

Physics and Society · Physics 2009-11-13 Caglar Tuncay

Rank dynamics of word usage at multiple scales

The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore…

Physics and Society · Physics 2026-02-04 José A. Morales , Ewan Colman , Sergio Sánchez , Fernanda Sánchez-Puig , Carlos Pineda , Gerardo Iñiguez , Germinal Cocho , Jorge Flores , Carlos Gershenson

Characterizing the dynamics of learning in repeated reference games

The language we use over the course of conversation changes as we establish common ground and learn what our partner finds meaningful. Here we draw upon recent advances in natural language processing to provide a finer-grained…

Computation and Language · Computer Science 2020-04-15 Robert D. Hawkins , Michael C. Frank , Noah D. Goodman

What Are the Odds? Language Models Are Capable of Probabilistic Reasoning

Language models (LMs) are capable of remarkably complex linguistic tasks; however, numerical reasoning is an area in which they frequently struggle. An important but rarely evaluated form of reasoning is understanding probability…

Computation and Language · Computer Science 2024-10-01 Akshay Paruchuri , Jake Garrison , Shun Liao , John Hernandez , Jacob Sunshine , Tim Althoff , Xin Liu , Daniel McDuff

A Data Mining view on Class Room Teaching Language

Since ancient times in India, educational institutions have used classroom teaching, in which a teacher explains the material and students understand and learn the lesson. There is no absolute scale for measuring knowledge but…

Databases · Computer Science 2011-04-22 Umesh Kumar Pandey , Saurabh Pal

Calculating Probabilities Simplifies Word Learning

Children can use the statistical regularities of their environment to learn word meanings, a mechanism known as cross-situational learning. We take a computational approach to investigate how the information present during each observation…

Computation and Language · Computer Science 2017-02-23 Aida Nematzadeh , Barend Beekhuizen , Shanshan Huang , Suzanne Stevenson

Conditions on Consistency of Probabilistic Tree Adjoining Grammars

Much of the power of probabilistic methods in modelling language comes from their ability to compare several derivations for the same string in the language. An important starting point for the study of such cross-derivational properties is…

Computation and Language · Computer Science 2007-05-23 Anoop Sarkar

How to Compute the Probability of a Word

Language models (LMs) estimate a probability distribution over strings in a natural language; these distributions are crucial for computing perplexity and surprisal in linguistics research. While we are usually concerned with measuring…

Computation and Language · Computer Science 2024-10-15 Tiago Pimentel , Clara Meister

Comparing Models of Associative Meaning: An Empirical Investigation of Reference in Simple Language Games

Simple reference games are of central theoretical and empirical importance in the study of situated language use. Although language provides rich, compositional truth-conditional semantics to facilitate reference, speakers and listeners may…

Computation and Language · Computer Science 2018-10-10 Judy Hanwen Shen , Matthias Hofer , Bjarke Felbo , Roger Levy

Complexity and universality in the long-range order of words

As is the case of many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of…

Computation and Language · Computer Science 2015-03-05 Marcelo A Montemurro , Damián H Zanette

What Can String Probability Tell Us About Grammaticality?

What have language models (LMs) learned about grammar? This question remains hotly debated, with major ramifications for linguistic theory. However, since probability and grammaticality are distinct notions in linguistics, it is not obvious…

Computation and Language · Computer Science 2025-11-10 Jennifer Hu , Ethan Gotlieb Wilcox , Siyuan Song , Kyle Mahowald , Roger P. Levy

Prefix Probabilities from Stochastic Tree Adjoining Grammars

Language models for speech recognition typically use a probability model of the form Pr(a_n | a_1, a_2, ..., a_{n-1}). Stochastic grammars, on the other hand, are typically used to assign structure to utterances. A language model of the…

Computation and Language · Computer Science 2007-05-23 Mark-Jan Nederhof , Anoop Sarkar , Giorgio Satta

Linguistic Dependencies and Statistical Dependence

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we…

Computation and Language · Computer Science 2022-05-02 Jacob Louis Hoover , Alessandro Sordoni , Wenyu Du , Timothy J. O'Donnell

Spectral Analysis of Word Statistics

Given a random text over a finite alphabet, we study the frequencies at which fixed-length words occur as subsequences. As the data size grows, the joint distribution of word counts exhibits a rich asymptotic structure. We investigate all…

Probability · Mathematics 2026-05-06 Chaim Even-Zohar , Tsviqa Lakrec , Ran J. Tessler