Related papers: Statistical Patterns in Written Language
Traditional linguistic theories have largely regard language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings…
Methods and insights from statistical physics are finding an increasing variety of applications where one seeks to understand the emergent properties of a complex interacting system. One such area concerns the dynamics of language at a…
Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of…
Linguistic laws, the common statistical patterns of human language, have been investigated by quantitative linguists for nearly a century. Recently, biologists from a range of disciplines have started to explore the prevalence of these laws…
Language is a social phenomenon and variation is inherent to its social nature. Recently, there has been a surge of interest within the computational linguistics (CL) community in the social dimension of language. In this article we present…
Written language is a complex communication signal capable of conveying information encoded in the form of ordered sequences of words. Beyond the local order ruled by grammar, semantic and thematic structures affect long-range patterns in…
Statistical physics has proven to be a very fruitful framework to describe phenomena outside the realm of traditional physics. The last years have witnessed the attempt by physicists to study collective phenomena emerging from the…
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language…
As is the case of many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of…
A simple review by a linguist, citing many articles by physicists: Quantitative methods, agent-based computer simulations, language dynamics, language typology, historical linguistics
Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systematicity arises in codes that are constrained by a…
Languages vary widely in how meanings map to word forms. These mappings have been found to support efficient communication; however, this theory does not account for systematic relations within word forms. We examine how a restricted set of…
This chapter critically examines the potential contributions of modern language models to theoretical linguistics. Despite their focus on engineering goals, these models' ability to acquire sophisticated linguistic knowledge from mere…
Is it possible to develop a `physics of language' which can explain the spatial, temporal and social patterns we see, and which can predict future change like we forecast the weather? Such a theory is likely to involve ideas from…
This paper compares a qualitative reasoning model of translation with a quantitative statistical model. We consider these models within the context of two hypothetical speech translation systems, starting with a logic-based design and…
Prompted models have demonstrated impressive few-shot learning abilities. Repeated interactions at test-time with a single model, or the composition of multiple models together, further expands capabilities. These compositions are…
Human languages employ constructions that tacitly assume specific properties of the limited range of phenomena they evolved to describe. These assumed properties are true features of that limited context, but may not be general or precise…
Quantitative aspects of computation are related to the use of both physical and mathematical quantities, including time, performance metrics, probability, and measures for reliability and security. They are essential in characterizing the…
Recent breakthroughs in large language models (LLM) have stirred up global attention, and the research has been accelerating non-stop since then. Philosophers and psychologists have also been researching the structure of language for…
The review summarizes the main methodological concepts used in studying natural language from the perspective of complexity science and documents their applicability in identifying both universal and system-specific features of language in…