Related papers: Source codes in human communication
Traditional linguistic theories have largely regard language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings…
Human language has a distinct systematic structure, where utterances break into individually meaningful words which are combined to form phrases. We show that natural-language-like systematicity arises in codes that are constrained by a…
Human languages vary widely in how they encode information within circumscribed semantic domains (e.g., time, space, color, human body parts and activities), but little is known about the global structure of semantic information and nothing…
The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be…
When generating natural language from neural probabilistic models, high probability does not always coincide with high quality: It has often been observed that mode-seeking decoding methods, i.e., those that produce high-probability text…
Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code's abundance of patterns. In…
Human communication systems, such as language, evolve culturally; their components undergo reproduction and variation. However, a role for selection in cultural evolutionary dynamics is less clear. Often neutral evolution (also known as…
Code corpora, as observed in large software systems, are now known to be far more repetitive and predictable than natural language corpora. But why? Does the difference simply arise from the syntactic limitations of programming languages?…
The network characteristics based on the phonological similarities in the lexicons of several languages were examined. These languages differed widely in their history and linguistic structure, but commonalities in the network…
Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…
"Natural Language," whether spoken and attended to by humans, or processed and generated by computers, requires networked structures that reflect creative processes in semantic, syntactic, phonetic, linguistic, social, emotional, and…
Human beings are talkative. What advantage did their ancestors find in communicating so much? Numerous authors consider this advantage to be "obvious" and "enormous". If so, the problem of the evolutionary emergence of language amounts to…
Quantitative linguistics has been allowed, in the last few decades, within the admittedly blurry boundaries of the field of complex systems. A growing host of applied mathematicians and statistical physicists devote their efforts to…
Written language is a complex communication signal capable of conveying information encoded in the form of ordered sequences of words. Beyond the local order ruled by grammar, semantic and thematic structures affect long-range patterns in…
Large Language Models are useless for linguistics, as they are probabilistic models that require a vast amount of data to analyse externalized strings of words. In contrast, human language is underpinned by a mind-internal computational…
Language provides simple ways of communicating generalizable knowledge to each other (e.g., "Birds fly", "John hikes", "Fire makes smoke"). Though found in every language and emerging early in development, the language of generalization is…
Most languages use the relative order between words to encode meaning relations. Languages differ, however, in what orders they use and how these orders are mapped onto different meanings. We test the hypothesis that, despite these…
In contrast with animal communication systems, diversity is characteristic of almost every aspect of human language. Languages variously employ tones, clicks, or manual signs to signal differences in meaning; some languages lack the…
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language…
Linguistic laws constitute one of the quantitative cornerstones of modern cognitive sciences and have been routinely investigated in written corpora, or in the equivalent transcription of oral corpora. This means that inferences of…