Related papers: Linguistic Structure from a Bottleneck on Sequenti…
As is the case of many signals produced by complex systems, language presents a statistical structure that is balanced between order and disorder. Here we review and extend recent results from quantitative characterisations of the degree of…
Today's probabilistic language generators fall short when it comes to producing coherent and fluent text despite the fact that the underlying models perform well under standard metrics, e.g., perplexity. This discrepancy has puzzled the…
Traditional linguistic theories have largely regard language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings…
It has been shown in a recent publication that words in human-produced English language tend to have an information content close to the conditional entropy. In this paper, we show that the same is true for events in human-produced…
Statistical regularities in human language have fascinated researchers for decades, suggesting deep underlying principles governing its evolution and information structuring for efficient communication. While Zipf's Law describes the…
Human languages vary widely in how they encode information within circumscribed semantic domains (e.g., time, space, color, human body parts and activities), but little is known about the global structure of semantic information and nothing…
Natural languages are complexly structured entities. They exhibit characterising regularities that can be exploited to link them one another. In this work, I compare two morphological aspects of languages: Written Patterns and Sentence…
How universal is human conceptual structure? The way concepts are organized in the human brain may reflect distinct features of cultural, historical, and environmental background in addition to properties universal to human cognition.…
Written language is a complex communication signal capable of conveying information encoded in the form of ordered sequences of words. Beyond the local order ruled by grammar, semantic and thematic structures affect long-range patterns in…
A sharp tension exists about the nature of human language between two opposite parties: those who believe that statistical surface distributions, in particular using measures like surprisal, provide a better understanding of language…
Learning structural information from observational data is central to producing new knowledge outside the training corpus. This holds for mechanistic understanding in scientific discovery as well as flexible test-time compositional…
Languages vary widely in how meanings map to word forms. These mappings have been found to support efficient communication; however, this theory does not account for systematic relations within word forms. We examine how a restricted set of…
Humans communicate using systems of interconnected stimuli or concepts -- from language and music to literature and science -- yet it remains unclear how, if at all, the structure of these networks supports the communication of information.…
The average uncertainty associated with words is an information-theoretic concept at the heart of quantitative and computational linguistics. The entropy has been established as a measure of this average uncertainty - also called average…
When we speak, write or listen, we continuously make predictions based on our knowledge of a language's grammar. Remarkably, children acquire this grammatical knowledge within just a few years, enabling them to understand and generalise to…
Although information theoretic characterizations of human communication have become increasingly popular in linguistics, to date they have largely involved grafting probabilistic constructs onto older ideas about grammar. Similarities…
Written language is complex. A written text can be considered an attempt to convey a meaningful message which ends up being constrained by language rules, context dependence and highly redundant in its use of resources. Despite all these…
The speech code is a vehicle of language: it defines a set of forms used by a community to carry information. Such a code is necessary to support the linguistic interactions that allow humans to communicate. How then may a speech code be…
Human languages have evolved to be structured through repeated language learning and use. These processes introduce biases that operate during language acquisition and shape linguistic systems toward communicative efficiency. In this paper,…
Human language offers a powerful window into our thoughts -- we tell stories, give explanations, and express our beliefs and goals through words. Abundant evidence also suggests that language plays a developmental role in structuring our…