Related papers: A Comparison of natural (English) and artificial (…
Two English texts by Lewis Carroll are compared, one (Alice in Wonderland) also translated into Esperanto, the other (Through the Looking-Glass), in order to observe whether natural and artificial languages significantly…
A statistical physics study of punctuation effects on sentence lengths is presented for written texts: Alice in Wonderland and Through the Looking-Glass. The translation of the first text into Esperanto is also considered as a…
A nonlinear dynamics approach can be used to quantify complexity in written texts. As a first step, a one-dimensional system is examined: two written texts by one author (Lewis Carroll) are considered, together with one…
We compared entropy for texts written in natural languages (English, Spanish) and artificial languages (computer software) based on a simple expression for the entropy as a function of message length and specific word diversity. Code text…
We study the entropy of Chinese and English texts, based on characters in the case of Chinese texts and on words for both languages. Significant differences are found between the languages and between different personal styles of debating…
This paper presents an analysis of 30 literary texts written in English by different authors. For each text, a time series representing sentence lengths in words was created, and its fractal properties were analyzed using two methods of…
This paper presents a general framework for evaluating and comparing controlled natural languages on the basis of user experiments. The subjects are asked to classify given statements (in the language to be tested) as either true or…
Human and model-generated texts can be distinguished by examining the magnitude of likelihood in language. However, this is becoming increasingly difficult as language models' capabilities for generating human-like text keep evolving. This…
Paraphrasing methods recognize, generate, or extract phrases, sentences, or longer natural language expressions that convey almost the same information. Textual entailment methods, on the other hand, recognize, generate, or extract pairs of…
Compositionality is a widely discussed property of natural languages, although its exact definition has been elusive. We focus on the proposal that compositionality can be assessed by measuring meaning-form correlation. We analyze…
A fundamental result in psycholinguistics is that less predictable words take a longer time to process. One theoretical explanation for this finding is Surprisal Theory (Hale, 2001; Levy, 2008), which quantifies a word's predictability as…
Statistical techniques that analyze texts, referred to as text analytics, have departed from the use of simple word count statistics towards a new paradigm. Text mining now hinges on a more sophisticated set of methods, including the…
We study temporal networks of characters in literature focusing on "Alice's Adventures in Wonderland" (1865) by Lewis Carroll and the anonymous "La Chanson de Roland" (around 1100). The former, one of the most influential pieces of nonsense…
Punctuation is the main factor introducing correlations in natural language written texts and it crucially impacts their overall effectiveness, expressiveness, and readability. Punctuation marks at the end of sentences are of particular…
We present in this paper a numerical investigation of literary texts by various well-known English writers, covering the first half of the twentieth century, based upon the results obtained through corpus analysis of the texts. A fractal…
We conduct a quantitative analysis contrasting human-written English news text with comparable large language model (LLM) output from six different LLMs that cover three different families and four sizes in total. Our analysis spans several…
The evolution of natural languages poses a riddle to any theoretical perspective based on efficiency considerations. If languages are already optimally effective means of organization and communication of thought, why do they change? And if…
The logical connectivity of text is represented by the connectivity of words that form archipelagoes. Here, each archipelago is a sequence of islands of the occurrences of a certain word. An island here means the local sequence of sentences…
When humans read a text, their eye movements are influenced by the structural complexity of the input sentences. This cognitive phenomenon holds across languages and recent studies indicate that multilingual language models utilize…
We investigated long-range correlations in two literary texts, Moby Dick by H. Melville and Grimm's tales. The analysis is based on the calculation of entropy-like quantities such as the mutual information for pairs of letters and the entropy,…