Related papers: Remarks on "Random Sequences"

Randomness of formal languages via automatic martingales

We define a notion of randomness for individual and collections of formal languages based on automatic martingales acting on sequences of words from some underlying domain. An automatic martingale bets if the incoming word belongs to the…

Formal Languages and Automata Theory · Computer Science 2018-02-20 Birzhan Moldagaliyev

Multilingual Sentence Categorization according to Language

In this paper, we describe an approach to sentence categorization which has the originality to be based on natural properties of languages with no training set dependency. The implementation is fast, small, robust and textual errors…

cmp-lg · Computer Science 2016-08-31 Emmanuel Giguet

Statistical patterns of word frequency suggesting the probabilistic nature of human languages

Traditional linguistic theories have largely regarded language as a formal system composed of rigid rules. However, their failures in processing real language, the recent successes in statistical natural language processing, and the findings…

Computation and Language · Computer Science 2020-12-02 Shuiyuan Yu, Chunshan Xu, Haitao Liu

The Language of Betting as a Strategy for Statistical and Scientific Communication

The established language for statistical testing --- significance levels, power, and p-values --- is overly complicated and deceptively conclusive. Even teachers of statistics and scientists who use statistics misinterpret the results of…

Statistics Theory · Mathematics 2019-10-23 Glenn Shafer

Experiments on predictability of word in context and information rate in natural language

Based on data from a large-scale experiment with human subjects, we conclude that the logarithm of probability to guess a word in context (unpredictability) depends linearly on the word length. This result holds both for poetry and prose,…

Information Theory · Computer Science 2007-07-16 Dmitrii Manin

An Empirical Comparison of Parsing Methods for Stanford Dependencies

Stanford typed dependencies are a widely desired representation of natural language sentences, but parsing is one of the major computational bottlenecks in text analysis systems. In light of the evolving definition of the Stanford…

Computation and Language · Computer Science 2014-04-17 Lingpeng Kong, Noah A. Smith

Stability of meanings versus rate of replacement of words: an experimental test

The words of a language are randomly replaced in time by new ones, but it has long been known that words corresponding to some items (meanings) are less frequently replaced than others. Usually, the rate of replacement for a given item is…

Computation and Language · Computer Science 2018-10-24 Michele Pasquini, Maurizio Serva

A Likelihood Ratio Test of Genetic Relationship among Languages

Lexical resemblances among a group of languages indicate that the languages could be genetically related, i.e., they could have descended from a common ancestral language. However, such resemblances can arise by chance and, hence, need not…

Computation and Language · Computer Science 2024-04-02 V. S. D. S. Mahesh Akavarapu, Arnab Bhattacharya

Probability asymptotics: notes on notation

Some asymptotic notions for random variables are discussed. In particular, different versions of O and o for sequences of random variables are studied. The results are elementary and more or less well-known, but collected here for future…

Probability · Mathematics 2011-08-22 Svante Janson

Randomness of D Sequences via Diehard Testing

This paper presents a comparison of the quality of randomness of D sequences based on diehard tests. Since D sequences can model any random sequence, this comparison is of value beyond this specific class.

Numerical Analysis · Computer Science 2013-12-13 James Bellamy

An Independence Test Based on Recurrence Rates. An empirical study and applications to real data

In this paper we propose several variants to perform the independence test between two random elements based on recurrence rates. We will show how to calculate the test statistic in each one of these cases. From simulations we obtain that…

Methodology · Statistics 2020-09-21 Juan Kalemkerian, Diego Fernández

Linguistic Dependencies and Statistical Dependence

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency? This empirical question is motivated by a long history of literature in cognitive science, psycholinguistics, and NLP. In this work we…

Computation and Language · Computer Science 2022-05-02 Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

Word frequency-rank relationship in tagged texts

We analyze the frequency-rank relationship in sub-vocabularies corresponding to three different grammatical classes (nouns, verbs, and others) in a collection of literary works in English, whose words have been automatically tagged…

Computation and Language · Computer Science 2021-06-11 A. Chacoma, D. H. Zanette

Statistical Properties of European Languages and Voynich Manuscript Analysis

The statistical properties of letter frequencies in European literature texts are investigated. The determination of logarithmic dependence of letter sequences for one-language and two-language texts is examined. The pair of languages is…

Applications · Statistics 2016-11-29 Andronik Arutyunov, Leonid Borisov, Sergey Fedorov, Anastasiya Ivchenko, Elizabeth Kirina-Lilinskaya, Yurii Orlov, Konstantin Osminin, Sergey Shilin, Dmitriy Zeniuk

Sensivity of LLMs' Explanations to the Training Randomness: Context, Class & Task Dependencies

Transformer models are now a cornerstone in natural language processing. Yet, explaining their decisions remains a challenge. It was shown recently that the same model trained on the same data with a different randomness can lead to very…

Computation and Language · Computer Science 2026-03-10 Romain Loncour, Jérémie Bogaert, François-Xavier Standaert

Identifying Quantum Mechanical Statistics in Italian Corpora

We present a theoretical and empirical investigation of the statistical behaviour of the words in a text produced by human language. To this aim, we analyse the word distribution of various texts of Italian language selected from a specific…

Neurons and Cognition · Quantitative Biology 2025-04-15 Diederik Aerts, Jonito Aerts Arguëlles, Lester Beltran, Massimiliano Sassoli de Bianchi, Sandro Sozzo

A Structured Language Model

The paper presents a language model that develops syntactic structure and uses it to extract meaningful information from the word history, thus enabling the use of long distance dependencies. The model assigns probability to every joint…

Computation and Language · Computer Science 2007-05-23 Ciprian Chelba

A conditional randomization test to account for covariate imbalance in randomized experiments

We consider the conditional randomization test as a way to account for covariate imbalance in randomized experiments. The test accounts for covariate imbalance by comparing the observed test statistic to the null distribution of the test…

Methodology · Statistics 2017-04-24 Jonathan Hennessy, Tirthankar Dasgupta, Luke Miratrix, Cassandra Pattanayak, Pradipta Sarkar

Reliable Detection and Quantification of Selective Forces in Language Change

Language change is a cultural evolutionary process in which variants of linguistic variables change in frequency through processes analogous to mutation, selection and genetic drift. In this work, we apply a recently-introduced method to…

Computation and Language · Computer Science 2023-08-22 Juan Guerrero Montero, Andres Karjus, Kenny Smith, Richard A. Blythe

Wald type and Phi-divergence based test-statistics for isotonic binomial proportions

In this paper new test statistics are introduced and studied for the important problem of testing hypotheses that involve inequality constraints on proportions when the sample comes from independent binomial random variables: Wald type and…

Methodology · Statistics 2014-02-28 Nirian Martín, Raquel Mata, Leandro Pardo