Related papers: Assessing Keyness using Permutation Tests

Is my model perplexed for the right reason? Contrasting LLMs' Benchmark Behavior with Token-Level Perplexity

Standard evaluations of large language models (LLMs) focus on task performance, offering limited insight into whether correct behavior reflects appropriate underlying mechanisms and risking confirmation bias. We introduce a simple,…

Computation and Language · Computer Science 2026-04-01 Zoë Prins, Samuele Punzo, Frank Wildenburg, Giovanni Cinà, Sandro Pezzelle

Accurate and Efficient Statistical Testing for Word Semantic Breadth

Measuring the breadth of a word's meaning, or its spread across contexts, has become feasible with contextualized token embeddings. A word type can be represented as a cloud of token vectors, with dispersion-based statistics serving as…

Computation and Language · Computer Science 2026-05-11 Yo Ehara

How to Compute the Probability of a Word

Language models (LMs) estimate a probability distribution over strings in a natural language; these distributions are crucial for computing perplexity and surprisal in linguistics research. While we are usually concerned with measuring…

Computation and Language · Computer Science 2024-10-15 Tiago Pimentel, Clara Meister

Efficiently estimating small p-values in permutation tests using importance sampling and cross-entropy method

Permutation tests are widely used for statistical hypothesis testing when the sampling distribution of the test statistic under the null hypothesis is analytically intractable or unreliable due to finite sample sizes. One critical challenge…

Computation · Statistics 2023-08-29 Yang Shi, Huining Kang, Ji-Hyun Lee, Hui Jiang

Corpus Considerations for Annotator Modeling and Scaling

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios…

Computation and Language · Computer Science 2024-04-18 Olufunke O. Sarumi, Béla Neuendorf, Joan Plepi, Lucie Flek, Jörg Schlötterer, Charles Welch

Word Importance Explains How Prompts Affect Language Model Outputs

The emergence of large language models (LLMs) has revolutionized numerous applications across industries. However, their "black box" nature often hinders the understanding of how they make specific decisions, raising concerns about their…

Artificial Intelligence · Computer Science 2024-03-06 Stefan Hackmann, Haniyeh Mahmoudian, Mark Steadman, Michael Schmidt

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity

A high degree of topical diversity is often considered to be an important characteristic of interesting text documents. A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and…

Information Retrieval · Computer Science 2017-01-17 Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke

TextEssence: A Tool for Interactive Analysis of Semantic Shifts Between Corpora

Embeddings of words and concepts capture syntactic and semantic regularities of language; however, they have seen limited use as tools to study characteristics of different corpora and how they relate to one another. We introduce…

Computation and Language · Computer Science 2021-03-23 Denis Newman-Griffis, Venkatesh Sivaraman, Adam Perer, Eric Fosler-Lussier, Harry Hochheiser

Probabilistic Method of Measuring Linguistic Productivity

In this paper I propose a new way of measuring linguistic productivity that objectively assesses the ability of an affix to be used to coin new complex words and, unlike other popular measures, is not directly dependent upon token…

Computation and Language · Computer Science 2023-08-25 Sergei Monakhov

The variational principle for weights characterizing the relevance

The classical method of the thematic classification of texts is based on using the frequency weight on the list of words occurring in texts from the text corpus that determines the theme. In this method, the weight of each word is defined…

Optimization and Control · Mathematics 2017-01-31 Mikhail A. Antonets, Grigoriy P. Kogan

Beyond Exponential Decay: Rethinking Error Accumulation in Large Language Models

The prevailing assumption of an exponential decay in large language model (LLM) reliability with sequence length, predicated on independent per-token error probabilities, posits an inherent limitation for long autoregressive outputs. Our…

Computation and Language · Computer Science 2026-05-07 Mikhail L. Arbuzov, Sisong Bei, Ziwei Dong, Dmitri Kalaev, Alexey A. Shvets

You should evaluate your language model on marginal likelihood over tokenisations

Neural language models typically tokenise input text into sub-word units to achieve an open vocabulary. The standard approach is to use a single canonical tokenisation at both train and test time. We suggest that this approach is…

Computation and Language · Computer Science 2021-09-22 Kris Cao, Laura Rimell

Joint Word Representation Learning using a Corpus and a Semantic Lexicon

Methods for learning word representations using large text corpora have received much attention lately due to their impressive performance in numerous natural language processing (NLP) tasks such as semantic similarity measurement, and…

Computation and Language · Computer Science 2015-11-23 Danushka Bollegala, Alsuhaibani Mohammed, Takanori Maehara, Ken-ichi Kawarabayashi

Termhood-based Comparability Metrics of Comparable Corpus in Special Domain

Cross-Language Information Retrieval (CLIR) and machine translation (MT) resources, such as dictionaries and parallel corpora, are scarce and hard to come by for special domains. Besides, these resources are just limited to a few languages,…

Computation and Language · Computer Science 2013-02-20 Sa Liu, Chengzhi Zhang

New methods for multiple testing in permutation inference for the general linear model

Permutation methods are commonly used to test significance of regressors of interest in general linear models (GLMs) for functional (image) data sets, in particular for neuroimaging applications as they rely on mild assumptions. Permutation…

Methodology · Statistics 2021-11-23 Tomas Mrkvicka, Mari Myllymaki, Mikko Kuronen, Naveen Naidu Narisetty

Using Score Distributions to Compare Statistical Significance Tests for Information Retrieval Evaluation

Statistical significance tests can provide evidence that the observed difference in performance between two methods is not due to chance. In Information Retrieval, some studies have examined the validity and suitability of such tests for…

Information Retrieval · Computer Science 2019-04-09 Javier Parapar, David E. Losada, Manuel A. Presedo-Quindimil, Alvaro Barreiro

A Tale of Two Temperatures: Simple, Efficient, and Diverse Sampling from Diffusion Language Models

Much work has been done on designing fast and accurate sampling for diffusion language models (dLLMs). However, these efforts have largely focused on the tradeoff between speed and quality of individual samples; how to additionally ensure…

Machine Learning · Computer Science 2026-04-14 Theo X. Olausson, Metod Jazbec, Xi Wang, Armando Solar-Lezama, Christian A. Naesseth, Stephan Mandt, Eric Nalisnick

perms: Likelihood-free estimation of marginal likelihoods for binary response data in Python and R

In Bayesian statistics, the marginal likelihood (ML) is the key ingredient needed for model comparison and model averaging. Unfortunately, estimating MLs accurately is notoriously difficult, especially for models where posterior simulation…

Computation · Statistics 2023-12-12 Dennis Christensen, Per August Jarval Moen

Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation

Computing next-token likelihood ratios between two language models (LMs) is a standard task in training paradigms such as knowledge distillation. Since this requires both models to share the same probability space, it becomes challenging…

Computation and Language · Computer Science 2026-05-07 Buu Phan, Ashish Khisti, Karen Ullrich

Improving Uncertainty Quantification in Large Language Models via Semantic Embeddings

Accurately quantifying uncertainty in large language models (LLMs) is crucial for their reliable deployment, especially in high-stakes applications. Current state-of-the-art methods for measuring semantic uncertainty in LLMs rely on strict…

Machine Learning · Computer Science 2024-10-31 Yashvir S. Grewal, Edwin V. Bonilla, Thang D. Bui