Related papers: On Missing Mass Variance

Mean-Squared Accuracy of Good-Turing Estimator

The brilliant method due to Good and Turing allows for estimating objects not occurring in a sample. The problem, known under the names "sample coverage" or "missing mass", goes back to their cryptographic work during WWII, but over the years has…

Machine Learning · Statistics 2021-04-16 Maciej Skorski
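For orientation, the Good-Turing estimate of the missing mass is the fraction of the sample made up of symbols seen exactly once, N1/n. Below is a minimal Python sketch, assuming the sample is a list of hashable symbols; the function name `good_turing_missing_mass` is illustrative and not taken from the paper.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def good_turing_missing_mass(sample: Sequence[Hashable]) -> float:
    """Good-Turing estimate of the missing mass: N1 / n.

    N1 is the number of distinct symbols that occur exactly once in the
    sample, and n is the sample size.
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)  # symbol -> number of occurrences
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / n


if __name__ == "__main__":
    # 'c' and 'd' occur once each, so the estimate is 2 / 6.
    print(good_turing_missing_mass(["a", "a", "b", "b", "c", "d"]))
```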

How Much is Unseen Depends Chiefly on Information About the Seen

The missing mass refers to the proportion of data points in an unknown population of classifier inputs that belong to classes not present in the classifier's training data, which is assumed to be a random sample from that unknown…

Machine Learning · Computer Science 2025-03-11 Seongmin Lee , Marcel Böhme

On the Concentration of the Missing Mass

A random variable is sampled from a discrete distribution. The missing mass is the probability of the set of points not observed in the sample. We sharpen and simplify McAllester and Ortiz's results (JMLR, 2003) bounding the probability of…

Probability · Mathematics 2012-10-12 Daniel Berend , Aryeh Kontorovich
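To make the definition concrete: when the distribution is known, the missing mass of a sample is the total probability of the symbols that did not appear in it, and it varies from sample to sample. A small Python sketch under that assumption, with symbols encoded as indices into a probability list; `missing_mass` and `draw_sample` are hypothetical helper names.

```python
import random
from collections.abc import Sequence


def missing_mass(probs: Sequence[float], sample: Sequence[int]) -> float:
    """Total probability of the symbols (indices into probs) absent from sample."""
    seen = set(sample)
    return sum(p for i, p in enumerate(probs) if i not in seen)


def draw_sample(probs: Sequence[float], n: int, rng: random.Random) -> list[int]:
    """Draw n i.i.d. symbol indices according to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=n)


if __name__ == "__main__":
    probs = [0.5, 0.25, 0.125, 0.125]
    rng = random.Random(0)
    # The missing mass is a random variable: it changes with each sample.
    for _ in range(3):
        print(missing_mass(probs, draw_sample(probs, n=5, rng=rng)))
```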

Missing $g$-mass: Investigating the Missing Parts of Distributions

Estimating the underlying distribution from i.i.d. samples is a classical and important problem in statistics. When the alphabet size is large compared to the number of samples, a portion of the distribution is highly likely to be…

Statistics Theory · Mathematics 2023-05-30 Prafulla Chandra , Andrew Thangaraj

On consistent estimation of the missing mass

Given $n$ samples from a population of individuals belonging to different types with unknown proportions, how do we estimate the probability of discovering a new type at the $(n+1)$-th draw? This is a classical problem in statistics,…

Statistics Theory · Mathematics 2018-06-27 Fadhel Ayed , Marco Battiston , Federico Camerlenghi , Stefano Favaro

Concentration of the missing mass in metric spaces

We study the estimation of the probability of observing data farther than a specified distance from a given i.i.d. sample in a metric space, as well as the concentration of this probability about its expectation. The problem extends the classical problem of estimation of the…

Statistics Theory · Mathematics 2022-11-23 Andreas Maurer

Minimax Risk for Missing Mass Estimation

The problem of estimating the missing mass or total probability of unseen elements in a sequence of $n$ random samples is considered under the squared error loss function. The worst-case risk of the popular Good-Turing estimator is shown to…

Information Theory · Computer Science 2017-05-16 Nikhilesh Rajaraman , Andrew Thangaraj , Ananda Theertha Suresh

Missing Mass Estimation from Sticky Channels

Distribution estimation under error-prone or non-ideal sampling modelled as "sticky" channels has been studied recently, motivated by applications such as DNA computing. Missing mass, the sum of probabilities of missing letters, is an…

Statistics Theory · Mathematics 2022-02-08 Prafulla Chandra , Andrew Thangaraj , Nived Rajaraman

The Missing Mass Problem

We give tight lower and upper bounds on the expected missing mass for distributions over finite and countably infinite spaces. An essential characterization of the extremal distributions is given. We also provide an extension to totally…

Statistics Theory · Mathematics 2011-11-10 Daniel Berend , Aryeh Kontorovich
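As background for bounds on the expected missing mass: symbol i is absent from n i.i.d. draws with probability (1 - p_i)^n, so E[M_n] = sum_i p_i (1 - p_i)^n. A short Python sketch of this formula for a finite distribution; the function name is illustrative only.

```python
from collections.abc import Sequence


def expected_missing_mass(probs: Sequence[float], n: int) -> float:
    """E[M_n] = sum_i p_i * (1 - p_i)**n for n i.i.d. draws.

    Symbol i is missed by every draw with probability (1 - p_i)**n,
    in which case it contributes p_i to the missing mass.
    """
    return sum(p * (1.0 - p) ** n for p in probs)


if __name__ == "__main__":
    # Uniform over 4 symbols, 2 draws: 4 * 0.25 * 0.75**2 = 0.5625
    print(expected_missing_mass([0.25] * 4, n=2))
```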

A Bennett Inequality for the Missing Mass

Novel concentration inequalities are obtained for the missing mass, i.e. the total probability mass of the outcomes not observed in the sample. We derive distribution-free deviation bounds with sublinear exponents in deviation size for…

Machine Learning · Statistics 2015-12-02 Bahman Yari Saeed Khanloo

Novel Deviation Bounds for Mixture of Independent Bernoulli Variables with Application to the Missing Mass

In this paper, we are concerned with obtaining distribution-free concentration inequalities for mixtures of independent Bernoulli variables that incorporate a notion of variance. Missing mass is the total probability mass associated with the…

Machine Learning · Statistics 2015-03-05 Bahman Yari Saeed Khanloo

Estimating the Missing Mass, Partition Function or Evidence for a Case of Sampling from a Discrete Set

We consider the problem of estimating the missing mass, partition function or evidence and its probability distribution in the case that for each sample point in the discrete sample space its (unnormalized) probability mass is revealed.…

Statistics Theory · Mathematics 2026-03-16 Bastiaan J. Braams

What Is Meant by "Missing at Random"?

The concept of missing at random is central in the literature on statistical analysis with missing data. In general, inference using incomplete data should be based not only on observed data values but should also take account of the…

Methodology · Statistics 2013-06-13 Shaun Seaman , John Galati , Dan Jackson , John Carlin

Concentration inequalities in the infinite urn scheme for occupancy counts and the missing mass, with applications

An infinite urn scheme is defined by a probability mass function $(p_j)_{j\geq1}$ over positive integers. A random allocation consists of a sample of $N$ independent drawings according to this probability distribution where $N$ may be…

Statistics Theory · Mathematics 2016-09-29 Anna Ben-Hamou , Stéphane Boucheron , Mesrob I. Ohannessian
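In urn-scheme terms, the occupancy count K_r is the number of distinct symbols drawn exactly r times; Good-Turing-type estimators of the missing mass are built from the small counts such as K_1. The Python sketch below assumes a fixed, already-drawn sample (the random sample size N from the abstract is not modelled), and `occupancy_counts` is a hypothetical helper name.

```python
from collections import Counter
from collections.abc import Hashable, Sequence


def occupancy_counts(sample: Sequence[Hashable]) -> dict[int, int]:
    """Map r -> K_r, the number of distinct symbols drawn exactly r times (r >= 1)."""
    per_symbol = Counter(sample)                # symbol -> multiplicity
    return dict(Counter(per_symbol.values()))   # multiplicity -> number of symbols


if __name__ == "__main__":
    k = occupancy_counts("aabbbc")
    print(sorted(k.items()))  # [(1, 1), (2, 1), (3, 1)]
    # Sanity check: sum_r r * K_r recovers the sample size.
    print(sum(r * kr for r, kr in k.items()))  # 6
```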

On the Impossibility of Learning the Missing Mass

This paper shows that one cannot learn the probability of rare events without imposing further structural assumptions. The event of interest is that of obtaining an outcome outside the coverage of an i.i.d. sample from a discrete…

Machine Learning · Statistics 2015-03-13 Elchanan Mossel , Mesrob I. Ohannessian

Missing at Random or Not: A Semiparametric Testing Approach

Practical problems with missing data are common, and statistical methods have been developed concerning the validity and/or efficiency of statistical procedures. Of central and longstanding interest is the mechanism…

Methodology · Statistics 2020-03-26 Rui Duan , C. Jason Liang , Pamela Shaw , Cheng Yong Tang , Yong Chen

A Good-Turing estimator for feature allocation models

Feature allocation models generalize species sampling models by allowing every observation to belong to more than one species, now called features. Under the popular Bernoulli product model for feature allocation, given $n$ samples, we…

Statistics Theory · Mathematics 2020-09-22 Fadhel Ayed , Marco Battiston , Federico Camerlenghi , Stefano Favaro

Revisiting Concentration of Missing Mass

We revisit the problem of missing mass concentration, developing a new method of estimating the concentration of heterogeneous sums in the spirit of the celebrated Rosenthal inequality. As a result we slightly improve the state-of-the-art bounds…

Statistics Theory · Mathematics 2020-05-25 Maciej Skorski

Maximal Guesswork Leakage

We introduce the study of information leakage through guesswork, the minimum expected number of guesses required to guess a random variable. In particular, we define maximal guesswork leakage as the multiplicative decrease,…

Information Theory · Computer Science 2024-05-07 Gowtham R. Kurri , Malhar Managoli , Vinod M. Prabhakaran
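For reference, the minimum expected number of guesses is attained by guessing values in decreasing order of probability, giving G(X) = sum_i i * p_(i) with p_(1) >= p_(2) >= .... A minimal Python sketch of that quantity for a finite distribution; it does not compute the leakage measure defined in the paper, and `guesswork` is an illustrative name.

```python
from collections.abc import Sequence


def guesswork(probs: Sequence[float]) -> float:
    """Minimum expected number of guesses to identify X.

    The optimal strategy guesses outcomes in decreasing order of
    probability, so the i-th most likely outcome costs i guesses.
    """
    ordered = sorted(probs, reverse=True)
    return sum(i * p for i, p in enumerate(ordered, start=1))


if __name__ == "__main__":
    # 1 * 0.5 + 2 * 0.25 + 3 * 0.25 = 1.75
    print(guesswork([0.25, 0.5, 0.25]))
```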

MMD Two-sample Testing in the Presence of Arbitrarily Missing Data

In many real-world applications, it is common that a proportion of the data may be missing or only partially observed. We develop a novel two-sample testing method based on the Maximum Mean Discrepancy (MMD) which accounts for missing data…

Methodology · Statistics 2024-05-27 Yijin Zeng , Niall M. Adams , Dean A. Bodenham