Related papers: Data Amplification: Instance-Optimal Property Esti…

200 papers

Estimating properties of discrete distributions is a fundamental problem in statistical learning. We design the first unified, linear-time, competitive, property estimator that for a wide class of properties and for all underlying…

Machine Learning · Statistics 2019-04-02 Yi Hao, Alon Orlitsky, Ananda T. Suresh, Yihong Wu

The statistical analysis of data stemming from dynamical systems, including, but not limited to, time series, routinely relies on the estimation of information theoretical quantities, most notably Shannon entropy. To this purpose, possibly…

Information Theory · Computer Science 2021-09-01 Leonardo Ricci, Alessio Perinelli, Michele Castelluzzo

The advent of data science has spurred interest in estimating properties of distributions over large alphabets. Fundamental symmetric properties such as support size, support coverage, entropy, and proximity to uniformity, received most…

Information Theory · Computer Science 2016-11-29 Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Theertha Suresh

Estimating symmetric properties of a distribution, e.g. support size, coverage, entropy, distance to uniformity, are among the most fundamental problems in algorithmic statistics. While each of these properties have been studied extensively…

Data Structures and Algorithms · Computer Science 2019-05-22 Moses Charikar, Kirankumar Shiragur, Aaron Sidford

Shannon's entropy is one of the building blocks of information theory and an essential aspect of Machine Learning methods (e.g., Random Forests). Yet, it is only finitely defined for distributions with fast decaying tails on a countable…

Statistics Theory · Mathematics 2022-05-25 Jialin Zhang, Jingyi Shi

We study three fundamental statistical-learning problems: distribution estimation, property estimation, and property testing. We establish the profile maximum likelihood (PML) estimator as the first unified sample-optimal approach to a wide…

Machine Learning · Statistics 2019-07-12 Yi Hao, Alon Orlitsky

In this study an attempt has been made to propose a way to develop new distribution. For this purpose, we need only idea about distribution function. Some important statistical properties of the new distribution like moments, cumulants,…

Methodology · Statistics 2024-08-30 Brijesh P. Singh, Utpal Dhar Das

Shannon entropy is often a quantity of interest to linguists studying the communicative capacity of human language. However, entropy must typically be estimated from observed data because researchers do not have access to the underlying…

Computation and Language · Computer Science 2022-04-06 Aryaman Arora, Clara Meister, Ryan Cotterell

Recent years have witnessed the success of adaptive (or unified) approaches in estimating symmetric properties of discrete distributions, where one first obtains a distribution estimator independent of the target property, and then plugs…

Statistics Theory · Mathematics 2021-03-04 Yanjun Han

We consider the fundamental learning problem of estimating properties of distributions over large domains. Using a novel piecewise-polynomial approximation technique, we derive the first unified methodology for constructing sample- and…

Machine Learning · Computer Science 2020-03-18 Yi Hao, Alon Orlitsky

In this article, we construct semiparametrically efficient estimators of linear functionals of a probability measure in the presence of side information using an easy empirical likelihood approach. We use estimated constraint functions and…

Methodology · Statistics 2023-03-01 Shan Wang, Hanxiang Peng

The principle of maximum entropy is a broadly applicable technique for computing a distribution with the least amount of information possible constrained to match empirical data, for instance, feature expectations. We seek to generalize…

Information Theory · Computer Science 2022-05-30 Kenneth Bogert

We provide an efficient unified plug-in approach for estimating symmetric properties of distributions given $n$ independent samples. Our estimator is based on profile-maximum-likelihood (PML) and is sample optimal for estimating various…

Machine Learning · Statistics 2022-10-14 Moses Charikar, Zhihao Jiang, Kirankumar Shiragur, Aaron Sidford

Modern statistical estimation is often performed in a distributed setting where each sample belongs to a single user who shares their data with a central server. Users are typically concerned with preserving the privacy of their samples,…

Machine Learning · Computer Science 2023-05-16 Gecia Bravo-Hermsdorff, Róbert Busa-Fekete, Mohammad Ghavamzadeh, Andres Muñoz Medina, Umar Syed

In this paper we provide a new efficient algorithm for approximately computing the profile maximum likelihood (PML) distribution, a prominent quantity in symmetric property estimation. We provide an algorithm which matches the previous best…

Data Structures and Algorithms · Computer Science 2020-11-06 Nima Anari, Moses Charikar, Kirankumar Shiragur, Aaron Sidford

This paper proposes a new method of bandwidth selection in kernel estimation of density and distribution functions motivated by the connection between maximisation of the entropy of probability integral transforms and maximum likelihood in…

Methodology · Statistics 2016-07-14 Vitaliy Oryshchenko

Quantile estimation in deconvolution problems is studied comprehensively. In particular, the more realistic setup of unknown error distributions is covered. Our plug-in method is based on a deconvolution density estimator and is minimax…

Statistics Theory · Mathematics 2016-01-18 Itai Dattner, Markus Reiß, Mathias Trabs

We revisit the well-studied problem of estimating the Shannon entropy of a probability distribution, now given access to a probability-revealing conditional sampling oracle. In this model, the oracle takes as input the representation of a…

Cryptography and Security · Computer Science 2022-06-03 Priyanka Golia, Brendan Juba, Kuldeep S. Meel

In this paper, we introduce a new distribution generated by Lindley random variable which offers a more flexible model for modelling lifetime data. Various statistical properties like distribution function, survival function, moments,…

Applications · Statistics 2016-11-25 Deepesh Bhati, Mohd. Aamir Malik

Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the…

Databases · Computer Science 2015-03-19 Edith Cohen, Haim Kaplan