Related papers: Estimation with Binned Data

Robust estimation of inequality from binned incomes

Researchers must often estimate income inequality using data that give only the number of cases (e.g., families or households) whose incomes fall in "bins" such as $0-9,999, $10,000-14,999,..., $200,000+. We find that popular methods for…

Methodology · Statistics 2017-12-18 Paul T. von Hippel, Samuel V. Scarpino, Igor Holas

Potential fitting biases resulting from grouping data into variable width bins

When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experimenters have made good faith efforts to ensure that their analyses…

Data Analysis, Statistics and Probability · Physics 2012-09-13 S. Towers

Tunable robustness in power-law inference

Power-law probability distributions arise often in the social and natural sciences. Statistics have been developed for estimating the exponent parameter as well as gauging goodness-of-fit to a power law. Yet paradoxically, many famous power…

Methodology · Statistics 2024-02-21 Qianying Lin, Mitchell Newberry

Better estimates from binned income data: Interpolated CDFs and mean-matching

Researchers often estimate income statistics from summaries that report the number of incomes in bins such as $0-10,000, $10,001-20,000,..., $200,000+. Some analysts assign incomes to bin midpoints, but this treats income as discrete.…

Methodology · Statistics 2017-10-18 Paul T. von Hippel, David J. Hunter, McKalie Drown

Binned semiparametric Bayesian networks for efficient kernel density estimation

This paper introduces a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability…

Machine Learning · Computer Science 2026-04-02 Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, Pedro Larrañaga

Evaluation of Bagging Predictors with Kernel Density Estimation and Bagging Score

For a larger set of predictions of several differently trained machine learning models, known as bagging predictors, the mean of all predictions is taken by default. Nevertheless, this proceeding can deviate from the actual ground truth in…

Machine Learning · Computer Science 2026-04-07 Philipp Seitz, Jan Schmitt, Andreas Schiffler

A Family of Generalized Beta Distributions for Income

The mathematical properties of a family of generalized beta distributions, including beta-normal, skewed-t, log-F, beta-exponential, and beta-Weibull distributions, have recently been studied in several publications. This paper applies these…

Methodology · Statistics 2007-10-26 J. H. Sepanski, Lingji Kong

Bayesian MIDAS Penalized Regressions: Estimation, Selection, and Prediction

We propose a new approach to mixed-frequency regressions in a high-dimensional environment that resorts to Group Lasso penalization and Bayesian techniques for estimation and inference. In particular, to improve the prediction properties of…

Econometrics · Economics 2020-06-12 Matteo Mogliani, Anna Simoni

Information-theoretic Generalization Analysis for Expected Calibration Error

While the expected calibration error (ECE), which employs binning, is widely adopted to evaluate the calibration performance of machine learning models, theoretical understanding of its estimation bias is limited. In this paper, we present…

Machine Learning · Computer Science 2025-05-27 Futoshi Futami, Masahiro Fujisawa

Estimation Method under Three-Parameter Generalized Exponential Model: Consistency, Uniqueness and its Applications

In numerous instances, the generalized exponential distribution can be used as an alternative to the most widely used non-regular family of distributions: Weibull, gamma, lognormal with three-parameters when analyzing lifetime or any skewed…

Methodology · Statistics 2026-03-03 Kiran Prajapat, Sharmishtha Mitra, Debasis Kundu

A General Theory of Goodness of Fit in Likelihood Fits

Maximum likelihood fits to data can be performed using binned data and unbinned data. The likelihood fits in either case produce only the fitted quantities but not the goodness of fit. With binned data, one can obtain a measure of the…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Rajendran Raja

Compound Estimation for Binomials

Many applications involve estimating the mean of multiple binomial outcomes as a common problem -- assessing intergenerational mobility of census tracts, estimating prevalence of infectious diseases across countries, and measuring…

Econometrics · Economics 2026-01-01 Yan Chen, Lihua Lei

Mitigating Bias in Calibration Error Estimation

For an AI system to be reliable, the confidence it expresses in its decisions must match its accuracy. To assess the degree of match, examples are typically binned by confidence and the per-bin mean confidence and accuracy are compared.…

Machine Learning · Computer Science 2022-02-14 Rebecca Roelofs, Nicholas Cain, Jonathon Shlens, Michael C. Mozer

Performance Evaluation of Classification Models for Household Income, Consumption and Expenditure Data Set

Food security is more prominent on the policy agenda today than it has been in the past, thanks to recent food shortages at both the regional and global levels as well as renewed promises from major donor countries to combat chronic hunger.…

Machine Learning · Computer Science 2021-06-22 Mersha Nigus, Dorsewamy

Estimating a sharp convergence bound for randomized ensembles

When randomized ensembles such as bagging or random forests are used for binary classification, the prediction error of the ensemble tends to decrease and stabilize as the number of classifiers increases. However, the precise relationship…

Probability · Mathematics 2019-05-01 Miles E. Lopes

A study of the personal income distribution in Australia

We analyze the data on personal income distribution from the Australian Bureau of Statistics. We compare fits of the data to the exponential, log-normal, and gamma distributions. The exponential function gives a good (albeit not perfect)…

Physics and Society · Physics 2008-12-02 Anand Banerjee, Victor M. Yakovenko, T. Di Matteo

Power-law distributions in binned empirical data

Many man-made and natural phenomena, including the intensity of earthquakes, population of cities and size of international wars, are believed to follow power-law distributions. The accurate identification of power-law patterns has…

Data Analysis, Statistics and Probability · Physics 2014-04-15 Yogesh Virkar, Aaron Clauset

Bayesian Variable Selection in Distributed Lag Models: A Focus on Binary Quantile and Count Data Regressions

Distributed Lag Models (DLMs) and similar regression approaches such as MIDAS have been used for many decades in econometrics and more recently to investigate how poor air quality adversely affects human health. In this paper we describe…

Methodology · Statistics 2025-01-30 Daniel Dempsey, Jason Wyse

A Class of Skewed Distributions with Applications in Environmental Data

In environmental studies, many data are typically skewed and it is desired to have a flexible statistical model for this kind of data. In this paper, we study a class of skewed distributions by invoking arguments as described by Ferreira…

Applications · Statistics 2018-04-06 Indranil Ghosh, Hon Keung Tony Ng

On the Use of Bagging for Local Intrinsic Dimensionality Estimation

The theory of Local Intrinsic Dimensionality (LID) has become a valuable tool for characterizing local complexity within and across data manifolds, supporting a range of data mining and machine learning tasks. Accurate LID estimation…

Machine Learning · Computer Science 2026-03-26 Kristóf Péter, Ricardo J. G. B. Campello, James Bailey, Michael E. Houle