Related papers: Optimal Data-Based Binning for Histograms
The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially…
Rank histograms are popular tools for assessing the reliability of meteorological ensemble forecast systems. A reliable forecast system leads to a uniform rank histogram, and deviations from uniformity can indicate miscalibrations. However,…
The histogram method is a powerful non-parametric approach for estimating the probability density function of a continuous variable. But the construction of a histogram, compared to the parametric approaches, demands a large number of…
Data points are placed in bins when a histogram is created, but there is always a decision to be made about the number or width of the bins. This decision is often made arbitrarily or subjectively, but it need not be. A jackknife or…
The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins. We construct a confidence set of distribution functions that optimally address the two main tasks…
This paper presents a quantitative user study to evaluate how well users can visually perceive the underlying data distribution from a histogram representation. We used different sample and bin sizes and four different distributions…
We present sparse tree-based and list-based density estimation methods for binary/categorical data. Our density estimation models are higher dimensional analogies to variable bin width histograms. In each leaf of the tree (or list), the…
When one deals with data drawn from continuous variables, a histogram is often inadequate to display their probability density. It deals inefficiently with statistical noise, and binsizes are free parameters. In contrast to that, the…
The simplicity and expressiveness of a histogram render it a useful feature in different contexts including deep learning. Although the process of computing a histogram is non-differentiable, researchers have proposed differentiable…
Context. Visualization of 2D distributions is an essential task, commonly done with a 2D histogram. The histogram is built by subdividing the sample space into regions and color-coding the number of samples in each region. Aims. We aim to…
The histogram is a key method for visualizing data and estimating the underlying probability distribution. Incorrect conclusions about the data result from over or under-binning. A new method based on the Shannon entropy of the histogram…
Accurate calibration of probabilistic predictive models learned is critical for many practical prediction and decision-making tasks. There are two main categories of methods for building calibrated classifiers. One approach is to develop…
We propose a new method of histogram construction, providing a fully Bayesian approach to irregular histograms. Our procedure applies Bayesian model selection to a piecewise constant model of the underlying distribution, resulting in a…
Density Estimation is one of the central areas of statistics whose purpose is to estimate the probability density function underlying the observed data. It serves as a building block for many tasks in statistical inference, visualization,…
Weighted histogram in Monte-Carlo simulations is often used for the estimation of a probability density function. It is obtained as a result of random experiment with random events that have weights. In this paper the bin contents of…
When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…
In this work, we investigate the statistical computation of the Boltzmann entropy of statistical samples. For this purpose, we use both histogram and kernel function to estimate the probability density function of statistical samples. We…
Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose…
We investigate the problem of testing whether a discrete probability distribution over an ordered domain is a histogram on a specified number of bins. One of the most common tools for the succinct approximation of data, $k$-histograms over…
Reliable density estimation is fundamental for numerous applications in statistics and machine learning. In many practical scenarios, data are best modeled as mixtures of component densities that capture complex and multimodal patterns.…