Related papers: Data analysis recipes: Choosing the binning for a …

Optimal Data-Based Binning for Histograms

Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. We introduce a…

Data Analysis, Statistics and Probability · Physics 2013-09-17 Kevin H. Knuth

On the number of bins in a rank histogram

Rank histograms are popular tools for assessing the reliability of meteorological ensemble forecast systems. A reliable forecast system leads to a uniform rank histogram, and deviations from uniformity can indicate miscalibrations. However,…

Applications · Statistics 2022-09-30 Claudio Heinrich

Resolving Histogram Binning Dilemmas with Binless and Binfull Algorithms

The histogram is an analysis tool in widespread use within many sciences, with high energy physics as a prime example. However, there exists an inherent bias in the choice of binning for the histogram, with different choices potentially…

Data Analysis, Statistics and Probability · Physics 2014-05-21 Abram Krislock, Nathan Krislock

From Data to Probability Densities without Histograms

When one deals with data drawn from continuous variables, a histogram is often inadequate to display their probability density. It deals inefficiently with statistical noise, and binsizes are free parameters. In contrast to that, the…

Data Analysis, Statistics and Probability · Physics 2009-11-13 Bernd A. Berg, Robert C. Harris

Potential fitting biases resulting from grouping data into variable width bins

When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…

Data Analysis, Statistics and Probability · Physics 2012-09-13 S. Towers

The Essential Histogram

The histogram is widely used as a simple, exploratory display of data, but it is usually not clear how to choose the number and size of bins. We construct a confidence set of distribution functions that optimally address the two main tasks…

Statistics Theory · Mathematics 2020-02-13 Housen Li, Axel Munk, Hannes Sieling, Guenther Walther

Composite likelihood methods for histogram-valued random variables

Symbolic data analysis has been proposed as a technique for summarising large and complex datasets into a much smaller and tractable number of distributions -- such as random rectangles or histograms -- each describing a portion of the…

Computation · Statistics 2020-03-23 Thomas Whitaker, Boris Beranger, Scott A. Sisson

Histogram binning revisited with a focus on human perception

This paper presents a quantitative user study to evaluate how well users can visually perceive the underlying data distribution from a histogram representation. We used different sample and bin sizes and four different distributions…

Human-Computer Interaction · Computer Science 2021-09-15 Raphael Sahann, Torsten Möller, Johanna Schmidt

OSCAR: A Semantic-based Data Binning Approach

Binning is applied to categorize data values or to see distributions of data. Existing binning algorithms often rely on statistical properties of data. However, there are semantic considerations for selecting appropriate binning schemes.…

Human-Computer Interaction · Computer Science 2022-07-19 Vidya Setlur, Michael Correll, Sarah Battersby

Subsampling and Jackknifing: A Practically Convenient Solution for Large Data Analysis with Limited Computational Resources

Modern statistical analysis often encounters datasets with large sizes. For these datasets, conventional estimation methods can hardly be used immediately because practitioners often suffer from limited computational resources. In most…

Methodology · Statistics 2023-04-14 Shuyuan Wu, Xuening Zhu, Hansheng Wang

Mining The Successful Binary Combinations: Methodology and A Simple Case Study

The importance of finding the characteristics leading to either a success or a failure is one of the driving forces of data mining. The various application areas of finding success/failure factors cover a vast variety of areas such as credit…

Databases · Computer Science 2010-02-08 Yuval Cohen

Plotting the Differences Between Data and Expectation

This article proposes a way to improve the presentation of histograms where data are compared to expectation. Sometimes, it is difficult to judge by eye whether the difference between the bin content and the theoretical expectation…

Data Analysis, Statistics and Probability · Physics 2018-02-21 Georgios Choudalakis, Diego Casadei

Data analysis recipes: Fitting a model to data

We go through the many considerations involved in fitting a model to data, using as an example the fit of a straight line to a set of points in a two-dimensional plane. Standard weighted least-squares fitting is only appropriate when there…

Instrumentation and Methods for Astrophysics · Physics 2010-08-30 David W. Hogg, Jo Bovy, Dustin Lang

Differentiable Histogram with Hard-Binning

The simplicity and expressiveness of a histogram render it a useful feature in different contexts including deep learning. Although the process of computing a histogram is non-differentiable, researchers have proposed differentiable…

Machine Learning · Computer Science 2020-12-14 Ibrahim Yusuf, George Igwegbe, Oluwafemi Azeez

The Shannon Entropy of a Histogram

The histogram is a key method for visualizing data and estimating the underlying probability distribution. Incorrect conclusions about the data result from over or under-binning. A new method based on the Shannon entropy of the histogram…

Data Analysis, Statistics and Probability · Physics 2022-10-07 Stephen Watts, Lisa Crow

The Analysis of Data from Continuous Probability Distributions

Conventional statistics begins with a model, and assigns a likelihood of obtaining any particular set of data. The opposite approach, beginning with the data and assigning a likelihood to any particular model, is explored here for the case…

Data Analysis, Statistics and Probability · Physics 2009-10-30 Timothy E. Holy

Binary Classifier Calibration: Non-parametric approach

Accurate calibration of probabilistic predictive models learned is critical for many practical prediction and decision-making tasks. There are two main categories of methods for building calibrated classifiers. One approach is to develop…

Machine Learning · Statistics 2014-01-16 Mahdi Pakdaman Naeini, Gregory F. Cooper, Milos Hauskrecht

Algorithmic statistics, prediction and machine learning

Algorithmic statistics considers the following problem: given a binary string $x$ (e.g., some experimental data), find a "good" explanation of this data. It uses algorithmic information theory to define formally what is a good explanation.…

Machine Learning · Computer Science 2015-09-21 Alexey Milovanov

Metrics of calibration for probabilistic predictions

Predictions are often probabilities; e.g., a prediction could be for precipitation tomorrow, but with only a 30% chance. Given such probabilistic predictions together with the actual outcomes, "reliability diagrams" help detect and diagnose…

Statistics Theory · Mathematics 2022-11-15 Imanol Arrieta-Ibarra, Paman Gujral, Jonathan Tannen, Mark Tygert, Cherie Xu

Enabling fundamental understanding of Nature with novel binning methods for 2D histograms

Context. Visualization of 2D distributions is an essential task, commonly done with a 2D histogram. The histogram is built by subdividing the sample space into regions and color-coding the number of samples in each region. Aims. We aim to…

Instrumentation and Methods for Astrophysics · Physics 2026-04-02 Igor Vaiman