Related papers: Non-uniform quantization with linear average-case …

Quantifying and Improving Adaptivity in Conformal Prediction through Input Transformations

Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property. It means…

Machine Learning · Computer Science 2025-11-18 Sooyong Jang, Insup Lee

A Faster Exponential Time Algorithm for Bin Packing With a Constant Number of Bins via Additive Combinatorics

In the Bin Packing problem one is given $n$ items with weights $w_1,\ldots,w_n$ and $m$ bins with capacities $c_1,\ldots,c_m$. The goal is to find a partition of the items into sets $S_1,\ldots,S_m$ such that $w(S_j) \leq c_j$ for every bin…

Data Structures and Algorithms · Computer Science 2023-09-11 Jesper Nederlof, Jakub Pawlewicz, Céline M. F. Swennenhuis, Karol Węgrzycki

Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array…

Data Structures and Algorithms · Computer Science 2020-04-28 Daniel Lemire, Owen Kaser

Nonlinear Binscatter Methods

Binned scatter plots are a powerful statistical tool for empirical work in the social, behavioral, and biomedical sciences. Available methods rely on a quantile-based partitioning estimator of the conditional mean regression function to…

Methodology · Statistics 2024-07-23 Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng

Fast computation of the median by successive binning

This paper describes a new median algorithm and a median approximation algorithm. The former has O(n) average running time and the latter has O(n) worst-case running time. These algorithms are highly competitive with the standard algorithm…

Computation · Statistics 2009-05-12 Ryan J. Tibshirani

A relative information approach to financial time series analysis using binary $N$-grams dictionaries

Here we present a novel approach to statistical analysis of financial time series. The approach is based on $n$-grams frequency dictionaries derived from the quantized market data. Such dictionaries are studied by evaluating their…

Statistical Finance · Quantitative Finance 2013-08-14 Igor Borovikov, Michael Sadovsky

Investigating the effect of binning on causal discovery

Binning (a.k.a. discretization) of numerically continuous measurements is a wide-spread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many different kinds of…

Machine Learning · Computer Science 2022-02-25 Andrew Colt Deckert, Erich Kummerfeld

Classification from Positive, Unlabeled and Biased Negative Data

In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that…

Machine Learning · Computer Science 2019-07-16 Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama

Imputing missing values with unsupervised random trees

This work proposes a non-iterative strategy for missing value imputations which is guided by similarity between observations, but instead of explicitly determining distances or nearest neighbors, it assigns observations to overlapping…

Machine Learning · Statistics 2019-11-25 David Cortes

Potential fitting biases resulting from grouping data into variable width bins

When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…

Data Analysis, Statistics and Probability · Physics 2012-09-13 S. Towers

Discretization of Time Series Data

Data discretization, also known as binning, is a frequently used technique in computer science, statistics, and their applications to biological data analysis. We present a new method for the discretization of real-valued data into a finite…

Other Quantitative Biology · Quantitative Biology 2007-05-23 Elena S. Dimitrova, John J. McGee, Reinhard C. Laubenbacher

Coding for Random Projections

The method of random projections has become very popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which…

Machine Learning · Computer Science 2013-08-12 Ping Li, Michael Mitzenmacher, Anshumali Shrivastava

A Unified MDL-based Binning and Tensor Factorization Framework for PDF Estimation

Reliable density estimation is fundamental for numerous applications in statistics and machine learning. In many practical scenarios, data are best modeled as mixtures of component densities that capture complex and multimodal patterns.…

Machine Learning · Computer Science 2025-09-30 Mustafa Musab, Joseph K. Chege, Arie Yeredor, Martin Haardt

A Novel Centralized Strategy for Coded Caching with Non-uniform Demands

Despite significant progress in the caching literature concerning the worst case and uniform average case regimes, the algorithms for caching with nonuniform demands are still at a basic stage and mostly rely on simple grouping and…

Information Theory · Computer Science 2018-02-01 Pierre Quinton, Saeid Sahraei, Michael Gastpar

A Dense Hierarchy of Sublinear Time Approximation Schemes for Bin Packing

The bin packing problem is to find the minimum number of bins of size one to pack a list of items with sizes $a_1,..., a_n$ in $(0,1]$. Using uniform sampling, which selects a random element from the input list each time, we develop a…

Computational Complexity · Computer Science 2011-02-25 Richard Beigel, Bin Fu

Binned semiparametric Bayesian networks for efficient kernel density estimation

This paper introduces a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability…

Machine Learning · Computer Science 2026-04-02 Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, Pedro Larrañaga

Binned Group Algebra Factorization for Differentially Private Continual Counting

We study memory-efficient matrix factorization for differentially private counting under continual observation. While recent work by Henzinger and Upadhyay 2024 introduced a factorization method with reduced error based on group algebra,…

Data Structures and Algorithms · Computer Science 2025-04-08 Monika Henzinger, Nikita P. Kalinin, Jalaj Upadhyay

Fast Mutual Information Computation for Large Binary Datasets

Mutual Information (MI) is a powerful statistical measure that quantifies shared information between random variables, particularly valuable in high-dimensional data analysis across fields like genomics, natural language processing, and…

Machine Learning · Computer Science 2024-12-02 Andre O. Falcao

New likelihoods for shape analysis

We introduce a new kind of likelihood function based on the sequence of moments of the data distribution. Both binned and unbinned data samples are discussed, and the multivariate case is also derived. Building on this approach we lay out…

Data Analysis, Statistics and Probability · Physics 2015-03-09 Sylvain Fichet

Sampling versus Random Binning for Multiple Descriptions of a Bandlimited Source

Random binning is an efficient, yet complex, coding technique for the symmetric L-description source coding problem. We propose an alternative approach, that uses the quantized samples of a bandlimited source as "descriptions". By the…

Information Theory · Computer Science 2013-11-20 Adam Mashiach, Jan Ostergaard, Ram Zamir