Related papers: Non-uniform quantization with linear average-case …

200 papers

Conformal prediction constructs a set of labels instead of a single point prediction, while providing a probabilistic coverage guarantee. Beyond the coverage guarantee, adaptiveness to example difficulty is an important property. It means…

Machine Learning · Computer Science 2025-11-18 Sooyong Jang, Insup Lee

In the Bin Packing problem one is given $n$ items with weights $w_1,\ldots,w_n$ and $m$ bins with capacities $c_1,\ldots,c_m$. The goal is to find a partition of the items into sets $S_1,\ldots,S_m$ such that $w(S_j) \leq c_j$ for every bin…

Data Structures and Algorithms · Computer Science 2023-09-11 Jesper Nederlof, Jakub Pawlewicz, Céline M. F. Swennenhuis, Karol Węgrzycki

Local moments are used for local regression, to compute statistical measures such as sums, averages, and standard deviations, and to approximate probability distributions. We consider the case where the data source is a very large I/O array…

Data Structures and Algorithms · Computer Science 2020-04-28 Daniel Lemire, Owen Kaser

Binned scatter plots are a powerful statistical tool for empirical work in the social, behavioral, and biomedical sciences. Available methods rely on a quantile-based partitioning estimator of the conditional mean regression function to…

Methodology · Statistics 2024-07-23 Matias D. Cattaneo, Richard K. Crump, Max H. Farrell, Yingjie Feng

This paper describes a new median algorithm and a median approximation algorithm. The former has O(n) average running time and the latter has O(n) worst-case running time. These algorithms are highly competitive with the standard algorithm…

Computation · Statistics 2009-05-12 Ryan J. Tibshirani

Here we present a novel approach to statistical analysis of financial time series. The approach is based on $n$-grams frequency dictionaries derived from the quantized market data. Such dictionaries are studied by evaluating their…

Statistical Finance · Quantitative Finance 2013-08-14 Igor Borovikov, Michael Sadovsky

Binning (a.k.a. discretization) of numerically continuous measurements is a wide-spread but controversial practice in data collection, analysis, and presentation. The consequences of binning have been evaluated for many different kinds of…

Machine Learning · Computer Science 2022-02-25 Andrew Colt Deckert, Erich Kummerfeld

In binary classification, there are situations where negative (N) data are too diverse to be fully labeled and we often resort to positive-unlabeled (PU) learning in these scenarios. However, collecting a non-representative N set that…

Machine Learning · Computer Science 2019-07-16 Yu-Guan Hsieh, Gang Niu, Masashi Sugiyama

This work proposes a non-iterative strategy for missing value imputations which is guided by similarity between observations, but instead of explicitly determining distances or nearest neighbors, it assigns observations to overlapping…

Machine Learning · Statistics 2019-11-25 David Cortes

When reading peer-reviewed scientific literature describing any analysis of empirical data, it is natural and correct to proceed with the underlying assumption that experiments have made good faith efforts to ensure that their analyses…

Data Analysis, Statistics and Probability · Physics 2012-09-13 S. Towers

Data discretization, also known as binning, is a frequently used technique in computer science, statistics, and their applications to biological data analysis. We present a new method for the discretization of real-valued data into a finite…

Other Quantitative Biology · Quantitative Biology 2007-05-23 Elena S. Dimitrova, John J. McGee, Reinhard C. Laubenbacher

The method of random projections has become very popular for large-scale applications in statistical learning, information retrieval, bio-informatics and other applications. Using a well-designed coding scheme for the projected data, which…

Machine Learning · Computer Science 2013-08-12 Ping Li, Michael Mitzenmacher, Anshumali Shrivastava

Reliable density estimation is fundamental for numerous applications in statistics and machine learning. In many practical scenarios, data are best modeled as mixtures of component densities that capture complex and multimodal patterns.…

Machine Learning · Computer Science 2025-09-30 Mustafa Musab, Joseph K. Chege, Arie Yeredor, Martin Haardt

Despite significant progress in the caching literature concerning the worst case and uniform average case regimes, the algorithms for caching with nonuniform demands are still at a basic stage and mostly rely on simple grouping and…

Information Theory · Computer Science 2018-02-01 Pierre Quinton, Saeid Sahraei, Michael Gastpar

The bin packing problem is to find the minimum number of bins of size one to pack a list of items with sizes $a_1,..., a_n$ in $(0,1]$. Using uniform sampling, which selects a random element from the input list each time, we develop a…

Computational Complexity · Computer Science 2011-02-25 Richard Beigel, Bin Fu

This paper introduces a new type of probabilistic semiparametric model that takes advantage of data binning to reduce the computational cost of kernel density estimation in nonparametric distributions. Two new conditional probability…

Machine Learning · Computer Science 2026-04-02 Rafael Sojo, Javier Díaz-Rozo, Concha Bielza, Pedro Larrañaga

We study memory-efficient matrix factorization for differentially private counting under continual observation. While recent work by Henzinger and Upadhyay 2024 introduced a factorization method with reduced error based on group algebra,…

Data Structures and Algorithms · Computer Science 2025-04-08 Monika Henzinger, Nikita P. Kalinin, Jalaj Upadhyay

Mutual Information (MI) is a powerful statistical measure that quantifies shared information between random variables, particularly valuable in high-dimensional data analysis across fields like genomics, natural language processing, and…

Machine Learning · Computer Science 2024-12-02 Andre O. Falcao

We introduce a new kind of likelihood function based on the sequence of moments of the data distribution. Both binned and unbinned data samples are discussed, and the multivariate case is also derived. Building on this approach we lay out…

Data Analysis, Statistics and Probability · Physics 2015-03-09 Sylvain Fichet

Random binning is an efficient, yet complex, coding technique for the symmetric L-description source coding problem. We propose an alternative approach, that uses the quantized samples of a bandlimited source as "descriptions". By the…

Information Theory · Computer Science 2013-11-20 Adam Mashiach, Jan Ostergaard, Ram Zamir