Related papers: Computing Extremely Accurate Quantiles Using t-Dig…

200 papers

Estimating the distribution and quantiles of data is a foundational task in data mining and data science. We study algorithms which provide accurate results for extreme quantile queries using a small amount of space, thus helping to…

Data Structures and Algorithms · Computer Science 2021-06-11 Graham Cormode, Abhinav Mishra, Joseph Ross, Pavel Veselý

Quantiles are important statistics used to describe the distribution of datasets. Given the quantiles of a dataset, we can easily know the distribution of the dataset, which is a fundamental problem in data analysis.…

Databases · Computer Science 2015-08-25 Zixuan Zhuang

As data volume grows extensively, data profiling helps to extract metadata of large-scale data. However, one kind of metadata, order statistics, is difficult to compute because it is not mergeable or incremental. Thus, the limitation…

Data Structures and Algorithms · Computer Science 2020-06-29 Zhiwei Chen, Aoqian Zhang

A $t$-digest is a compact data structure that allows estimates of quantiles with increased accuracy near $q = 0$ or $q = 1$. This is done by clustering samples from $\mathbb R$ subject to a constraint that the number of points associated…

Computation · Statistics 2019-03-26 Ted Dunning

The $t$-digest is a data structure that can be queried for approximate quantiles, with greater accuracy near the minimum and maximum of the distribution. We develop a $t$-digest variant with accuracy asymmetric about the median, thereby…

Data Structures and Algorithms · Computer Science 2020-05-20 Joseph Ross

Quantile regression is a method to estimate the quantiles of the conditional distribution of a response variable, and as such it permits a much more accurate portrayal of the relationship between the response variable and observed…

Data Structures and Algorithms · Computer Science 2014-01-08 Jiyan Yang, Xiangrui Meng, Michael W. Mahoney

We propose a new method for estimating the extreme quantiles for a function of several dependent random variables. In contrast to the conventional approach based on extreme value theory, we do not impose the condition that the tail of the…

Methodology · Statistics 2013-11-25 Jinguo Gong, Yadong Li, Liang Peng, Qiwei Yao

Clustering, or grouping, dataset elements based on similarity can be used not only to classify a dataset into a few categories, but also to approximate it by a relatively large number of representative elements. In the latter scenario,…

Machine Learning · Computer Science 2019-09-13 Tim Jaschek, Marko Bucyk, Jaspreet S. Oberoi

Quantile regression is an important tool for estimation of conditional quantiles of a response Y given a vector of covariates X. It can be used to measure the effect of covariates not only in the center of a distribution, but also in the…

Statistics Theory · Mathematics 2017-10-03 Victor Chernozhukov

We consider a novel challenge: approximating a distribution without the ability to randomly sample from that distribution. We study how such an approximation can be obtained using *weight queries*. Given some data set of examples, a weight…

Machine Learning · Computer Science 2021-07-15 Nadav Barak, Sivan Sabato

Space-efficient streaming estimation of quantiles in massive datasets is a fundamental problem with numerous applications in data monitoring and analysis. While theoretical research led to optimal algorithms, such as the Greenwald-Khanna…

Data Structures and Algorithms · Computer Science 2025-09-12 Aleksander Łukasiewicz, Jakub Tětek, Pavel Veselý

Many big-data clusters store data in large partitions that support access at a coarse, partition-level granularity. As a result, approximate query processing via row-level sampling is inefficient, often requiring reads of many partitions.…

Databases · Computer Science 2020-08-25 Kexin Rong, Yao Lu, Peter Bailis, Srikanth Kandula, Philip Levis

Very large datasets are often encountered in climatology, either from a multiplicity of observations over time and space or outputs from deterministic models (sometimes in petabytes = 1 million gigabytes). Loading a large data vector and…

Computation · Statistics 2010-07-08 Reza Hosseini

Finite precision approximations of discrete probability distributions are considered, applicable for distribution synthesis, e.g., probabilistic shaping. Two algorithms are presented that find the optimal $M$-type approximation $Q$ of a…

Information Theory · Computer Science 2017-05-08 Georg Böcherer, Bernhard C. Geiger

Percentiles and, more generally, quantiles are commonly used in various contexts to summarize data. For most distributions, there is exactly one quantile that is unbiased. For distributions like the Gaussian that have the same mean and…

Methodology · Statistics 2022-01-11 Rohit Pandey

Computing the approximate quantiles or ranks of a stream is a fundamental task in data monitoring. Given a stream of elements $x_1, x_2, \dots, x_n$ and a query $x$, a relative-error quantile estimation algorithm can estimate the rank of…

Data Structures and Algorithms · Computer Science 2024-11-05 Elena Gribelyuk, Pachara Sawettamalya, Hongxun Wu, Huacheng Yu

Over the past few years, research and development have made significant progress on big data analytics. A fundamental issue for big data analytics is efficiency. If the optimal solution is unattainable or not required or has a…

Databases · Computer Science 2019-01-03 Shuai Ma, Jinpeng Huai

We propose a simple and efficient clustering method for high-dimensional data with a large number of clusters. Our algorithm achieves high performance by evaluating distances of datapoints with a subset of the cluster centres. Our…

Machine Learning · Computer Science 2022-03-30 Georgios Exarchakis, Omar Oubari, Gregor Lenz

An algorithm for sampling exactly from the normal distribution is given. The algorithm reads some number of uniformly distributed random digits in a given base and generates an initial portion of the representation of a normal deviate in…

Computational Physics · Physics 2016-02-01 Charles F. F. Karney