Related papers: Weighted Sampling Without Replacement from Data St…

200 papers

In this work, we present a new random sampling method for data streams where the probability of an element's inclusion in the sample is proportional to a weight associated with that element. Our method is based on sampling with replacement,…

Data Structures and Algorithms · Computer Science 2026-03-18 Adriano Meligrana, Adriano Fazzone

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]),…

Data Structures and Algorithms · Computer Science 2015-07-29 Pavlos S. Efraimidis

Graph Sampling provides an efficient yet inexpensive solution for analyzing large graphs. While extracting small representative subgraphs from large graphs, the challenge is to capture the properties of the original graph. Several sampling…

Data Structures and Algorithms · Computer Science 2019-10-21 Muhammad Irfan Yousuf, Raheel Anwar

This paper presents a novel algorithm solving the classic problem of generating a random sample of size $s$ from a population of size $n$ with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…

Data Structures and Algorithms · Computer Science 2016-11-03 Michał Startek

Weighted sampling is a fundamental tool in data analysis and machine learning pipelines. Samples are used for efficient estimation of statistics or as sparse representations of the data. When weight distributions are skewed, as is often the…

Machine Learning · Computer Science 2020-08-18 Edith Cohen, Rasmus Pagh, David P. Woodruff

As the popularity of graph data increases, there is a growing need to count the occurrences of subgraph patterns of interest, for a variety of applications. Many graphs are massive in scale and also fully dynamic (with insertions and…

Databases · Computer Science 2022-11-15 Kaixin Wang, Cheng Long, Da Yan, Jie Zhang, H. V. Jagadish

We consider message-efficient continuous random sampling from a distributed stream, where the probability of inclusion of an item in the sample is proportional to a weight associated with the item. The unweighted version, where all weights…

Data Structures and Algorithms · Computer Science 2019-04-09 Rajesh Jayaram, Gokarna Sharma, Srikanta Tirthapura, David P. Woodruff

A technique introduced by Indyk and Woodruff [STOC 2005] has inspired several recent advances in data-stream algorithms. We show that a number of these results follow easily from the application of a single probabilistic method called…

Data Structures and Algorithms · Computer Science 2011-04-26 Alexandr Andoni, Robert Krauthgamer, Krzysztof Onak

Most computational models of dependency syntax consist of distributions over spanning trees. However, the majority of dependency treebanks require that every valid dependency tree has a single edge coming out of the ROOT node, a constraint…

Computation and Language · Computer Science 2022-11-29 Miloš Stanojević

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Sampling is a fundamental technique, and sampling without replacement is often desirable when duplicate samples are not beneficial. Within machine learning, sampling is useful for generating diverse outputs from a trained model. We present…

Machine Learning · Computer Science 2021-07-21 Kensen Shi, David Bieber, Charles Sutton

From a high volume stream of weighted items, we want to maintain a generic sample of a certain limited size $k$ that we can later use to estimate the total weight of arbitrary subsets. This is the classic context of on-line reservoir…

Data Structures and Algorithms · Computer Science 2010-11-16 Edith Cohen, Nick Duffield, Haim Kaplan, Carsten Lund, Mikkel Thorup

Consider the fundamental problem of drawing a simple random sample of size $k$ without replacement from $[n] := \{1, \ldots, n\}$. Although a number of classical algorithms exist for this problem, we construct algorithms that are even simpler,…

Data Structures and Algorithms · Computer Science 2021-04-13 Daniel Ting

We present the first feasible method for sampling a dynamic data stream with deletions, where the sample consists of pairs $(k,C_k)$ of a value $k$ and its exact total count $C_k$. Our algorithms are for both Strict Turnstile data streams…

Data Structures and Algorithms · Computer Science 2012-09-26 Neta Barkay, Ely Porat, Bar Shalem

Suppose an $n \times d$ design matrix in a linear regression problem is given, but the response for each point is hidden unless explicitly requested. The goal is to sample only a small number $k \ll n$ of the responses, and then produce a…

Machine Learning · Computer Science 2018-09-06 Michał Dereziński, Manfred K. Warmuth, Daniel Hsu

We analyze the convergence rates of stochastic gradient algorithms for smooth finite-sum minimax optimization and show that, for many such algorithms, sampling the data points without replacement leads to faster convergence compared to…

Optimization and Control · Mathematics 2022-10-11 Aniket Das, Bernhard Schölkopf, Michael Muehlebach

We consider communication-efficient weighted and unweighted (uniform) random sampling from distributed data streams presented as a sequence of mini-batches of items. This is a natural model for distributed streaming computation, and our…

Data Structures and Algorithms · Computer Science 2020-02-26 Lorenz Hübschle-Schneider, Peter Sanders

To tackle massive data, subsampling is a practical approach to select the more informative data points. However, when responses are expensive to measure, developing efficient subsampling schemes is challenging, and an optimal sampling…

Computation · Statistics 2022-10-11 Jing Wang, HaiYing Wang, Shifeng Xiong

Sampling without replacement is a natural online rounding strategy for converting fractional bipartite matching into an integral one. In Online Bipartite Matching, we can use the Balance algorithm to fractionally match each online vertex,…

Data Structures and Algorithms · Computer Science 2024-10-10 Zhiyi Huang, Chui Shan Lee, Jianqiao Lu, Xinkai Shu

The combined algorithm selection and hyperparameter tuning (CASH) problem is characterized by large hierarchical hyperparameter spaces. Model-free hyperparameter tuning methods can explore such large spaces efficiently since they are highly…

Machine Learning · Computer Science 2019-11-22 Dimitrios Sarigiannis, Thomas Parnell, Haris Pozidis
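The priority-sampling entry by Duffield, Lund, and Thorup describes building a fixed-size sample of weighted items that can later be used to estimate the total weight of arbitrary subsets. Below is a minimal one-pass Python sketch of that idea. It follows the commonly stated form of priority sampling: each item gets a priority w/u with u uniform on (0, 1], the k highest priorities are kept, the threshold tau is the (k+1)-th largest priority, and each sampled item's estimate is max(w, tau). It is not a reference implementation. The function names `priority_sample` and `estimate_subset_weight` are hypothetical, and the code assumes distinct item keys and strictly positive weights.

```python
import heapq
import itertools
import random
from typing import Callable, Dict, Hashable, Iterable, Optional, Tuple


def priority_sample(
    stream: Iterable[Tuple[Hashable, float]], k: int, seed: Optional[int] = None
) -> Tuple[Dict[Hashable, float], float]:
    """One pass over (item, weight) pairs; returns ({item: estimate}, tau).

    Hypothetical helper. Assumes distinct item keys and strictly positive weights.
    """
    rng = random.Random(seed)
    counter = itertools.count()  # tie-breaker so heap entries never compare items
    heap = []  # min-heap holding the k+1 largest priorities seen so far
    for item, w in stream:
        if w <= 0:
            raise ValueError("weights must be positive")
        u = 1.0 - rng.random()  # random() is in [0, 1), so u is in (0, 1]
        entry = (w / u, next(counter), item, w)  # priority q = w / u
        if len(heap) < k + 1:
            heapq.heappush(heap, entry)
        elif entry[0] > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the current smallest priority
    # tau is the (k+1)-th largest priority; with at most k items nothing is dropped.
    tau = heapq.heappop(heap)[0] if len(heap) > k else 0.0
    # Each sampled item's weight estimate is max(w, tau); unsampled items count as 0.
    estimates = {item: max(w, tau) for _, _, item, w in heap}
    return estimates, tau


def estimate_subset_weight(
    estimates: Dict[Hashable, float], in_subset: Callable[[Hashable], bool]
) -> float:
    """Estimate the total weight of a subset that is chosen after sampling."""
    return sum(est for item, est in estimates.items() if in_subset(item))
```

For example, `est, tau = priority_sample(((f"flow{i}", float(i % 7 + 1)) for i in range(1000)), k=100, seed=1)` keeps 100 items, and `estimate_subset_weight(est, lambda name: name.endswith("7"))` estimates the weight of the flows whose names end in 7. Keeping only a min-heap of k + 1 entries bounds memory at O(k), which suits the single-pass data-stream setting shared by several of the listed papers.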