Related papers: Subset Sampling and Its Extensions
We study the fundamental problem of sampling independent events, called subset sampling. Specifically, consider a set of $n$ events $S=\{x_1, \ldots, x_n\}$, where each event $x_i$ has an associated probability $p(x_i)$. The subset sampling…
We study the selection problem, namely that of computing the $i$th order statistic of $n$ given elements. Here we offer a data structure called \emph{selectable sloppy heap} handling a dynamic version in which upon request: (i)~a new…
This paper addresses the Poisson $\pi$ps sampling problem, a topic of significant academic interest in various domains and with practical data mining applications, such as influence maximization. The problem includes a set $\mathcal{S}$ of…
Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down…
Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive, one common trick to reduce the size of a data set, and thus reduce the computational cost of machine learning algorithms,…
The note studies the problem of selecting a good enough subset out of a finite number of alternatives under a fixed simulation budget. Our work aims to maximize the posterior probability of correctly selecting a good subset. We formulate…
The subset sum problem (SSP) can be briefly stated as: given a target integer $E$ and a set $A$ containing $n$ positive integer $a_j$, find a subset of $A$ summing to $E$. The \textit{density} $d$ of an SSP instance is defined by the ratio…
The Subset Sum Problem is a fundamental NP-complete problem in cryptography and combinatorial optimization, with many real-world applications. The Random Subset Sum Problem (RSSP) is a more applicable version of subset sum, where numbers…
In this paper, we study the Dynamic Parameterized Subset Sampling (DPSS) problem in the Word RAM model. In DPSS, the input is a set,~$S$, of~$n$ items, where each item,~$x$, has a non-negative integer weight,~$w(x)$. Given a pair of query…
We consider the problem of sampling $n$ numbers from the range $\{1,\ldots,N\}$ without replacement on modern architectures. The main result is a simple divide-and-conquer scheme that makes sequential algorithms more cache efficient and…
We study a ranking and selection (R&S) problem when all solutions share common parametric Bayesian input models updated with the data collected from multiple independent data-generating sources. Our objective is to identify the best system…
Suppose we have a memory storing $0$s and $1$s and we want to estimate the frequency of $1$s by sampling. We want to do this I/O-efficiently, exploiting that each read gives a block of $B$ bits at unit cost; not just one bit. If the input…
In the range $\alpha$-majority query problem, we are given a sequence $S[1..n]$ and a fixed threshold $\alpha \in (0, 1)$, and are asked to preprocess $S$ such that, given a query range $[i..j]$, we can efficiently report the symbols that…
Sampling is a fundamental problem in computer science and statistics. However, for a given task and stream, it is often not possible to choose good sampling probabilities in advance. We derive a general framework for adaptively changing the…
In recent years, the problem of computing the frequencies of the induced $k$-vertex subgraphs of a graph, or \emph{$k$-graphlets}, has become central. One approach for this problem is to sample $k$-graphlets randomly. Classic algorithms for…
Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent…
Consistent sampling is a technique for specifying, in small space, a subset $S$ of a potentially large universe $U$ such that the elements in $S$ satisfy a suitably chosen sampling condition. Given a subset $\mathcal{I}\subseteq U$ it…
We revisit the optimization from samples (OPS) model, which studies the problem of optimizing objective functions directly from the sample data. Previous results showed that we cannot obtain a constant approximation ratio for the maximum…
We consider the problem of storing a dynamic string $S$ over an alphabet $\Sigma=\{\,1,\ldots,\sigma\,\}$ in compressed form. Our representation supports insertions and deletions of symbols and answers three fundamental queries:…
This paper presents a novel algorithm solving the classic problem of generating a random sample of size s from population of size n with non-uniform probabilities. The sampling is done with replacement. The algorithm requires constant…