Related papers: Sampling with Costs

Sorting and Selection with Random Costs

There is a growing body of work on sorting and selection in models other than the unit-cost comparison model. This work is the first treatment of a natural stochastic variant of the problem where the cost of comparing two elements is a…

Data Structures and Algorithms · Computer Science 2007-10-02 Stanislav Angelov, Keshav Kunal, Andrew McGregor

Solvable Integration Problems and Optimal Sample Size Selection

We compute the integral of a function or the expectation of a random variable with minimal cost and use, for our new algorithm and for upper bounds of the complexity, i.i.d. samples. Under certain assumptions it is possible to select a…

Numerical Analysis · Mathematics 2018-10-24 Robert J. Kunsch, Erich Novak, Daniel Rudolf

Learning to Sample: Counting with Complex Queries

We study the problem of efficiently estimating counts for queries involving complex filters, such as user-defined functions, or predicates involving self-joins and correlated subqueries. For such queries, traditional sampling techniques may…

Databases · Computer Science 2020-01-01 Brett Walenz, Stavros Sintos, Sudeepa Roy, Jun Yang

Importance Sampling: Intrinsic Dimension and Computational Cost

The basic idea of importance sampling is to use independent samples from a proposal measure in order to approximate expectations with respect to a target measure. It is key to understand how many samples are required in order to guarantee…

Computation · Statistics 2017-01-17 S. Agapiou, O. Papaspiliopoulos, D. Sanz-Alonso, A. M. Stuart

Sampling to estimate arbitrary subset sums

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Do We Really Sample Right In Model-Based Diagnosis?

Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first…

Artificial Intelligence · Computer Science 2022-08-05 Patrick Rodler, Fatima Elichanova

Cost-sensitive Selection of Variables by Ensemble of Model Sequences

Many applications require the collection of data on different variables or measurements over many system performance metrics. We term those broadly as measures or variables. Often data collection along each measure incurs a cost, thus it is…

Methodology · Statistics 2021-11-30 Donghui Yan, Zhiwei Qin, Songxiang Gu, Haiping Xu, Ming Shao

Cheaper and Better: Selecting Good Workers for Crowdsourcing

Crowdsourcing provides a popular paradigm for data collection at scale. We study the problem of selecting subsets of workers from a given worker pool to maximize the accuracy under a budget constraint. One natural question is whether we…

Machine Learning · Statistics 2015-02-04 Hongwei Li, Qiang Liu

A method to find an efficient and robust sampling strategy under model uncertainty

We consider the problem of deciding on sampling strategy, in particular sampling design. We propose a risk measure, whose minimizing value guides the choice. The method makes use of a superpopulation model and takes into account uncertainty…

Methodology · Statistics 2020-07-06 Edgar Bueno, Dan Hedlin

Buying Data Over Time: Approximately Optimal Strategies for Dynamic Data-Driven Decisions

We consider a model where an agent has a repeated decision to make and wishes to maximize their total payoff. Payoffs are influenced by an action taken by the agent, but also an unknown state of the world that evolves over time. Before…

Computer Science and Game Theory · Computer Science 2021-01-20 Nicole Immorlica, Ian Kash, Brendan Lucier

Partial Resampling of Imbalanced Data

Imbalanced data is a frequently encountered problem in machine learning. Despite a vast amount of literature on sampling techniques for imbalanced data, there is a limited number of studies that address the issue of the optimal sampling…

Machine Learning · Computer Science 2022-07-12 Firuz Kamalov, Amir F. Atiya, Dina Elreedy

Optimal Sampling Gaps for Adaptive Submodular Maximization

Running machine learning algorithms on large and rapidly growing volumes of data is often computationally expensive, one common trick to reduce the size of a data set, and thus reduce the computational cost of machine learning algorithms,…

Machine Learning · Computer Science 2022-01-25 Shaojie Tang, Jing Yuan

Pool samples to efficiently estimate pathogen prevalence dynamics

Estimating the prevalence of a disease is necessary for evaluating and mitigating risks of its transmission within or between populations. Estimates that consider how prevalence changes with time provide more information about these risks…

Applications · Statistics 2021-11-12 Braden Scherting, Alison Peel, Raina Plowright, Andrew Hoegh

On the variance of subset sum estimation

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

Batch mode active learning for efficient parameter estimation

For many tasks of data analysis, we may only have the information of the explanatory variable and the evaluation of the response values are quite expensive. While it is impractical or too costly to obtain the responses of all units, a…

Computation · Statistics 2023-04-07 Wei Zheng, Ting Tian, Xueqin Wang

Efficient Sampling Policy for Selecting a Good Enough Subset

The note studies the problem of selecting a good enough subset out of a finite number of alternatives under a fixed simulation budget. Our work aims to maximize the posterior probability of correctly selecting a good subset. We formulate…

Optimization and Control · Mathematics 2023-05-09 Gongbo Zhang, Bin Chen, Qing-shan Jia, Yijie Peng

Random Costs in Combinatorial Optimization

The random cost problem is the problem of finding the minimum in an exponentially long list of random numbers. By definition, this problem cannot be solved faster than by exhaustive search. It is shown that a classical NP-hard optimization…

Disordered Systems and Neural Networks · Physics 2009-10-31 Stephan Mertens

How to sample and when to stop sampling: The generalized Wald problem and minimax policies

We study sequential experiments where sampling is costly and a decision-maker aims to determine the best treatment for full scale implementation by (1) adaptively allocating units between two possible treatments, and (2) stopping the…

Econometrics · Economics 2025-05-06 Karun Adusumilli

A Comparison of 10 Sampling Algorithms for Configurable Systems

Almost every software system provides configuration options to tailor the system to the target platform and application scenario. Often, this configurability renders the analysis of every individual system configuration infeasible. To…

Software Engineering · Computer Science 2016-02-17 Flávio Medeiros, Christian Kästner, Márcio Ribeiro, Rohit Gheyi, Sven Apel

The need for adequate sampling in a well-functioning market surveillance system

Adequate sampling is essential for the well-functioning of a market surveillance system. As small as possible statistically significant sample size is the main factor that determines the costs of market surveillance actions. This paper…

Applications · Statistics 2019-11-06 Ivan Hendrikx, Nikola Tuneski