Related papers: Optimal Representative Sample Weighting

200 papers

Automated model selection is often proposed to users to choose which machine learning model (or method) to apply to a given regression task. In this paper, we show that combining different regression models can yield better results than…

Machine Learning · Computer Science 2022-06-24 Patrick Echtenbruck, Martina Echtenbruck, Joost Batenburg, Thomas Bäck, Boris Naujoks, Michael Emmerich

Sampling biases in training data are a major source of algorithmic biases in machine learning systems. Although there are many methods that attempt to mitigate such algorithmic biases during training, the most direct and obvious way is…

Machine Learning · Statistics 2022-04-15 Laura Niss, Yuekai Sun, Ambuj Tewari

In the analysis of survey data, sampling weights are needed for consistent estimation of the population. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to…

Computation · Statistics 2025-08-19 Matthew R. Williams, Terrance D. Savitsky

Optimization problems with the objective function in the form of weighted sum and linear equality constraints are considered. Given that the number of local cost functions can be large as well as the number of constraints, a stochastic…

Optimization and Control · Mathematics 2026-05-26 Nataša Krejić, Nataša Krklec Jerinkić, Sanja Rapajić, Luka Rutešić

Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first…

Artificial Intelligence · Computer Science 2022-08-05 Patrick Rodler, Fatima Elichanova

We advocate an optimization procedure for variable density sampling in the context of compressed sensing. In this perspective, we introduce a minimization problem for the coherence between the sparsity and sensing bases, whose solution…

Information Theory · Computer Science 2011-09-29 Gilles Puy, Pierre Vandergheynst, Yves Wiaux

Randomized Controlled Trials (RCTs) may suffer from limited scope. In particular, samples may be unrepresentative: some RCTs over- or under- sample individuals with certain characteristics compared to the target population, for which one…

Methodology · Statistics 2024-03-15 Bénédicte Colnet, Julie Josse, Gaël Varoquaux, Erwan Scornet

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]),…

Data Structures and Algorithms · Computer Science 2015-07-29 Pavlos S. Efraimidis

A basic principle in the design of observational studies is to approximate the randomized experiment that would have been conducted under controlled circumstances. Now, linear regression models are commonly used to analyze observational…

Methodology · Statistics 2022-07-08 Ambarish Chattopadhyay, Jose R. Zubizarreta

In an ordinary feature selection procedure, a set of important features is obtained by solving an optimization problem such as the Lasso regression problem, and we expect that the obtained features explain the data well. In this study,…

Machine Learning · Statistics 2018-10-16 Satoshi Hara, Takanori Maehara

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

In the social sciences, it is often necessary to debias studies and surveys before valid conclusions can be drawn. Debiasing algorithms enable the computational removal of bias using sample weights. However, an issue arises when only a…

Machine Learning · Computer Science 2026-03-03 Tony Hauptmann, Stefan Kramer

A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…

Methodology · Statistics 2015-11-24 Rong Zhu, Ping Ma, Michael W. Mahoney, Bin Yu

The amount of large-scale real data around us increases very quickly, and so does the need to reduce its size by obtaining a representative sample. Such a sample allows us to use a great variety of analytical methods, whose direct…

Social and Information Networks · Computer Science 2014-02-10 Milos Kudelka, Sarka Zehnalova, Jan Platos

We propose a simple method by which to choose sample weights for problems with highly imbalanced or skewed traits. Rather than naively discretizing regression labels to find binned weights, we take a more principled approach -- we derive…

Machine Learning · Computer Science 2021-04-01 Daniel J. Wu, Avoy Datta

In this paper, a novel method to adaptively approximate the solution to stochastic differential equations, which is based on compressive sampling and sparse recovery, is introduced. The proposed method considers the problem of sparse…

Numerical Analysis · Mathematics 2013-07-03 Behrooz Azarkhalili

Compressed Sensing refers to extracting a low-dimensional structured signal of interest from its incomplete random linear observations. A line of recent work has studied that, with the extra prior information about the signal, one can…

Information Theory · Computer Science 2017-04-19 Sajad Daei, Farzan Haddadi

We consider stochastic optimization problems which use observed data to estimate essential characteristics of the random quantities involved. Sample average approximation (SAA) or empirical (plug-in) estimation are very popular ways to use…

Statistics Theory · Mathematics 2021-03-16 Darinka Dentcheva, Yang Lin

We study the optimal sample complexity of variable selection in linear regression under general design covariance, and show that subset selection is optimal while under standard complexity assumptions, efficient algorithms for this problem…

Statistics Theory · Mathematics 2025-10-07 Ming Gao, Bryon Aragam

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup