Related papers: Coordinated Weighted Sampling for Estimating Aggre…

200 papers

{\em Multi-objective samples} are powerful and versatile summaries of large data sets. For a set of keys $x\in X$ and associated values $f_x \geq 0$, a weighted sample taken with respect to $f$ allows us to approximate {\em segment-sum…

Databases · Computer Science 2017-06-14 Edith Cohen

In this paper we explore several approaches for sampling weight vectors in the context of weighted sum scalarisation approaches for solving multi-criteria decision making (MCDM) problems. This established method converts a multi-objective…

Optimization and Control · Mathematics 2025-04-18 Aled Williams, Yilun Cai

Clustering ensemble, or consensus clustering, has emerged as a powerful tool for improving both the robustness and the stability of results from individual clustering methods. Weighted clustering ensemble arises naturally from clustering…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Mimi Zhang

Joining records with all other records that meet a linkage condition can result in an astronomically large number of combinations due to many-to-many relationships. For such challenging (acyclic) joins, a random sample over the join result…

Databases · Computer Science 2022-01-11 Michael Shekelyan, Graham Cormode, Peter Triantafillou, Ali Shanghooshabad, Qingzhi Ma

For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…

Data Structures and Algorithms · Computer Science 2007-05-23 Mario Szegedy, Mikkel Thorup

In the analysis of survey data, sampling weights are needed for consistent estimation of the population. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to…

Computation · Statistics 2025-08-19 Matthew R. Williams, Terrance D. Savitsky

Many datasets such as market basket data, text or hypertext documents, and sensor observations recorded in different locations or time periods, are modeled as a collection of sets over a ground set of keys. We are interested in basic…

Databases · Computer Science 2009-03-05 Edith Cohen, Haim Kaplan

The integration of data from multiple sources is increasingly used to achieve larger sample sizes and enhance population diversity. Our previous work established that, under random sampling from the same underlying population, integrating…

Methodology · Statistics 2026-01-01 Farimah Shamsi, Andriy Derkach

Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…

Data Structures and Algorithms · Computer Science 2007-05-23 Nick Duffield, Carsten Lund, Mikkel Thorup

Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can…

Machine Learning · Computer Science 2025-12-25 Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu

In complex survey data, each sampled observation is assigned a sampling weight, indicating the number of units that it represents in the population. Whether or not sampling weights should be considered in the estimation process of model…

Methodology · Statistics 2024-09-20 Amaia Iparragirre, Irantzu Barrio, Jorge Aramendi, Inmaculada Arostegui

In recent years, network embedding methods have garnered increasing attention because of their effectiveness in various information retrieval tasks. The goal is to learn low-dimensional representations of vertexes in an information network…

Social and Information Networks · Computer Science 2017-11-02 Chih-Ming Chen, Yi-Hsuan Yang, Yian Chen, Ming-Feng Tsai

Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…

Machine Learning · Statistics 2025-09-01 Peiqi Zhao, Carlos E. Rodríguez, Ramsés H. Mena, Stephen G. Walker

In the training of large deep neural networks, there is a need for vast amounts of training data. To meet this need, data is collected from multiple domains, such as Wikipedia and GitHub. These domains are heterogeneous in both data quality…

Machine Learning · Computer Science 2025-11-11 Mahdi Salmani, Pratik Worah, Meisam Razaviyayn, Vahab Mirrokni

The importance of exploring a potential integration among surveys has been acknowledged in order to enhance effectiveness and minimize expenses. In this work, we employ the alignment method to combine information from two different surveys…

Methodology · Statistics 2024-04-09 Vasilis Chasiotis, Dimitris Karlis

Fitting mixed models to complex survey data is a challenging problem. Most methods in the literature, including the most widely used one, require a close relationship between the model structure and the survey design. In this paper we…

Methodology · Statistics 2023-11-23 Thomas Lumley, Xudong Huang

Region sampling or weighting is significantly important to the success of modern region-based object detectors. Unlike some previous works, which only focus on "hard" samples when optimizing the objective function, we argue that sample…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Qi Cai, Yingwei Pan, Yu Wang, Jingen Liu, Ting Yao, Tao Mei

Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly…

Computation and Language · Computer Science 2022-04-27 Danushka Bollegala

We propose a general approach to construct weighted likelihood estimating equations with the aim of obtaining robust estimates. The weight, attached to each score contribution, is evaluated by comparing the statistical data depth at the model…

Methodology · Statistics 2018-02-16 Claudio Agostinelli

In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]),…

Data Structures and Algorithms · Computer Science 2015-07-29 Pavlos S. Efraimidis