Related papers: Coordinated Weighted Sampling for Estimating Aggre…
{\em Multi-objective samples} are powerful and versatile summaries of large data sets. For a set of keys $x\in X$ and associated values $f_x \geq 0$, a weighted sample taken with respect to $f$ allows us to approximate {\em segment-sum…
In this paper we explore several approaches for sampling weight vectors in the context of weighted sum scalarisation approaches for solving multi-criteria decision making (MCDM) problems. This established method converts a multi-objective…
Clustering ensemble, or consensus clustering, has emerged as a powerful tool for improving both the robustness and the stability of results from individual clustering methods. Weighted clustering ensemble arises naturally from clustering…
Joining records with all other records that meet a linkage condition can result in an astronomically large number of combinations due to many-to-many relationships. For such challenging (acyclic) joins, a random sample over the join result…
For high volume data streams and large data warehouses, sampling is used for efficient approximate answers to aggregate queries over selected subsets. Mathematically, we are dealing with a set of weighted items and want to support queries…
In the analysis of survey data, sampling weights are needed for consistent estimation of the population. However, the original inverse probability weights from the survey sample design are typically modified to account for non-response, to…
Many datasets such as market basket data, text or hypertext documents, and sensor observations recorded in different locations or time periods, are modeled as a collection of sets over a ground set of keys. We are interested in basic…
The integration of data from multiple sources is increasingly used to achieve larger sample sizes and enhance population diversity. Our previous work established that, under random sampling from the same underlying population, integrating…
Starting with a set of weighted items, we want to create a generic sample of a certain size that we can later use to estimate the total weight of arbitrary subsets. For this purpose, we propose priority sampling which tested on Internet…
Conformal prediction quantifies the uncertainty of machine learning models by augmenting point predictions with valid prediction sets. For complex scenarios involving multiple trials, models, or data sources, conformal prediction sets can…
In complex survey data, each sampled observation has assigned a sampling weight, indicating the number of units that it represents in the population. Whether sampling weights should or not be considered in the estimation process of model…
In recent years, network embedding methods have garnered increasing attention because of their effectiveness in various information retrieval tasks. The goal is to learn low-dimensional representations of vertexes in an information network…
Support points summarize a large dataset through a smaller set of representative points that can be used for data operations, such as Monte Carlo integration, without requiring access to the full dataset. In this sense, support points offer…
In the training of large deep neural networks, there is a need for vast amounts of training data. To meet this need, data is collected from multiple domains, such as Wikipedia and GitHub. These domains are heterogeneous in both data quality…
The importance of exploring a potential integration among surveys has been acknowledged in order to enhance effectiveness and minimize expenses. In this work, we employ the alignment method to combine information from two different surveys…
Fitting mixed models to complex survey data is a challenging problem. Most methods in the literature, including the most widely used one, require a close relationship between the model structure and the survey design. In this paper we…
Region sampling or weighting is significantly important to the success of modern region-based object detectors. Unlike some previous works, which only focus on "hard" samples when optimizing the objective function, we argue that sample…
Given multiple source word embeddings learnt using diverse algorithms and lexical resources, meta word embedding learning methods attempt to learn more accurate and wide-coverage word embeddings. Prior work on meta-embedding has repeatedly…
We propose a general approach to construct weighted likelihood estimating equations with the aim of obtain robust estimates. The weight, attached to each score contribution, is evaluated by comparing the statistical data depth at the model…
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case ([2, 4]),…