Related papers: Estimation from Partially Sampled Distributed Trac…

Get the Most out of Your Sample: Optimal Unbiased Estimators using Partial Information

Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the…

Databases · Computer Science 2015-03-19 Edith Cohen, Haim Kaplan

SampleHST: Efficient On-the-Fly Selection of Distributed Traces

Since only a small number of traces generated from distributed tracing helps in troubleshooting, its storage requirement can be significantly reduced by biasing the selection towards anomalous traces. To aid in this scenario, we propose…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-11 Alim Ul Gias, Yicheng Gao, Matthew Sheldon, José A. Perusquía, Owen O'Brien, Giuliano Casale

Near Optimal Stratified Sampling

The performance of a machine learning system is usually evaluated by using i.i.d. observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…

Machine Learning · Computer Science 2019-07-29 Tiancheng Yu, Xiyu Zhai, Suvrit Sra

Cost Issue in Estimation of Proportion in a Finite Population Divided Among Two Strata

The problem of estimation of the proportion of units with a given attribute in a finite population is considered. From the population a sample is drawn due to the simple random sampling without replacement. There are limited funds for…

Statistics Theory · Mathematics 2019-03-26 Dominik Sieradzki, Wojciech Zieliński

Distance Queries from Sampled Data: Accurate and Efficient

Distance queries are a basic tool in data analysis. They are used for detection and localization of change for the purpose of anomaly detection, monitoring, or planning. Distance queries are particularly useful when data sets such as…

Data Structures and Algorithms · Computer Science 2015-03-20 Edith Cohen

Stratified Random Sampling for Dependent Inputs

A new approach of obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…

Methodology · Statistics 2019-11-25 Anirban Mondal, Abhijit Mandal

Sampling on networks: estimating eigenvector centrality on incomplete graphs

We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure having at disposal a limited amount of data. This is the case in many real-world scenarios…

Social and Information Networks · Computer Science 2020-10-29 Nicolò Ruggeri, Caterina De Bacco

Adaptive Importance Sampling for Estimation in Structured Domains

Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, importance sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we…

Artificial Intelligence · Computer Science 2013-01-18 Luis E. Ortiz, Leslie Pack Kaelbling

Pushing towards the Limit of Sampling Rate: Adaptive Chasing Sampling

Measurement samples are often taken in various monitoring applications. To reduce the sensing cost, it is desirable to achieve better sensing quality while using fewer samples. Compressive Sensing (CS) technique finds its role when the…

Information Theory · Computer Science 2016-11-18 Ying Li, Kun Xie, Xin Wang

Split Regression Modeling

Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of loss of interpretability. However, the…

Methodology · Statistics 2022-01-11 Anthony Christidis, Stefan Van Aelst, Ruben Zamar

Adaptive optimal allocation in stratified sampling methods

In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These…

Methodology · Statistics 2007-12-04 Pierre Etore, Benjamin Jourdain

Does Data Splitting Improve Prediction?

Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the…

Methodology · Statistics 2016-01-20 Julian J. Faraway

Reproducible Aggregation of Sample-Split Statistics

Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of the introduction of randomness not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed…

Econometrics · Economics 2024-11-18 David M. Ritzwoller, Joseph P. Romano

Do We Really Sample Right In Model-Based Diagnosis?

Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first…

Artificial Intelligence · Computer Science 2022-08-05 Patrick Rodler, Fatima Elichanova

Diffusion-Aware Sampling and Estimation in Information Diffusion Networks

Partially-observed data collected by sampling methods is often being studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of diffusion process. In this paper,…

Social and Information Networks · Computer Science 2014-05-30 Motahareh Eslami Mehdiabadi, Hamid R. Rabiee, Mostafa Salehi

A Stochastic Large-scale Machine Learning Algorithm for Distributed Features and Observations

As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…

Machine Learning · Statistics 2019-12-10 Biyi Fang, Diego Klabjan

Efficient and Private Approximations of Distributed Databases Calculations

In recent years, an increasing amount of data is collected in different and often, not cooperative, databases. The problem of privacy-preserving, distributed calculations over separated databases and, a relative to it, issue of private data…

Databases · Computer Science 2016-05-23 Philip Derbeko, Shlomi Dolev, Ehud Gudes, Jeffrey D. Ullman

Random Surface Covariance Estimation by Shifted Partial Tracing

The problem of covariance estimation for replicated surface-valued processes is examined from the functional data analysis perspective. Considerations of statistical and computational efficiency often compel the use of separability of the…

Methodology · Statistics 2021-10-25 Tomas Masak, Victor M. Panaretos

Network Sampling: An Overview and Comparative Analysis

Network sampling is a crucial technique for analyzing large or partially observable networks. However, the effectiveness of different sampling methods can vary significantly depending on the context. In this study, we empirically compare…

Social and Information Networks · Computer Science 2025-05-05 Quoc Chuong Nguyen

Slice Sampling

Markov chain sampling methods that automatically adapt to characteristics of the distribution being sampled can be constructed by exploiting the principle that one can sample from a distribution by sampling uniformly from the region under…

Data Analysis, Statistics and Probability · Physics 2007-05-23 Radford M. Neal