Related papers: Estimation from Partially Sampled Distributed Trac…
Random sampling is an essential tool in the processing and transmission of data. It is used to summarize data too large to store or manipulate and meet resource constraints on bandwidth or battery power. Estimators that are applied to the…
Since only a small number of traces generated from distributed tracing helps in troubleshooting, its storage requirement can be significantly reduced by biasing the selection towards anomalous traces. To aid in this scenario, we propose…
The performance of a machine learning system is usually evaluated by using i.i.d.\ observations with true labels. However, acquiring ground truth labels is expensive, while obtaining unlabeled samples may be cheaper. Stratified sampling can…
The problem of estimation of the proportion of units with a given attribute in a~finite population is considered. From the population a sample is drawn due to the simple random sampling without replacement. There are limited funds for…
Distance queries are a basic tool in data analysis. They are used for detection and localization of change for the purpose of anomaly detection, monitoring, or planning. Distance queries are particularly useful when data sets such as…
A new approach of obtaining stratified random samples from statistically dependent random variables is described. The proposed method can be used to obtain samples from the input space of a computer forward model in estimating expectations…
We develop a new sampling method to estimate eigenvector centrality on incomplete networks. Our goal is to estimate this global centrality measure having at disposal a limited amount of data. This is the case in many real-world scenarios…
Sampling is an important tool for estimating large, complex sums and integrals over high dimensional spaces. For instance, important sampling has been used as an alternative to exact methods for inference in belief networks. Ideally, we…
Measurement samples are often taken in various monitoring applications. To reduce the sensing cost, it is desirable to achieve better sensing quality while using fewer samples. Compressive Sensing (CS) technique finds its role when the…
Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of loss of interpretability. However, the…
In this paper, we propose a stratified sampling algorithm in which the random drawings made in the strata to compute the expectation of interest are also used to adaptively modify the proportion of further drawings in each stratum. These…
Data splitting divides data into two parts. One part is reserved for model selection. In some applications, the second part is used for model validation but we use this part for estimating the parameters of the chosen model. We focus on the…
Statistical inference is often simplified by sample-splitting. This simplification comes at the cost of the introduction of randomness not native to the data. We propose a simple procedure for sequentially aggregating statistics constructed…
Statistical samples, in order to be representative, have to be drawn from a population in a random and unbiased way. Nevertheless, it is common practice in the field of model-based diagnosis to make estimations from (biased) best-first…
Partially-observed data collected by sampling methods is often being studied to obtain the characteristics of information diffusion networks. However, these methods usually do not consider the behavior of diffusion process. In this paper,…
As the size of modern data sets exceeds the disk and memory capacities of a single computer, machine learning practitioners have resorted to parallel and distributed computing. Given that optimization is one of the pillars of machine…
In recent years, an increasing amount of data is collected in different and often, not cooperative, databases. The problem of privacy-preserving, distributed calculations over separated databases and, a relative to it, issue of private data…
The problem of covariance estimation for replicated surface-valued processes is examined from the functional data analysis perspective. Considerations of statistical and computational efficiency often compel the use of separability of the…
Network sampling is a crucial technique for analyzing large or partially observable networks. However, the effectiveness of different sampling methods can vary significantly depending on the context. In this study, we empirically compare…
Markov chain sampling methods that automatically adapt to characteristics of the distribution being sampled can be constructed by exploiting the principle that one can sample from a distribution by sampling uniformly from the region under…