Related papers: Bootstrap Model Aggregation for Distributed Statis…
Model averaging has gained significant attention in recent years due to its ability of fusing information from different models. The critical challenge in frequentist model averaging is the choice of weight vector. The bootstrap method,…
Estimating nonlinear functionals of probability distributions from samples is a fundamental statistical problem. The "plug-in" estimator obtained by applying the target functional to the empirical distribution of samples is biased.…
We consider a distributed estimation method in a setting with heterogeneous streams of correlated data distributed across nodes in a network. In the considered approach, linear models are estimated locally (i.e., with only local data)…
Statistical multispecies models of multiarea marine ecosystems use a variety of data sources to estimate parameters using composite or weighted likelihood functions with associated weighting issues and questions on how to obtain variance…
This article introduces an iterative distributed computing estimator for the multinomial logistic regression model with large choice sets. Compared to the maximum likelihood estimator, the proposed iterative distributed estimator achieves…
Accurate noise modelling is important for training of deep learning reconstruction algorithms. While noise models are well known for traditional imaging techniques, the noise distribution of a novel sensor may be difficult to determine a…
Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…
In stochastic simulation, input uncertainty refers to the output variability arising from the statistical noise in specifying the input models. This uncertainty can be measured by a variance contribution in the output, which, in the…
We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…
We analyze gradient descent with randomly weighted data points in a linear regression model, under a generic weighting distribution. This includes various forms of stochastic gradient descent, importance sampling, but also extends to…
We present some new density estimation algorithms obtained by bootstrap aggregation like Bagging. Our algorithms are analyzed and empirically compared to other methods found in the statistical literature, like stacking and boosting for…
Despite decades of research and recent progress in adaptive control and reinforcement learning, there remains a fundamental lack of understanding in designing controllers that provide robustness to inherent non-asymptotic uncertainties…
Estimating the mixing density of a latent mixture model is an important task in signal processing. Nonparametric maximum likelihood estimation is one popular approach to this problem. If the latent variable distribution is assumed to be…
Generalized linear model or GLM constitutes a large class of models and essentially extends the ordinary linear regression by connecting the mean of the response variable with the covariate through appropriate link functions. On the other…
Distributed learning paradigms, such as federated and decentralized learning, allow for the coordination of models across a collection of agents, and without the need to exchange raw data. Instead, agents compute model updates locally based…
Accurate statistical inference in logistic regression models remains a critical challenge when the ratio between the number of parameters and sample size is not negligible. This is because approximations based on either classical asymptotic…
An approach to distributed machine learning is to train models on local datasets and aggregate these models into a single, stronger model. A popular instance of this form of parallelization is federated learning, where the nodes…
This paper studies the problem of estimation from relative measurements in a graph, in which a vector indexed over the nodes has to be reconstructed from pairwise measurements of differences between its components associated to nodes…
This paper introduces smoothed pseudo-population bootstrap methods for the purposes of variance estimation and the construction of confidence intervals for finite population quantiles. In an i.i.d. context, it has been shown that resampling…
Non-probability sampling, for example in the form of online panels, has become a fast and cheap method to collect data. While reliable inference tools are available for classical probability samples, non-probability samples can yield…