Related papers: Iterative Distributed Multinomial Regression

Simultaneous Inference for Massive Data: Distributed Bootstrap

In this paper, we propose a bootstrap method applied to massive data processed in a distributed manner across a large number of machines. This new method is computationally efficient in that we bootstrap on the master machine without over-resampling,…

Machine Learning · Statistics 2020-02-21 Yang Yu, Shih-Kang Chao, Guang Cheng
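The general idea of bootstrapping on the master machine, using only summaries sent by the workers instead of resampling raw data, can be illustrated with a simplified Gaussian multiplier bootstrap for a multivariate mean. This is a minimal sketch, not the procedure from the paper. It assumes each of the k machines holds n observations and sends only its local mean, and the function name `distributed_mean_bootstrap` is hypothetical.

```python
import numpy as np


def distributed_mean_bootstrap(local_means, n_per_machine, n_boot=2000, rng=None):
    """Gaussian multiplier bootstrap computed entirely on the master.

    Hypothetical helper for illustration. local_means has shape (k, d):
    one row per worker, each worker sending only its local mean of
    n_per_machine observations. Returns n_boot draws approximating the
    distribution of max_j |sqrt(N) * (global mean - true mean)_j|, N = k * n.
    """
    rng = np.random.default_rng(rng)
    local_means = np.asarray(local_means, dtype=float)
    k = local_means.shape[0]
    global_mean = local_means.mean(axis=0)
    centered = local_means - global_mean  # shape (k, d)
    # One N(0, 1) multiplier per machine per bootstrap replicate: (n_boot, k)
    eps = rng.standard_normal((n_boot, k))
    # Replicate b: (1/sqrt(k)) * sum_j eps[b, j] * sqrt(n) * (mean_j - global_mean)
    draws = eps @ centered * np.sqrt(n_per_machine) / np.sqrt(k)
    # Sup-norm over the d coordinates
    return np.abs(draws).max(axis=1)
```

The 95% quantile q of the returned draws gives an approximate sup-norm confidence region {mu : max over coordinates of |global mean - mu| <= q / sqrt(k * n)}. Because each replicate only reweights the k local summaries, the bootstrap cost on the master does not grow with the total sample size.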

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a…

Methodology · Statistics 2022-06-15 Yang Yu, Shih-Kang Chao, Guang Cheng

Distributed Statistical Inference for Massive Data

This paper considers distributed statistical inference for general symmetric statistics that encompass the U-statistics and the M-estimators in the context of massive data where the data can be stored at multiple platforms in different…

Statistics Theory · Mathematics 2018-05-30 Song Xi Chen, Liuhua Peng

Bootstrap Model Aggregation for Distributed Statistical Learning

In distributed, or privacy-preserving learning, we are often given a set of probabilistic models estimated from different local repositories, and asked to combine them into a single model that gives efficient statistical estimation. A…

Machine Learning · Statistics 2017-03-01 Jun Han, Qiang Liu

Logistic regression models for aggregated data

Logistic regression models are a popular and effective method to predict the probability of categorical response data. However, inference for these models can become computationally prohibitive for large datasets. Here we adapt ideas from…

Methodology · Statistics 2020-08-25 Tom Whitaker, Boris Beranger, Scott A. Sisson

Robust and Sparse Multinomial Regression in High Dimensions

A robust and sparse estimator for multinomial regression is proposed for high dimensional data. Robustness of the estimator is achieved by trimming the observations, and sparsity of the estimator is obtained by the elastic net penalty,…

Methodology · Statistics 2022-05-25 Fatma Sevinç Kurnaz, Peter Filzmoser

Distributed Semi-Supervised Sparse Statistical Inference

The debiased estimator is a crucial tool in statistical inference for high-dimensional model parameters. However, constructing such an estimator involves estimating the high-dimensional inverse Hessian matrix, incurring significant…

Machine Learning · Statistics 2023-12-18 Jiyuan Tu, Weidong Liu, Xiaojun Mao, Mingyue Xu

More Efficient Estimation for Logistic Regression with Optimal Subsample

In this paper, we propose an improved estimation method for logistic regression based on subsamples taken according to the optimal subsampling probabilities developed in Wang et al. (2018). Both asymptotic results and numerical results show that the…

Methodology · Statistics 2021-06-24 HaiYing Wang

Functional Regression with Intensively Measured Longitudinal Outcomes: A New Lens through Data Partitioning

Estimation and inference with modern longitudinal data from wearable devices, which consist of biological signals at high-frequency time points, are burdened by massive computational costs. We propose a distributed estimation and inference…

Methodology · Statistics 2023-09-13 Cole Manschot, Emily C. Hector

Deterministic bootstrapping for a class of bootstrap methods

An algorithm is described that enables efficient deterministic approximate computation of the bootstrap distribution for any linear bootstrap method $T_n^*$, alleviating the need for repeated resampling from observations (resp.…

Methodology · Statistics 2019-04-10 Thomas Pitschel
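For a linear statistic such as the sample mean, the bootstrap distribution can in simple cases be computed exactly, without any resampling, because the bootstrap sum of n draws follows the n-fold convolution of the empirical distribution. The sketch below shows this for small non-negative integer data. It is a simple illustration of deterministic bootstrapping, not the algorithm described in the paper, and `exact_bootstrap_mean_distribution` is a hypothetical name.

```python
import numpy as np


def exact_bootstrap_mean_distribution(data):
    """Exact nonparametric bootstrap distribution of the sample mean.

    Hypothetical helper for illustration; assumes small non-negative
    integer data. Returns (support, probs): the possible bootstrap means
    and their probabilities, computed by convolution instead of resampling.
    """
    data = np.asarray(data, dtype=int)
    n = data.size
    # Empirical pmf of one resampled observation on the grid 0..max(data)
    pmf = np.bincount(data) / n
    # The bootstrap sum of n i.i.d. draws has the n-fold convolution of pmf
    dist = np.array([1.0])
    for _ in range(n):
        dist = np.convolve(dist, pmf)
    # Divide the possible sums 0..n*max(data) by n to get bootstrap means
    support = np.arange(dist.size) / n
    return support, dist
```

For data [0, 1] it returns support [0, 0.5, 1] with probabilities [0.25, 0.5, 0.25]. The grid grows with n times the largest value, so real-valued data would need discretization or an approximation.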

Communication-Efficient Distributed Estimator for Generalized Linear Models with a Diverging Number of Covariates

Distributed statistical inference has recently attracted immense attention. The asymptotic efficiency of the maximum likelihood estimator (MLE), the one-step MLE, and the aggregated estimating equation estimator is established for…

Methodology · Statistics 2020-08-14 Ping Zhou, Zhen Yu, Jingyi Ma, Maozai Tian, Ye Fan

Fast Estimation of the Composite Link Model for Multidimensional Grouped Counts

This paper presents a significant advancement in the estimation of the Composite Link Model within a penalized likelihood framework, specifically designed to address indirect observations of grouped count data. While the model is effective…

Methodology · Statistics 2025-12-16 Carlo G. Camarda, María Durbán

A Componentwise Estimation Procedure for Multivariate Location and Scatter: Robustness, Efficiency and Scalability

Covariance matrix estimation is an important problem in multivariate data analysis, both from theoretical as well as applied points of view. Many simple and popular covariance matrix estimators are known to be severely affected by model…

Methodology · Statistics 2025-11-21 Soumya Chakraborty, Ayanendranath Basu, Abhik Ghosh

Distributed Estimation and Inference for Semi-parametric Binary Response Models

The development of modern technology has enabled data collection of unprecedented size, which poses new challenges to many statistical estimation and inference problems. This paper studies the maximum score estimator of a semi-parametric…

Statistics Theory · Mathematics 2025-02-25 Xi Chen, Wenbo Jing, Weidong Liu, Yichen Zhang

Robust and sparse estimation methods for high dimensional linear and logistic regression

Fully robust versions of the elastic net estimator are introduced for linear and logistic regression. The algorithms to compute the estimators are based on the idea of repeatedly applying the non-robust classical estimators to data subsets…

Methodology · Statistics 2017-03-16 Fatma Sevinc Kurnaz, Irene Hoffmann, Peter Filzmoser

Efficient Estimation for Generalized Linear Models on a Distributed System with Nonrandomly Distributed Data

Distributed systems have been widely used in practice to accomplish data analysis tasks of huge scales. In this work, we target the estimation problem of generalized linear models on a distributed system with nonrandomly distributed…

Methodology · Statistics 2020-04-07 Feifei Wang, Danyang Huang, Yingqiu Zhu, Hansheng Wang

Heterogeneity-aware and communication-efficient distributed statistical inference

In multicenter research, individual-level data are often protected against sharing across sites. To overcome the barrier of data sharing, many distributed algorithms, which only require sharing aggregated information, have been developed.…

Methodology · Statistics 2021-03-25 Rui Duan, Yang Ning, Yong Chen

Evaluating High-Order Predictive Distributions in Deep Learning

Most supervised learning research has focused on marginal predictions. In decision problems, joint predictive distributions are essential for good performance. Previous work has developed methods for assessing low-order predictive…

Machine Learning · Statistics 2022-03-01 Ian Osband, Zheng Wen, Seyed Mohammad Asghari, Vikranth Dwaracherla, Xiuyuan Lu, Benjamin Van Roy

Statistical inference in massive datasets by empirical likelihood

In this paper, we propose a new statistical inference method for massive data sets that is simple and efficient, combining the divide-and-conquer method with empirical likelihood. Compared with two popular methods (the bag of little…

Methodology · Statistics 2020-04-21 Xuejun Ma, Shaochen Wang, Wang Zhou

A Distributed One-Step Estimator

Distributed statistical inference has recently attracted enormous attention. Much existing work focuses on the averaging estimator. We propose a one-step approach to enhance a simple-averaging based distributed estimator. We derive the…

Methodology · Statistics 2015-11-11 Cheng Huang, Xiaoming Huo
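The one-step idea, averaging local estimates and then taking a single Newton-Raphson step with gradients and Hessians aggregated across machines, can be sketched for logistic regression as follows. This is a minimal illustration under stated assumptions (logistic regression, equal-sized data blocks, no safeguards against non-convergence or separation), and all function names are hypothetical.

```python
import numpy as np


def logistic_grad_hess(beta, X, y):
    """Gradient and Hessian of the average negative log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    grad = X.T @ (p - y) / len(y)
    hess = (X * (p * (1.0 - p))[:, None]).T @ X / len(y)
    return grad, hess


def local_mle(X, y, iters=25):
    """Logistic MLE by plain Newton-Raphson (no safeguards; sketch only)."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        g, H = logistic_grad_hess(beta, X, y)
        beta = beta - np.linalg.solve(H, g)
    return beta


def one_step_estimator(blocks):
    """blocks: list of (X_j, y_j) pairs, one per machine, of equal size."""
    # Round 1: each machine fits locally; the master averages the estimates
    beta0 = np.mean([local_mle(X, y) for X, y in blocks], axis=0)
    # Round 2: machines return gradient and Hessian at beta0; master averages
    gs, Hs = zip(*(logistic_grad_hess(beta0, X, y) for X, y in blocks))
    g, H = np.mean(gs, axis=0), np.mean(Hs, axis=0)
    # A single Newton-Raphson step starting from the simple-average estimator
    return beta0 - np.linalg.solve(H, g)
```

Only two communication rounds are needed: one for the local estimates and one for the local gradients and Hessians evaluated at their average. With equal block sizes, the averaged gradient and Hessian equal those of the pooled data, so the single step moves the simple average toward the full-data MLE.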