Related papers: Distributed sequential method for analyzing massiv…

Divide-and-conquer methods for big data analysis

In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-step process: first splitting a data set into several smaller ones; then analyzing each set separately; finally combining results from each…

Machine Learning · Statistics 2021-02-23 Xueying Chen, Jerry Q. Cheng, Min-ge Xie
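The three-step process in the abstract above (split, analyze each subset, combine) can be illustrated with a minimal sketch. It assumes the simplest combination rule, a size-weighted average of per-block estimates ("one-shot averaging"). That rule is an illustrative choice and is not claimed to be the method of the listed paper. The helper names `split`, `divide_and_conquer` and `ols_slope` are hypothetical.

```python
from statistics import fmean
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def split(data: Sequence[T], k: int) -> list:
    """Step 1: partition data into k contiguous, nearly equal-sized blocks."""
    if not 1 <= k <= len(data):
        raise ValueError("k must be between 1 and len(data)")
    size, extra = divmod(len(data), k)
    blocks, start = [], 0
    for i in range(k):
        end = start + size + (1 if i < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks


def divide_and_conquer(data: Sequence[T], k: int,
                       estimator: Callable[[Sequence[T]], float]) -> float:
    """Split, estimate on each block, then combine the block estimates.

    Assumption: the combine step is a size-weighted average, which is exact
    for the sample mean and only approximate for most other estimators.
    """
    blocks = split(data, k)
    # Step 2: blocks are independent, so this loop could run in parallel.
    estimates = [estimator(b) for b in blocks]
    # Step 3: weight each block's estimate by its share of the data.
    return sum(len(b) * e for b, e in zip(blocks, estimates)) / len(data)


def ols_slope(pairs: Sequence[tuple]) -> float:
    """Least-squares slope of y on x (hypothetical example estimator)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx  # raises ZeroDivisionError if a block has constant x
```

For example, `divide_and_conquer(list(range(1, 11)), 3, fmean)` combines block means 2.5, 6.0 and 9.0 (block sizes 4, 3, 3) into 5.5, the same value as the full-data mean. For nonlinear estimators such as a regression slope on noisy data, the averaged result generally differs from the full-data estimate, which is why much of the literature listed here studies when the combined estimator retains full-data efficiency.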

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important…

Statistics Theory · Mathematics 2018-08-29 Stanislav Minsker, Nate Strawn

Sequential estimation for GEE with adaptive variables and subject selection

Modeling correlated or highly stratified multiple-response data has become a common data analysis task due to modern data monitoring facilities and methods. Generalized estimating equations (GEE) is one of the popular statistical methods for…

Methodology · Statistics 2019-03-05 Zimu Chen, Zhanfeng Wang, Yuan-chin Ivan Chang

Distributed inference for quantile regression processes

The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big…

Statistics Theory · Mathematics 2018-04-12 Stanislav Volgushev, Shih-Kang Chao, Guang Cheng

Distributed Learning of Finite Gaussian Mixtures

Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining…

Methodology · Statistics 2021-11-12 Qiong Zhang, Jiahua Chen

Robust and Parallel Bayesian Model Selection

Effective and accurate model selection is an important problem in modern data analysis. One of the major challenges is the computational burden required to handle large data sets that cannot be stored or processed on one machine. Another…

Machine Learning · Statistics 2018-06-26 Michael Minyi Zhang, Henry Lam, Lizhen Lin

Doubly Distributed Supervised Learning and Inference with High-Dimensional Correlated Outcomes

This paper presents a unified framework for supervised learning and inference procedures using the divide-and-conquer approach for high-dimensional correlated outcomes. We propose a general class of estimators that can be implemented in a…

Statistics Theory · Mathematics 2020-09-22 Emily C. Hector, Peter X.-K. Song

Scalable Bayesian inference for time series via divide-and-conquer

Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel,…

Methodology · Statistics 2025-10-22 Rihui Ou, Lachlan Astfalck, Deborshee Sen, David Dunson

Active Sampling of Multiple Sources for Sequential Estimation

Consider $K$ processes, each generating a sequence of independent and identically distributed random variables. The probability measures of these processes have random parameters that must be estimated. Specifically, they share a parameter $\theta$…

Machine Learning · Computer Science 2022-10-12 Arpan Mukherjee, Ali Tajer, Pin-Yu Chen, Payel Das

Distributed Estimation and Inference with Statistical Guarantees

This paper studies hypothesis testing and parameter estimation in the context of the divide and conquer algorithm. In a unified likelihood-based framework, we propose new test statistics and point estimators obtained by aggregating various…

Statistics Theory · Mathematics 2015-09-21 Heather Battey, Jianqing Fan, Han Liu, Junwei Lu, Ziwei Zhu

On the Feasibility of Distributed Kernel Regression for Big Data

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a…

Machine Learning · Statistics 2015-05-06 Chen Xu, Yongquan Zhang, Runze Li

A divide and conquer sequential Monte Carlo approach to high dimensional filtering

We propose a divide-and-conquer approach to filtering which decomposes the state variable into low-dimensional components to which standard particle filtering tools can be successfully applied and recursively merges them to recover the full…

Methodology · Statistics 2022-11-28 Francesca R. Crucinio, Adam M. Johansen

A computationally efficient procedure for combining ecological datasets by means of sequential consensus inference

Combining data has become an indispensable tool for managing the current diversity and abundance of data. But, as data complexity and data volume swell, the computational demands of previously proposed models for combining data escalate…

Methodology · Statistics 2024-06-13 Mario Figueira, David Conesa, Antonio López-Quílez, Iosu Paradinas

A review of distributed statistical inference

The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods. Meanwhile, it provides opportunities for researchers to develop novel algorithms. Inspired by the idea of…

Computation · Statistics 2023-04-14 Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang

Functional Regression with Intensively Measured Longitudinal Outcomes: A New Lens through Data Partitioning

Estimation and inference with modern longitudinal data from wearable devices, which consist of biological signals at high-frequency time points, is burdened by massive computational costs. We propose a distributed estimation and inference…

Methodology · Statistics 2023-09-13 Cole Manschot, Emily C. Hector

Divide and Conquer Local Average Regression

The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks and then combines the independent results of the data blocks to obtain a final decision, has been recognized as a state-of-the-art…

Machine Learning · Computer Science 2016-03-15 Xiangyu Chang, Shaobo Lin, Yao Wang

A Selective Review on Statistical Methods for Massive Data Computation: Distributed Computing, Subsampling, and Minibatch Techniques

This paper presents a selective review of statistical computation methods for massive data analysis. A large number of statistical methods for massive data computation have been developed rapidly over the past decades. In this work, we focus…

Methodology · Statistics 2024-04-25 Xuetong Li, Yuan Gao, Hong Chang, Danyang Huang, Yingying Ma, Rui Pan, Haobo Qi, Feifei Wang, Shuyuan Wu, Ke Xu, Jing Zhou, Xuening Zhu, Yingqiu Zhu, Hansheng Wang

Globally-Optimal Greedy Experiment Selection for Active Sequential Estimation

Motivated by modern applications such as computerized adaptive testing, sequential rank aggregation, and heterogeneous data source selection, we study the problem of active sequential estimation, which involves adaptively selecting…

Statistics Theory · Mathematics 2024-02-14 Xiaoou Li, Hongru Zhao

A Massive Data Framework for M-Estimators with Cubic-Rate

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…

Statistics Theory · Mathematics 2017-04-06 Chengchun Shi, Wenbin Lu, Rui Song

Parallel and distributed optimization methods for estimation and control in networks

System performance for networks composed of interconnected subsystems can be increased if the traditionally separated subsystems are jointly optimized. Recently, parallel and distributed optimization methods have emerged as a powerful tool…

Optimization and Control · Mathematics 2013-02-14 Ion Necoara, Valentin Nedelcu, Ioan Dumitrache