Related papers: Parallel inference for massive distributed spatial…

Distributed inference for quantile regression processes

The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big…

Statistics Theory · Mathematics 2018-04-12 Stanislav Volgushev, Shih-Kang Chao, Guang Cheng

Distributed Inference for Spatial Extremes Modeling in High Dimensions

Extreme environmental events frequently exhibit spatial and temporal dependence. These data are often modeled using max-stable processes (MSPs). MSPs are computationally prohibitive to fit for as few as a dozen observations, with supposed…

Methodology · Statistics 2022-05-02 Emily C. Hector, Brian J. Reich

Multi-Scale Process Modelling and Distributed Computation for Spatial Data

Recent years have seen a huge development in spatial modelling and prediction methodology, driven by the increased availability of remote-sensing data and the reduced cost of distributed-processing technology. It is well known that…

Computation · Statistics 2020-02-18 Andrew Zammit-Mangion, Jonathan Rougier

Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., $N \gg p$, in the generalized linear models framework. When such datasets are too big to be analyzed…

Methodology · Statistics 2020-07-23 Lu Tang, Ling Zhou, Peter X.-K. Song

Communication Efficient Parallel Algorithms for Optimization on Manifolds

The last decade has witnessed an explosion in the development of models, theory and computational algorithms for "big data" analysis. In particular, distributed computing has served as a natural and dominating paradigm for statistical…

Machine Learning · Statistics 2018-11-02 Bayan Saparbayeva, Michael Minyi Zhang, Lizhen Lin

Distributed Bootstrap for Simultaneous Inference Under High Dimensionality

We propose a distributed bootstrap method for simultaneous inference on high-dimensional massive data that are stored and processed with many machines. The method produces an $\ell_\infty$-norm confidence region based on a…

Methodology · Statistics 2022-06-15 Yang Yu, Shih-Kang Chao, Guang Cheng

A review of distributed statistical inference

The rapid emergence of massive datasets in various fields poses a serious challenge to traditional statistical methods. Meanwhile, it provides opportunities for researchers to develop novel algorithms. Inspired by the idea of…

Computation · Statistics 2023-04-14 Yuan Gao, Weidong Liu, Hansheng Wang, Xiaozhou Wang, Yibo Yan, Riquan Zhang

Simultaneous inference for linear mixed model parameters with an application to small area estimation

Over the past decades, linear mixed models have attracted considerable attention in various fields of applied statistics. They are popular whenever clustered, hierarchical or longitudinal data are investigated. Nonetheless, statistical…

Methodology · Statistics 2021-09-20 Katarzyna Reluga, María José Lombardía, Stefan Andreas Sperlich

Parallel Weighted Random Sampling

Data structures for efficient sampling from a set of weighted items are an important building block of many applications. However, few parallel solutions are known. We close many of these gaps both for shared-memory and distributed-memory…

Data Structures and Algorithms · Computer Science 2021-07-20 Lorenz Hübschle-Schneider, Peter Sanders

Distributed Parallel Inference on Large Factor Graphs

As computer clusters become more common and the size of the problems encountered in the field of AI grows, there is an increasing demand for efficient parallel inference algorithms. We consider the problem of parallel inference on large…

Artificial Intelligence · Computer Science 2012-05-14 Joseph E. Gonzalez, Yucheng Low, Carlos E. Guestrin, David O'Hallaron

Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

Machine learning iterative imputation methods have been well accepted by researchers for imputing missing data, but they can be time-consuming when handling large datasets. To overcome this drawback, parallel computing strategies have been…

Applications · Statistics 2020-04-24 Shangzhi Hong, Yuqi Sun, Hanying Li, Henry S. Lynn

Spatial Factor Modeling: A Bayesian Matrix-Normal Approach for Misaligned Data

Multivariate spatially-oriented data sets are prevalent in the environmental and physical sciences. Scientists seek to jointly model multiple variables, each indexed by a spatial location, to capture any underlying spatial association for…

Methodology · Statistics 2021-08-19 Lu Zhang, Sudipto Banerjee

Distributed estimation through parallel approximants

Designing scalable estimation algorithms is a core challenge in modern statistics. Here we introduce a framework to address this challenge based on parallel approximants, which yields estimators with provable properties that operate on the…

Methodology · Statistics 2023-08-04 Aritra Chakravorty, William S. Cleveland, Patrick J. Wolfe

Scalable Semiparametric Spatio-temporal Regression for Large Data Analysis

With the rapid advances of data acquisition techniques, spatio-temporal data are becoming increasingly abundant in a diverse array of disciplines. Here we develop spatio-temporal regression methodology for analyzing large amounts of…

Methodology · Statistics 2021-12-01 Ting Fung Ma, Fangfang Wang, Jun Zhu, Anthony R. Ives, Katarzyna E. Lewińska

A Partition-insensitive Parallel Framework for Distributed Model Fitting

Distributed model fitting refers to the process of fitting a mathematical or statistical model to the data using distributed computing resources, such that computing tasks are divided among multiple interconnected computers or nodes, often…

Computation · Statistics 2024-06-04 Xiaofei Wu, Rongmei Liang, Fabio Roli, Marcello Pelillo, Jing Yuan

Modified Linear Projection for Large Spatial Data Sets

Recent developments in engineering techniques for spatial data collection such as geographic information systems have resulted in an increasing need for methods to analyze large spatial data sets. These sorts of data sets can be found in…

Methodology · Statistics 2020-08-14 Toshihiro Hirano

Distributed Parameter Estimation via Pseudo-likelihood

Estimating statistical models within sensor networks requires distributed algorithms, in which both data and computation are distributed across the nodes of the network. We propose a general approach for distributed learning based on…

Machine Learning · Computer Science 2012-07-03 Qiang Liu, Alexander Ihler

Scalable Asynchronous Federated Modeling for Spatial Data

Spatial data are central to applications such as environmental monitoring and urban planning, but are often distributed across devices where privacy and communication constraints limit direct sharing. Federated modeling offers a practical…

Methodology · Statistics 2025-10-03 Jianwei Shi, Sameh Abdulah, Ying Sun, Marc G. Genton

Median Selection Subset Aggregation for Parallel Inference

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems…

Machine Learning · Statistics 2014-10-27 Xiangyu Wang, Peichao Peng, David Dunson

Learned spatial data partitioning

Due to the significant increase in the size of spatial data, it is essential to use distributed parallel processing systems to efficiently analyze spatial data. In this paper, we first study learned spatial data partitioning, which…

Databases · Computer Science 2023-06-21 Keizo Hori, Yuya Sasaki, Daichi Amagata, Yuki Murosaki, Makoto Onizuka