Related papers: Divide-and-conquer methods for big data analysis

Distributed sequential method for analyzing massive data

To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly…

Methodology · Statistics 2018-12-27 Zhanfeng Wang, Yuan-chin Ivan Chang

Divide-and-Conquer Bayesian Inference in Hidden Markov Models

Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…

Methodology · Statistics 2021-06-01 Chunlei Wang, Sanvesh Srivastava

Divide and Conquer Local Average Regression

The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks, and then combines the independent results of data blocks to obtain a final decision, has been recognized as a state-of-the-art…

Machine Learning · Computer Science 2016-03-15 Xiangyu Chang, Shaobo Lin, Yao Wang

On the Feasibility of Distributed Kernel Regression for Big Data

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a…

Machine Learning · Statistics 2015-05-06 Chen Xu, Yongquan Zhang, Runze Li

Distributed Statistical Estimation and Rates of Convergence in Normal Approximation

This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important…

Statistics Theory · Mathematics 2018-08-29 Stanislav Minsker, Nate Strawn

A Divide and Conquer Strategy for High Dimensional Bayesian Factor Models

We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of…

Methodology · Statistics 2016-12-30 Gautam Sabnis, Debdeep Pati, Barbara Engelhardt, Natesh Pillai

A Massive Data Framework for M-Estimators with Cubic-Rate

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…

Statistics Theory · Mathematics 2017-04-06 Chengchun Shi, Wenbin Lu, Rui Song

Two provably consistent divide and conquer clustering algorithms for large networks

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms which perform clustering on a number of small subgraphs and finally patch the results into a…

Machine Learning · Statistics 2017-08-21 Soumendu Sundar Mukherjee, Purnamrita Sarkar, Peter J. Bickel

Distributed Spatial Data Clustering as a New Approach for Big Data Analysis

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Divide-and-conquer based Large-Scale Spectral Clustering

Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long…

Machine Learning · Computer Science 2022-07-12 Hongmin Li, Xiucai Ye, Akira Imakura, Tetsuya Sakurai

Application of generalized linear models in big data: a divide and recombine (D&R) approach

D&R is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets and subsequently applies the analytic method to each subset independently to obtain results. Finally, the…

Methodology · Statistics 2024-12-12 Md. Mahadi Hassan Nayem, Soma Chowdhury Biswas

Distributed Learning of Finite Gaussian Mixtures

Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining…

Methodology · Statistics 2021-11-12 Qiong Zhang, Jiahua Chen

Efficient Divide-And-Conquer Classification Based on Feature-Space Decomposition

This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. When large-scale datasets are present, typical approaches usually employed truncated kernel methods on the feature space or DC…

Machine Learning · Computer Science 2018-07-30 Qi Guo, Bo-Wei Chen, Feng Jiang, Xiangyang Ji, Sun-Yuan Kung

Divide and...conquer? On the limits of algorithmic approaches to syntactic semantic structure

In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion. A D&C algorithm works by recursively and monotonically breaking down a problem into sub problems of the same (or a related)…

Computation and Language · Computer Science 2018-09-24 Diego Gabriel Krivochen

A divide and conquer sequential Monte Carlo approach to high dimensional filtering

We propose a divide-and-conquer approach to filtering which decomposes the state variable into low-dimensional components to which standard particle filtering tools can be successfully applied and recursively merges them to recover the full…

Methodology · Statistics 2022-11-28 Francesca R. Crucinio, Adam M. Johansen

Scalable Bayesian inference for time series via divide-and-conquer

Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel,…

Methodology · Statistics 2025-10-22 Rihui Ou, Lachlan Astfalck, Deborshee Sen, David Dunson

Divide-and-conquer with finite sample sizes: valid and efficient possibilistic inference

Divide-and-conquer methods use large-sample approximations to provide frequentist guarantees when each block of data is both small enough to facilitate efficient computation and large enough to support approximately valid inferences. When…

Methodology · Statistics 2025-04-01 Emily C. Hector, Leonardo Cella, Ryan Martin

Optimum Partition Parameter of Divide-and-Conquer Algorithm for Solving Closest-Pair Problem

Divide and Conquer is a well known algorithmic procedure for solving many kinds of problem. In this procedure, the problem is partitioned into two parts until the problem is trivially solvable. Finding the distance of the closest pair is an…

Computational Geometry · Computer Science 2011-11-11 Mohammad Zaidul Karim, Nargis Akter

$DC^2$: A Divide-and-conquer Algorithm for Large-scale Kernel Learning with Application to Clustering

Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random…

Machine Learning · Computer Science 2019-11-19 Ke Alexander Wang, Xinran Bian, Pan Liu, Donghui Yan

Data Partitioning View of Mining Big Data

There are two main approximations of mining big data in memory. One is to partition a big dataset to several subsets, so as to mine each subset in memory. By this way, global patterns can be obtained by synthesizing all local patterns…

Databases · Computer Science 2016-11-30 Shichao Zhang