Related papers: Divide-and-conquer methods for big data analysis
To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly…
Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…
The divide and conquer strategy, which breaks a massive data set into a se- ries of manageable data blocks, and then combines the independent results of data blocks to obtain a final decision, has been recognized as a state-of-the-art…
In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a…
This paper presents a class of new algorithms for distributed statistical estimation that exploit divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important…
We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of…
The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…
In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms which perform clustering on a number of small subgraphs and finally patches the results into a…
In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…
Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long…
D&R is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets and subsequently applies the analytic method to each subset independently to obtain results. Finally, the…
Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining…
This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. When large-scale datasets are present, typical approaches usually employed truncated kernel methods on the feature space or DC…
In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion. A D&C algorithm works by recursively and monotonically breaking down a problem into sub problems of the same (or a related)…
We propose a divide-and-conquer approach to filtering which decomposes the state variable into low-dimensional components to which standard particle filtering tools can be successfully applied and recursively merges them to recover the full…
Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel,…
Divide-and-conquer methods use large-sample approximations to provide frequentist guarantees when each block of data is both small enough to facilitate efficient computation and large enough to support approximately valid inferences. When…
Divide and Conquer is a well known algorithmic procedure for solving many kinds of problem. In this procedure, the problem is partitioned into two parts until the problem is trivially solvable. Finding the distance of the closest pair is an…
Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random…
There are two main approximations of mining big data in memory. One is to partition a big dataset to several subsets, so as to mine each subset in memory. By this way, global patterns can be obtained by synthesizing all local patterns…