Related papers: Divide-and-conquer methods for big data analysis

200 papers

To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly…

Methodology · Statistics 2018-12-27 Zhanfeng Wang, Yuan-chin Ivan Chang

Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…

Methodology · Statistics 2021-06-01 Chunlei Wang, Sanvesh Srivastava

The divide and conquer strategy, which breaks a massive data set into a series of manageable data blocks, and then combines the independent results of data blocks to obtain a final decision, has been recognized as a state-of-the-art…

Machine Learning · Computer Science 2016-03-15 Xiangyu Chang, Shaobo Lin, Yao Wang

In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a…

Machine Learning · Statistics 2015-05-06 Chen Xu, Yongquan Zhang, Runze Li

This paper presents a class of new algorithms for distributed statistical estimation that exploit divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important…

Statistics Theory · Mathematics 2018-08-29 Stanislav Minsker, Nate Strawn

We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of…

Methodology · Statistics 2016-12-30 Gautam Sabnis, Debdeep Pati, Barbara Engelhardt, Natesh Pillai

The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…

Statistics Theory · Mathematics 2017-04-06 Chengchun Shi, Wenbin Lu, Rui Song

In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms which perform clustering on a number of small subgraphs and finally patches the results into a…

Machine Learning · Statistics 2017-08-21 Soumendu Sundar Mukherjee, Purnamrita Sarkar, Peter J. Bickel

In this paper we propose a new approach for Big Data mining and analysis. This new approach works well on distributed datasets and deals with data clustering task of the analysis. The approach consists of two main phases, the first phase…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-05 Malika Bendechache, Nhien-An Le-Khac, M-Tahar Kechadi

Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long…

Machine Learning · Computer Science 2022-07-12 Hongmin Li, Xiucai Ye, Akira Imakura, Tetsuya Sakurai

D&R is a statistical approach designed to handle large and complex datasets. It partitions the dataset into several manageable subsets and subsequently applies the analytic method to each subset independently to obtain results. Finally, the…

Methodology · Statistics 2024-12-12 Md. Mahadi Hassan Nayem, Soma Chowdhury Biswas

Advances in information technology have led to extremely large datasets that are often kept in different storage centers. Existing statistical methods must be adapted to overcome the resulting computational obstacles while retaining…

Methodology · Statistics 2021-11-12 Qiong Zhang, Jiahua Chen

This study presents a divide-and-conquer (DC) approach based on feature space decomposition for classification. When large-scale datasets are present, typical approaches usually employed truncated kernel methods on the feature space or DC…

Machine Learning · Computer Science 2018-07-30 Qi Guo, Bo-Wei Chen, Feng Jiang, Xiangyang Ji, Sun-Yuan Kung

In computer science, divide and conquer (D&C) is an algorithm design paradigm based on multi-branched recursion. A D&C algorithm works by recursively and monotonically breaking down a problem into sub problems of the same (or a related)…

Computation and Language · Computer Science 2018-09-24 Diego Gabriel Krivochen

We propose a divide-and-conquer approach to filtering which decomposes the state variable into low-dimensional components to which standard particle filtering tools can be successfully applied and recursively merges them to recover the full…

Methodology · Statistics 2022-11-28 Francesca R. Crucinio, Adam M. Johansen

Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel,…

Methodology · Statistics 2025-10-22 Rihui Ou, Lachlan Astfalck, Deborshee Sen, David Dunson

Divide-and-conquer methods use large-sample approximations to provide frequentist guarantees when each block of data is both small enough to facilitate efficient computation and large enough to support approximately valid inferences. When…

Methodology · Statistics 2025-04-01 Emily C. Hector, Leonardo Cella, Ryan Martin

Divide and Conquer is a well known algorithmic procedure for solving many kinds of problem. In this procedure, the problem is partitioned into two parts until the problem is trivially solvable. Finding the distance of the closest pair is an…

Computational Geometry · Computer Science 2011-11-11 Mohammad Zaidul Karim, Nargis Akter

Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random…

Machine Learning · Computer Science 2019-11-19 Ke Alexander Wang, Xinran Bian, Pan Liu, Donghui Yan

There are two main approximations of mining big data in memory. One is to partition a big dataset to several subsets, so as to mine each subset in memory. By this way, global patterns can be obtained by synthesizing all local patterns…

Databases · Computer Science 2016-11-30 Shichao Zhang