Related papers: Divide and Conquer Local Average Regression
In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-step process: first splitting a data set into several smaller ones; then analyzing each set separately; finally combining results from each…
This paper presents a class of new algorithms for distributed statistical estimation that exploit the divide-and-conquer approach. We show that one of the key benefits of the divide-and-conquer strategy is robustness, an important…
Divide-and-conquer is a general strategy to deal with large scale problems. It is typically applied to generate ensemble instances, which potentially limits the problem size it can handle. Additionally, the data are often divided by random…
We consider the learning of algorithmic tasks by mere observation of input-output pairs. Rather than studying this as a black-box discrete regression problem with no assumption whatsoever on the input-output mapping, we concentrate on tasks…
We study the risk performance of distributed learning for regularized empirical risk minimization with a fast convergence rate, substantially improving the error analysis of the existing divide-and-conquer based distributed learning.…
In this article, we advance divide-and-conquer strategies for solving the community detection problem in networks. We propose two algorithms that perform clustering on a number of small subgraphs and finally patch the results into a…
Standard model-free deep reinforcement learning (RL) algorithms sample a new initial state for each trial, allowing them to optimize policies that can perform well even in highly stochastic environments. However, problems that exhibit…
In modern scientific research, massive datasets with huge numbers of observations are frequently encountered. To facilitate the computational process, a divide-and-conquer scheme is often used for the analysis of big data. In such a…
The divide and conquer method is a common strategy for handling massive data. In this article, we study the divide and conquer method for cubic-rate estimators under the massive data framework. We develop a general theory for establishing…
We establish optimal convergence rates for a decomposition-based scalable approach to kernel ridge regression. The method is simple to describe: it randomly partitions a dataset of size N into m subsets of equal size, computes an…
Machine learning-based intrusion detection requires complex models to capture patterns in high-dimensional, noisy, and class-imbalanced raw network traffic, yet deploying such models remains impractical on resource-constrained devices with…
The ever-growing size of datasets renders well-studied learning techniques, such as Kernel Ridge Regression, inapplicable, posing a serious computational challenge. Divide-and-conquer is a common remedy, which suggests splitting the dataset…
To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly…
The predict+optimize problem combines machine learning of problem coefficients with a combinatorial optimization problem that uses the predicted coefficients. While this problem can be solved in two separate stages, it is better to…
Distributed machine learning systems have been receiving increasing attention for their efficiency in processing large scale data. Many distributed frameworks have been proposed for different machine learning tasks. In this paper, we study…
Symbolic regression aims to find a function that best explains the relationship between independent variables and the objective value based on a given set of sample data. Genetic programming (GP) is usually considered an appropriate…
We study how the divide and conquer principle --- partition the available data into subsamples, compute an estimate from each subsample and combine these appropriately to form the final estimator --- works in non-standard problems where…
Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the…
Spectral clustering is one of the most popular clustering methods. However, how to balance the efficiency and effectiveness of the large-scale spectral clustering with limited computing resources has not been properly solved for a long…
In this paper, we investigate a divide and conquer approach to Kernel Ridge Regression (KRR). Given n samples, the division step involves separating the points based on some underlying disjoint partition of the input space (possibly via…
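Several of the abstracts above describe the same basic pattern: randomly split the data into blocks, fit a base regressor on each block, and average the results. The following is a minimal sketch of that pattern, not the method of any particular paper listed here. It assumes a Gaussian-kernel ridge regressor as the base learner; the function names, the bandwidth, the regularization value `lam`, and the choice of averaging predictions are all illustrative assumptions.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def fit_krr(X, y, lam=1e-3, bandwidth=1.0):
    """Fit kernel ridge regression on one block and return a predictor."""
    n = X.shape[0]
    K = gaussian_kernel(X, X, bandwidth)
    # Solve (K + n * lam * I) alpha = y; the n * lam scaling is one common convention.
    alpha = np.linalg.solve(K + n * lam * np.eye(n), y)
    return lambda X_new: gaussian_kernel(X_new, X, bandwidth) @ alpha


def divide_and_conquer_predict(X, y, X_new, m=4, seed=0, **krr_kwargs):
    """Randomly split the data into m blocks, fit each block, average predictions."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(X.shape[0])
    blocks = np.array_split(perm, m)  # m nearly equal-sized index sets
    preds = [fit_krr(X[idx], y[idx], **krr_kwargs)(X_new) for idx in blocks]
    return np.mean(preds, axis=0)


if __name__ == "__main__":
    # Synthetic data (an assumption for illustration): noisy sine curve on [0, 1].
    rng = np.random.default_rng(1)
    X = rng.uniform(0.0, 1.0, size=(400, 1))
    y = np.sin(2.0 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(400)
    X_test = np.linspace(0.0, 1.0, 50)[:, None]
    y_hat = divide_and_conquer_predict(X, y, X_test, m=4, bandwidth=0.2)
    mse = np.mean((y_hat - np.sin(2.0 * np.pi * X_test[:, 0])) ** 2)
    print(f"mean squared error against the true function: {mse:.4f}")
```

Each block solves an n/m by n/m linear system instead of one n by n system, which is where the computational saving comes from. How large m can grow before accuracy degrades is the kind of question the papers above study, and this sketch makes no claim about it.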