Related papers: Distributed Coordinate Descent for Generalized Lin…
Distributed training of $l_1$ regularized classifiers has received great attention recently. Most existing methods approach this problem by taking steps obtained from approximating the objective by a quadratic approximation that is…
Solving logistic regression with L1-regularization in distributed settings is an important problem. This problem arises when the training dataset is very large and cannot fit in the memory of a single machine. We present d-GLMNET, a new algorithm…
Distributed learning provides an attractive framework for scaling the learning task by sharing the computational load over multiple nodes in a network. Here, we investigate the performance of distributed learning for large-scale linear…
Large-scale L1-regularized loss minimization problems arise in high-dimensional applications such as compressed sensing and high-dimensional supervised learning, including classification and regression problems. High-performance algorithms…
Online learning algorithms are fast, memory-efficient, easy to implement, and applicable to many prediction problems, including classification, regression, and ranking. Several online algorithms were proposed in the past few decades, some…
Distributed learning facilitates the scaling-up of data processing by distributing the computational burden over several nodes. Despite the vast interest in distributed learning, generalization performance of such approaches is not well…
Distributed learning is an effective way to analyze big data. In distributed regression, a typical approach is to divide the big data into multiple blocks, apply a base regression algorithm on each of them, and then simply average the…
Combining data from varied sources has considerable potential for knowledge discovery: collaborating data parties can mine data in an expanded feature space, allowing them to explore a larger range of scientific questions. However, data…
In this paper, we study two general classes of optimization algorithms for kernel methods with convex loss function and quadratic norm regularization, and analyze their convergence. The first approach, based on fixed-point iterations, is…
We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., N >> p, in the generalized linear models framework. When such datasets are too big to be analyzed…
We propose an approach for fitting linear regression models that splits the set of covariates into groups. The optimal split of the variables into groups and the regularized estimation of the regression coefficients are performed by…
We propose an L1-penalized algorithm for fitting high-dimensional generalized linear mixed models. Generalized linear mixed models (GLMMs) can be viewed as an extension of generalized linear models for clustered observations. This…
With the availability of extraordinarily huge data sets, solving the problems of distributed statistical methodology and computing for such data sets has become increasingly crucial in the big data area. In this paper, we focus on the…
Many machine learning algorithms have been developed under the assumption that data sets are already available in batch form. Yet in many application domains data is only available sequentially over time via compute nodes in different…
The generalised linear model (GLM) is a very important tool for analysing real data in biology, sociology, agriculture, engineering and many other application domains where the relationship between the response and explanatory variables may…
We consider a distributed estimation method in a setting with heterogeneous streams of correlated data distributed across nodes in a network. In the considered approach, linear models are estimated locally (i.e., with only local data)…
The data consistency for the physical forward model is crucial in inverse problems, especially in MR imaging reconstruction. The standard way is to unroll an iterative algorithm into a neural network with a forward model embedded. The…
We investigate the generalisation performance of Distributed Gradient Descent with Implicit Regularisation and Random Features in the homogeneous setting where a network of agents is given data sampled independently from the same unknown…
We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points. Armed with this capability, we conduct a thorough comparison between the full kernel, the…
This paper considers a general data-fitting problem over a networked system, in which many computing nodes are connected by an undirected graph. This kind of problem can find many real-world applications and has been studied extensively in…