Related papers: Distributed Coordinate Descent Method for Learning…
This paper presents a novel coordinate descent algorithm leveraging a combination of one-directional line search and gradient information for parameter updates for a squared error loss function. Each parameter undergoes updates determined…
Coordinate descent methods employ random partial updates of decision variables in order to solve huge-scale convex optimization problems. In this work, we introduce new adaptive rules for the random selection of their updates. By adaptive,…
Hashing method maps similar data to binary hashcodes with smaller hamming distance, and it has received a broad attention due to its low storage cost and fast retrieval speed. However, the existing limitations make the present algorithms…
Discrete optimization is a central problem in mathematical optimization with a broad range of applications, among which binary optimization and sparse optimization are two common ones. However, these problems are NP-hard and thus difficult…
We propose an efficient distributed randomized coordinate descent method for minimizing regularized non-strongly convex loss functions. The method attains the optimal $O(1/k^2)$ convergence rate, where $k$ is the iteration counter. The core…
We develop an exact coordinate descent algorithm for high-dimensional regularized Huber regression. In contrast to composite gradient descent methods, our algorithm fully exploits the advantages of coordinate descent when the underlying…
Coordinate descent algorithms solve optimization problems by successively performing approximate minimization along coordinate directions or coordinate hyperplanes. They have been used in applications for many years, and their popularity…
We consider decentralized model training in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical…
Distributionally robust optimization (DRO) problems are increasingly seen as a viable method to train machine learning models for improved model generalization. These min-max formulations, however, are more difficult to solve. We therefore…
The world needs diverse and unbiased data to train deep learning models. Currently data comes from a variety of sources that are unmoderated to a large extent. The outcomes of training neural networks with unverified data yields biased…
We demonstrate that distributed block coordinate descent can quickly solve kernel regression and classification problems with millions of data points. Armed with this capability, we conduct a thorough comparison between the full kernel, the…
We consider federated learning in tiered communication networks. Our network model consists of a set of silos, each holding a vertical partition of the data. Each silo contains a hub and a set of clients, with the silo's vertical data shard…
Second-order optimizers hold intriguing potential for deep learning, but suffer from increased cost and sensitivity to the non-convexity of the loss surface as compared to gradient-based approaches. We introduce a coordinate descent method…
Many real-world problems are categorized as large-scale problems, and metaheuristic algorithms as an alternative method to solve large-scale problem; they need the evaluation of many candidate solutions to tackle them prior to their…
In prior works, stochastic dual coordinate ascent (SDCA) has been parallelized in a multi-core environment where the cores communicate through shared memory, or in a multi-processor distributed memory environment where the processors…
As the complexity of our neural network models grow, so too do the data and computation requirements for successful training. One proposed solution to this problem is training on a distributed network of computational devices, thus…
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex…
Conventional gradient descent methods compute the gradients for multiple variables through the partial derivative. Treating the coupled variables independently while ignoring the interaction, however, leads to an insufficient optimization…
Scalable machine learning over big data is an important problem that is receiving a lot of attention in recent years. On popular distributed environments such as Hadoop running on a cluster of commodity machines, communication costs are…
We consider chance-constrained problems with discrete random distribution. We aim for problems with a large number of scenarios. We propose a novel method based on the stochastic gradient descent method which performs updates of the…