Related papers: Sparse Online Learning via Truncated Gradient
Stochastic gradient descent (SGD) is commonly used for optimization in large-scale machine learning problems. Langford et al. (2009) introduce a sparse online learning method to induce sparsity via truncated gradient. With high-dimensional…
Logistic regression, the Support Vector Machine (SVM), and least squares are well-studied methods in the statistical and computer science community, with various practical applications. High-dimensional data arriving on a real-time basis…
Finding neural network weights that generalize well from small datasets is difficult. A promising approach is to learn a weight initialization such that a small number of weight changes results in low generalization error. We show that this…
Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication…
In this work, we consider learning sparse models in large scale settings, where the number of samples and the feature dimension can grow as large as millions or billions. Two immediate issues occur under such challenging scenario: (i)…
We address the challenge of zeroth-order online convex optimization where the objective function's gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this…
Sparse connectivity is a hallmark of the brain and a desired property of artificial neural networks. It promotes energy efficiency, simplifies training, and enhances the robustness of network function. Thus, a detailed understanding of how…
Distributed training of massive machine learning models, in particular deep neural networks, via Stochastic Gradient Descent (SGD) is becoming commonplace. Several families of communication-reduction methods, such as quantization,…
Several convex formulation methods have been proposed previously for statistical estimation with structured sparsity as the prior. These methods often require a carefully tuned regularization parameter, often a cumbersome or heuristic…
We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The essence of our idea lies in the observation that certain parameters in the models are easier to…
Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as…
Excessive computational cost for learning large data and streaming data can be alleviated by using stochastic algorithms, such as stochastic gradient descent and its variants. Recent advances improve stochastic algorithms on convergence…
We propose a communicationally and computationally efficient algorithm for high-dimensional distributed sparse learning. At each iteration, local machines compute the gradient on local data and the master machine solves one shifted $l_1$…
We consider the online convex optimization problem. In the setting of arbitrary sequences and finite set of parameters, we establish a new fast-rate quantile regret bound. Then we investigate the optimization into the L1-ball by…
Theoretical results show that Bayesian methods can achieve lower bounds on regret for online logistic regression. In practice, however, such techniques may not be feasible especially for very large feature sets. Various approximations that,…
In this paper, we study the generalization properties of online learning based stochastic methods for supervised learning problems where the loss function is dependent on more than one training sample (e.g., metric learning, ranking). We…
Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions, which has been shown to be crucial for attaining fast convergence in games and robustness in stochastic…
Informative gradients are often lost in large batch updates. We propose a robust mechanism to reinforce the sparse components within a random batch of data points. A finite queue of online gradients is used to determine their expected…
The development of online algorithms to track time-varying systems has drawn a lot of attention in the last years, in particular in the framework of online convex optimization. Meanwhile, sparse time-varying optimization has emerged as a…
Sparsity is a desirable attribute. It can lead to more efficient and more effective representations compared to the dense model. Meanwhile, learning sparse latent representations has been a challenging problem in the field of computer…