Related papers: Efficient Elastic Net Regularization for Sparse Li…

A generalized linear joint trained framework for semi-supervised learning of sparse features

The elastic-net is among the most widely used types of regularization algorithms, commonly associated with the problem of supervised generalized linear model estimation via penalized maximum likelihood. Its nice properties originate from a…

Machine Learning · Statistics 2020-10-05 Juan C. Laria, Line H. Clemmensen, Bjarne K. Ersbøll

$\ell_0$ Regularized Structured Sparsity Convolutional Neural Networks

Deepening and widening convolutional neural networks (CNNs) significantly increases the number of trainable weight parameters by adding more convolutional layers and feature maps per layer, respectively. By imposing inter- and intra-group…

Computer Vision and Pattern Recognition · Computer Science 2019-12-18 Kevin Bui, Fredrick Park, Shuai Zhang, Yingyong Qi, Jack Xin

Stochastic Iterative Hard Thresholding for Graph-structured Sparsity Optimization

Stochastic optimization algorithms update models with cheap per-iteration costs sequentially, which makes them amenable for large-scale data analysis. Such algorithms have been widely studied for structured sparse models where the sparsity…

Machine Learning · Computer Science 2019-05-10 Baojian Zhou, Feng Chen, Yiming Ying

Efficient Exact Gradient Update for training Deep Networks with Very Large Sparse Targets

An important class of problems involves training deep neural networks with sparse prediction targets of very high dimension D. These occur naturally in e.g. neural language models or the learning of word-embeddings, often posed as…

Neural and Evolutionary Computing · Computer Science 2015-07-15 Pascal Vincent, Alexandre de Brébisson, Xavier Bouthillier

Elastic-Net Regularization: Error estimates and Active Set Methods

This paper investigates theoretical properties and efficient numerical algorithms for the so-called elastic-net regularization originating from statistics, which enforces simultaneously $\ell^1$ and $\ell^2$ regularization. The stability of the…

Numerical Analysis · Mathematics 2015-05-13 Bangti Jin, Dirk Lorenz, Stefan Schiffler

Balance is Essence: Accelerating Sparse Training via Adaptive Gradient Correction

Despite impressive performance, deep neural networks require significant memory and computation costs, prohibiting their application in resource-constrained scenarios. Sparse training is one of the most common techniques to reduce these…

Machine Learning · Computer Science 2023-12-06 Bowen Lei, Dongkuan Xu, Ruqi Zhang, Shuren He, Bani K. Mallick

Recurrent Stochastic Configuration Networks with Hybrid Regularization for Nonlinear Dynamics Modelling

Recurrent stochastic configuration networks (RSCNs) have shown great potential in modelling nonlinear dynamic systems with uncertainties. This paper presents an RSCN with hybrid regularization to enhance both the learning capacity and…

Machine Learning · Computer Science 2024-12-03 Gang Dang, Dianhui Wang

Online Learning Under A Separable Stochastic Approximation Framework

We propose an online learning algorithm for a class of machine learning models under a separable stochastic approximation framework. The essence of our idea lies in the observation that certain parameters in the models are easier to…

Machine Learning · Computer Science 2023-05-23 Min Gan, Xiang-xiang Su, Guang-yong Chen, Jing Chen

Efficient Neural Network Training via Forward and Backward Propagation Sparsification

Sparse training is a natural idea to accelerate the training speed of deep neural networks and save the memory usage, especially since large modern neural networks are significantly over-parameterized. However, most of the existing methods…

Machine Learning · Computer Science 2021-11-11 Xiao Zhou, Weizhong Zhang, Zonghao Chen, Shizhe Diao, Tong Zhang

Make $\ell_1$ Regularization Effective in Training Sparse CNN

Compressed Sensing using $\ell_1$ regularization is among the most powerful and popular sparsification technique in many applications, but why has it not been used to obtain sparse deep learning model such as convolutional neural network…

Machine Learning · Computer Science 2021-10-06 Juncai He, Xiaodong Jia, Jinchao Xu, Lian Zhang, Liang Zhao

Adaptive Regularization for Sparsity Control in Bregman-Based Optimizers

Sparse training reduces the memory and computational costs of deep neural networks. However, sparse optimization methods, e.g., those adding an $\ell_1$ penalty, often control sparsity only indirectly through a regularization parameter…

Machine Learning · Computer Science 2026-05-21 Ahmad Aloradi, Tim Roith, Emanuël A. P. Habets, Daniel Tenbrinck

Transformed $\ell_1$ Regularization for Learning Sparse Deep Neural Networks

Deep neural networks (DNNs) have achieved extraordinary success in numerous areas. However, to attain this success, DNNs often carry a large number of weight parameters, leading to heavy costs of memory and computation resources.…

Computer Vision and Pattern Recognition · Computer Science 2019-01-07 Rongrong Ma, Jianyu Miao, Lingfeng Niu, Peng Zhang

Communication-Efficient and Drift-Robust Federated Learning via Elastic Net

Federated learning (FL) is a distributed method to train a global model over a set of local clients while keeping data localized. It reduces the risks of privacy and security but faces important challenges including expensive communication…

Machine Learning · Computer Science 2022-10-07 Seonhyeong Kim, Jiheon Woo, Daewon Seo, Yongjune Kim

Dynamic Gradient Sparse Update for Edge Training

Training on edge devices enables personalized model fine-tuning to enhance real-world performance and maintain data privacy. However, the gradient computation for backpropagation in the training requires significant memory buffers to store…

Hardware Architecture · Computer Science 2025-03-25 I-Hsuan Li, Tian-Sheuan Chang

Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters can achieve zero training error. In such cases, the choice of the optimization algorithm and its respective hyper-parameters introduces…

Machine Learning · Computer Science 2019-12-06 Gauthier Gidel, Francis Bach, Simon Lacoste-Julien

A Stochastic Proximal Method for Nonsmooth Regularized Finite Sum Optimization

We consider the problem of training a deep neural network with nonsmooth regularization to retrieve a sparse and efficient sub-structure. Our regularizer is only assumed to be lower semi-continuous and prox-bounded. We combine an adaptive…

Machine Learning · Statistics 2022-06-20 Dounia Lakhmiri, Dominique Orban, Andrea Lodi

Fast Learning with Nonconvex L1-2 Regularization

Convex regularizers are often used for sparse learning. They are easy to optimize, but can lead to inferior prediction performance. The difference of $\ell_1$ and $\ell_2$ ($\ell_{1-2}$) regularizer has been recently proposed as a nonconvex…

Machine Learning · Computer Science 2017-06-21 Quanming Yao, James T. Kwok, Xiawei Guo

Gradient Sparsification for Communication-Efficient Distributed Optimization

Modern large scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead for exchanging information such as…

Machine Learning · Computer Science 2017-10-31 Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang

Training Sparse Neural Networks using Compressed Sensing

Pruning the weights of neural networks is an effective and widely-used technique for reducing model size and inference complexity. We develop and test a novel method based on compressed sensing which combines the pruning and training into a…

Computer Vision and Pattern Recognition · Computer Science 2021-04-08 Jonathan W. Siegel, Jianhong Chen, Pengchuan Zhang, Jinchao Xu

Randomized Sparse Neural Galerkin Schemes for Solving Evolution Equations with Deep Networks

Training neural networks sequentially in time to approximate solution fields of time-dependent partial differential equations can be beneficial for preserving causality and other physics properties; however, the sequential-in-time training…

Machine Learning · Computer Science 2023-10-10 Jules Berman, Benjamin Peherstorfer