Related papers: Modular Duality in Deep Learning

Modular Networks: Learning to Decompose Neural Computation

Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number…

Machine Learning · Computer Science 2018-11-14 Louis Kirsch, Julius Kunze, David Barber

Local Learning with Neuron Groups

Traditional deep network training methods optimize a monolithic objective function jointly for all the components. This can lead to various inefficiencies in terms of potential parallelization. Local learning is an approach to…

Machine Learning · Computer Science 2023-01-19 Adeetya Patel, Michael Eickenberg, Eugene Belilovsky

Parallel Deep Neural Networks Have Zero Duality Gap

Training deep neural networks is a challenging non-convex optimization problem. Recent work has proven that the strong duality holds (which means zero duality gap) for regularized finite-width two-layer ReLU networks and consequently…

Machine Learning · Computer Science 2023-03-08 Yifei Wang, Tolga Ergen, Mert Pilanci

Modularity in Deep Learning: A Survey

Modularity is a general principle present in many fields. It offers attractive advantages, including, among others, ease of conceptualization, interpretability, scalability, module combinability, and module reusability. The deep learning…

Machine Learning · Computer Science 2023-10-03 Haozhe Sun, Isabelle Guyon

Breaking Neural Network Scaling Laws with Modularity

Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional…

Machine Learning · Computer Science 2025-03-12 Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

Pruned Neural Networks are Surprisingly Modular

The learned weights of a neural network are often considered devoid of scrutable internal structure. To discern structure in these weights, we introduce a measurable notion of modularity for multi-layer perceptrons (MLPs), and investigate…

Neural and Evolutionary Computing · Computer Science 2022-02-09 Daniel Filan, Shlomi Hod, Cody Wild, Andrew Critch, Stuart Russell

Optimization Theory for ReLU Neural Networks Trained with Normalization Layers

The success of deep neural networks is in part due to the use of normalization layers. Normalization layers like Batch Normalization, Layer Normalization and Weight Normalization are ubiquitous in practice, as they improve generalization…

Machine Learning · Computer Science 2020-06-15 Yonatan Dukler, Quanquan Gu, Guido Montúfar

Modularity as a Means for Complexity Management in Neural Networks Learning

Training a Neural Network (NN) with lots of parameters or intricate architectures creates undesired phenomena that complicate the optimization process. To address this issue we propose a first modular approach to NN design, wherein the NN…

Machine Learning · Computer Science 2019-02-26 David Castillo-Bolado, Cayetano Guerra-Artal, Mario Hernandez-Tejera

Modular Deep Learning

Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop…

Machine Learning · Computer Science 2024-01-30 Jonas Pfeiffer, Sebastian Ruder, Ivan Vulić, Edoardo Maria Ponti

Neural Networks with Few Multiplications

For most deep learning algorithms training is notoriously time consuming. Since most of the computation in training neural networks is typically spent on floating point multiplications, we investigate an approach to training that eliminates…

Machine Learning · Computer Science 2016-02-29 Zhouhan Lin, Matthieu Courbariaux, Roland Memisevic, Yoshua Bengio

Gradient-based Competitive Learning: Theory

Deep learning has been widely used for supervised learning and classification/regression problems. Recently, a novel area of research has applied this paradigm to unsupervised tasks; indeed, a gradient-based approach extracts, efficiently…

Machine Learning · Statistics 2020-09-08 Giansalvo Cirrincione, Pietro Barbiero, Gabriele Ciravegna, Vincenzo Randazzo

Training Deep Morphological Neural Networks as Universal Approximators

We investigate deep morphological neural networks (DMNNs). We demonstrate that despite their inherent non-linearity, "linear" activations are essential for DMNNs. To preserve their inherent sparsity, we propose architectures that constrain…

Machine Learning · Computer Science 2025-12-24 Konstantinos Fotopoulos, Petros Maragos

The Dual Form of Neural Networks Revisited: Connecting Test Time Predictions to Training Patterns via Spotlights of Attention

Linear layers in neural networks (NNs) trained by gradient descent can be expressed as a key-value memory system which stores all training datapoints and the initial weights, and produces outputs using unnormalised dot attention over the…

Machine Learning · Computer Science 2022-06-20 Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber

Towards Understanding the Link Between Modularity and Performance in Neural Networks for Reinforcement Learning

Modularity has been widely studied as a mechanism to improve the capabilities of neural networks through various techniques such as hand-crafted modular architectures and automatic approaches. While these methods have sometimes shown…

Neural and Evolutionary Computing · Computer Science 2024-10-28 Humphrey Munn, Marcus Gallagher

Scalable Optimization in the Modular Norm

To improve performance in contemporary deep learning, one is interested in scaling up the neural network in terms of both the number and the size of the layers. When ramping up the width of a single layer, graceful scaling of training has…

Machine Learning · Computer Science 2024-05-24 Tim Large, Yang Liu, Minyoung Huh, Hyojin Bahng, Phillip Isola, Jeremy Bernstein

With Greater Distance Comes Worse Performance: On the Perspective of Layer Utilization and Model Generalization

Generalization of deep neural networks remains one of the main open problems in machine learning. Previous theoretical works focused on deriving tight bounds of model complexity, while empirical works revealed that neural networks exhibit…

Machine Learning · Computer Science 2022-01-31 James Wang, Cheng-Lin Yang

Multirate Training of Neural Networks

We propose multirate training of neural networks: partitioning neural network parameters into "fast" and "slow" parts which are trained on different time scales, where slow parts are updated less frequently. By choosing appropriate…

Machine Learning · Computer Science 2022-11-02 Tiffany Vlaar, Benedict Leimkuhler

Deep Bilevel Learning

We present a novel regularization approach to train neural networks that enjoys better generalization and test error than standard stochastic gradient descent. Our approach is based on the principles of cross-validation, where a validation…

Computer Vision and Pattern Recognition · Computer Science 2018-09-06 Simon Jenni, Paolo Favaro

Neural Network Optimization Reimagined: Decoupled Techniques for Scratch and Fine-Tuning

With the accumulation of resources in the era of big data and the rise of pre-trained models in deep learning, optimizing neural networks for various tasks often involves different strategies for fine-tuning pre-trained models versus…

Computer Vision and Pattern Recognition · Computer Science 2026-04-28 Xin Ning, Qiankun Li, Xiaolong Huang, Qiupu Chen, Feng He, Weijun Li, Prayag Tiwari, Xinwang Liu

Iterative Training: Finding Binary Weight Deep Neural Networks with Layer Binarization

In low-latency or mobile applications, lower computation complexity, lower memory footprint and better energy efficiency are desired. Many prior works address this need by removing redundant parameters. Parameter quantization replaces…

Machine Learning · Computer Science 2021-11-16 Cheng-Chou Lan