Related papers: Growing Neural Network with Shared Parameter

Accelerated Training via Incrementally Growing Neural Networks using Variance Transfer and Learning Rate Adaptation

We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple…

Machine Learning · Computer Science · 2023-06-23 · Xin Yuan, Pedro Savarese, Michael Maire

Parameter-efficient Multi-task Fine-tuning for Transformers via Shared Hypernetworks

State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing…

Computation and Language · Computer Science · 2021-06-09 · Rabeeh Karimi Mahabadi, Sebastian Ruder, Mostafa Dehghani, James Henderson

Multi-task neural networks by learned contextual inputs

This paper explores learned-context neural networks. It is a multi-task learning architecture based on a fully shared neural network and an augmented input vector containing trainable task parameters. The architecture is interesting due to…

Machine Learning · Computer Science · 2025-08-07 · Anders T. Sandnes, Bjarne Grimstad, Odd Kolbjørnsen

Adaptive parameter sharing for multi-agent reinforcement learning

Parameter sharing, as an important technique in multi-agent systems, can effectively solve the scalability issue in large-scale agent problems. However, the effectiveness of parameter sharing largely depends on the environment setting. When…

Artificial Intelligence · Computer Science · 2025-03-04 · Dapeng Li, Na Lou, Bin Zhang, Zhiwei Xu, Guoliang Fan

MixtureGrowth: Growing Neural Networks by Recombining Learned Parameters

Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes. If expanding the network's size is needed, it is necessary to retrain from scratch, which is expensive. To avoid…

Machine Learning · Computer Science · 2023-11-09 · Chau Pham, Piotr Teterwak, Soren Nelson, Bryan A. Plummer

Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training

The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. In this case, we argue…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-07 · Shuai Zhao, Liguang Zhou, Wenxiao Wang, Deng Cai, Tin Lun Lam, Yangsheng Xu

Expand Neurons, Not Parameters

This work demonstrates how increasing the number of neurons in a network without increasing its number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features…

Machine Learning · Computer Science · 2025-10-07 · Linghao Kong, Inimai Subramanian, Yonadav Shavit, Micah Adler, Dan Alistarh, Nir Shavit

Recurrent Neural Network for Text Classification with Multi-Task Learning

Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from…

Computation and Language · Computer Science · 2016-05-18 · Pengfei Liu, Xipeng Qiu, Xuanjing Huang

Network Parameter Learning Using Nonlinear Transforms, Local Representation Goals and Local Propagation Constraints

In this paper, we introduce a novel concept for learning of the parameters in a neural network. Our idea is grounded on modeling a learning problem that addresses a trade-off between (i) satisfying local objectives at each node and (ii)…

Machine Learning · Computer Science · 2019-02-04 · Dimche Kostadinov, Behrooz Razeghi, Slava Voloshynovskiy

Learning Implicitly Recurrent CNNs Through Parameter Sharing

We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates…

Machine Learning · Computer Science · 2019-03-15 · Pedro Savarese, Michael Maire

Neural Parameter Allocation Search

Training neural networks requires increasing amounts of memory. Parameter sharing can reduce memory and communication costs, but existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that…

Machine Learning · Computer Science · 2022-03-17 · Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko

Peer-to-peer Federated Learning on Graphs

We consider the problem of training a machine learning model over a network of nodes in a fully decentralized framework. The nodes take a Bayesian-like approach via the introduction of a belief over the model parameter space. We propose a…

Machine Learning · Computer Science · 2019-02-01 · Anusha Lalitha, Osman Cihan Kilinc, Tara Javidi, Farinaz Koushanfar

Understanding Parameter Sharing in Transformers

Parameter sharing has proven to be a parameter-efficient approach. Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model…

Machine Learning · Computer Science · 2023-06-19 · Ye Lin, Mingxuan Wang, Zhexi Zhang, Xiaohui Wang, Tong Xiao, Jingbo Zhu

Learn to Bind and Grow Neural Structures

Task-incremental learning involves the challenging problem of learning new tasks continually, without forgetting past knowledge. Many approaches address the problem by expanding the structure of a shared neural network as tasks arrive, but…

Machine Learning · Computer Science · 2020-11-24 · Azhar Shaikh, Nishant Sinha

Dynamic Continual Learning: Harnessing Parameter Uncertainty for Improved Network Adaptation

When fine-tuning Deep Neural Networks (DNNs) to new data, DNNs are prone to overwriting network parameters required for task-specific functionality on previously learned tasks, resulting in a loss of performance on those tasks. We propose…

Machine Learning · Computer Science · 2025-01-22 · Christopher Angelini, Nidhal Bouaynaya

Transfer Learning with Reconstruction Loss

In most applications of utilizing neural networks for mathematical optimization, a dedicated model is trained for each specific optimization objective. However, in many scenarios, several distinct yet correlated objectives or tasks often…

Machine Learning · Computer Science · 2024-04-15 · Wei Cui, Wei Yu

Learning Sparse Sharing Architectures for Multiple Tasks

Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. Choosing a suitable sharing mechanism depends on the relations among the tasks, which is not…

Computation and Language · Computer Science · 2019-11-19 · Tianxiang Sun, Yunfan Shao, Xiaonan Li, Pengfei Liu, Hang Yan, Xipeng Qiu, Xuanjing Huang

Gradual Tuning: a better way of Fine Tuning the parameters of a Deep Neural Network

In this paper we present an alternative strategy for fine-tuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is fine-tuned on a second task by modifying a progressively…

Artificial Intelligence · Computer Science · 2017-11-29 · Guglielmo Montone, J. Kevin O'Regan, Alexander V. Terekhov

Flexible Multi-task Networks by Learning Parameter Allocation

This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can…

Machine Learning · Computer Science · 2020-07-21 · Krzysztof Maziarz, Efi Kokiopoulou, Andrea Gesmundo, Luciano Sbaiz, Gabor Bartok, Jesse Berent

Parameter Sharing Methods for Multilingual Self-Attentional Translation Models

In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained…

Computation and Language · Computer Science · 2018-09-14 · Devendra Singh Sachan, Graham Neubig