Related papers: Growing Neural Network with Shared Parameter
We develop an approach to efficiently grow neural networks, within which parameterization and optimization strategies are designed by considering their effects on the training dynamics. Unlike existing growing methods, which follow simple…
State-of-the-art parameter-efficient fine-tuning methods rely on introducing adapter modules between the layers of a pretrained language model. However, such modules are trained separately for each task and thus do not enable sharing…
This paper explores learned-context neural networks. It is a multi-task learning architecture based on a fully shared neural network and an augmented input vector containing trainable task parameters. The architecture is interesting due to…
Parameter sharing, as an important technique in multi-agent systems, can effectively solve the scalability issue in large-scale agent problems. However, the effectiveness of parameter sharing largely depends on the environment setting. When…
Most deep neural networks are trained under fixed network architectures and require retraining when the architecture changes. If expanding the network's size is needed, it is necessary to retrain from scratch, which is expensive. To avoid…
The width of a neural network matters since increasing the width will necessarily increase the model capacity. However, the performance of a network does not improve linearly with the width and soon gets saturated. In this case, we argue…
This work demonstrates how increasing the number of neurons in a network without increasing its number of non-zero parameters improves performance. We show that this gain corresponds with a decrease in interference between multiple features…
Neural network based methods have obtained great progress on a variety of natural language processing tasks. However, in most previous works, the models are learned based on single-task supervised objectives, which often suffer from…
In this paper, we introduce a novel concept for learning of the parameters in a neural network. Our idea is grounded on modeling a learning problem that addresses a trade-off between (i) satisfying local objectives at each node and (ii)…
We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates…
Training neural networks requires increasing amounts of memory. Parameter sharing can reduce memory and communication costs, but existing methods assume networks have many identical layers and utilize hand-crafted sharing strategies that…
We consider the problem of training a machine learning model over a network of nodes in a fully decentralized framework. The nodes take a Bayesian-like approach via the introduction of a belief over the model parameter space. We propose a…
Parameter sharing has proven to be a parameter-efficient approach. Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model…
Task-incremental learning involves the challenging problem of learning new tasks continually, without forgetting past knowledge. Many approaches address the problem by expanding the structure of a shared neural network as tasks arrive, but…
When fine-tuning Deep Neural Networks (DNNs) to new data, DNNs are prone to overwriting network parameters required for task-specific functionality on previously learned tasks, resulting in a loss of performance on those tasks. We propose…
In most applications of utilizing neural networks for mathematical optimization, a dedicated model is trained for each specific optimization objective. However, in many scenarios, several distinct yet correlated objectives or tasks often…
Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. How choosing a suitable sharing mechanism depends on the relations among the tasks, which is not…
In this paper we present an alternative strategy for fine-tuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is fine-tuned on a second task by modifying a progressively…
This paper proposes a novel learning method for multi-task applications. Multi-task neural networks can learn to transfer knowledge across different tasks by using parameter sharing. However, sharing parameters between unrelated tasks can…
In multilingual neural machine translation, it has been shown that sharing a single translation model between multiple languages can achieve competitive performance, sometimes even leading to performance gains over bilingually trained…