Related papers: Rethinking Parameter Sharing as Graph Coloring for…
Hard parameter sharing in multi-domain learning (MDL) allows domains to share some of the model parameters to reduce storage cost while improving prediction accuracy. One common sharing practice is to share the bottom layers of a deep…
Parameter sharing has proven to be a parameter-efficient approach. Previous work on Transformers has focused on sharing parameters in different layers, which can improve the performance of models with limited parameters by increasing model…
We introduce a parameter sharing scheme, in which different layers of a convolutional neural network (CNN) are defined by a learned linear combination of parameter tensors from a global bank of templates. Restricting the number of templates…
Multi-task networks rely on effective parameter sharing to achieve robust generalization across tasks. In this paper, we present a novel parameter sharing method for multi-task learning that conditions parameter sharing on both the task and…
Most existing deep multi-task learning models are based on parameter sharing, such as hard sharing, hierarchical sharing, and soft sharing. How choosing a suitable sharing mechanism depends on the relations among the tasks, which is not…
Color coding is an algorithmic technique used in parameterized complexity theory to detect "small" structures inside graphs. The idea is to derandomize algorithms that first randomly color a graph and then search for an easily-detectable,…
Modern convolutional neural networks (CNNs) have massive identical convolution blocks, and, hence, recursive sharing of parameters across these blocks has been proposed to reduce the amount of parameters. However, naive sharing of…
Deep convolutional neural networks (DCNNs) have become the state-of-the-art (SOTA) approach for many computer vision tasks: image classification, object detection, semantic segmentation, etc. However, most SOTA networks are too large for…
Symbolic regression (SR) aims to find symbolic expressions that describe datasets. Due to its inherent interpretability, is a powerful paradigm for scientific discovery. Recent advances have expanded SR to describe related phenomena using a…
We propose a parameter sharing method for Transformers (Vaswani et al., 2017). The proposed approach relaxes a widely used technique, which shares parameters for one layer with all layers such as Universal Transformers (Dehghani et al.,…
We describe a rational approach to reduce the computational and communication complexities of lossless point-to-point compression for computation with side information. The traditional method relies on building a characteristic graph with…
Probabilistic graphical models that encode indistinguishable objects and relations among them use first-order logic constructs to compress a propositional factorised model for more efficient (lifted) inference. To obtain a lifted…
Transformers have shown improved performance when compared to previous architectures for sequence processing such as RNNs. Despite their sizeable performance gains, as recently suggested, the model is computationally expensive to train and…
The ever-increasing parameter counts of deep learning models necessitate effective compression techniques for deployment on resource-constrained devices. This paper explores the application of information geometry, the study of…
A Knowledge Graph (KG) is the directed graphical representation of entities and relations in the real world. KG can be applied in diverse Natural Language Processing (NLP) tasks where knowledge is required. The need to scale up and complete…
Large neural networks achieve state-of-the-art performance on many tasks, yet their sheer size hinders deployment on resource-constrained devices. Among existing compression approaches, cross-layer parameter sharing remains relatively…
Learning over sparse, high-dimensional data frequently necessitates the use of specialized methods such as the hashing trick. In this work, we design a highly scalable alternative approach that leverages the low degree of feature…
Graph workloads pose a particularly challenging problem for query optimizers. They typically feature large queries made up of entirely many-to-many joins with complex correlations. This puts significant stress on traditional cardinality…
The compression of geometric structures is a relatively new field of data compression. Since about 1995, several articles have dealt with the coding of meshes, using for most of them the following approach: the vertices of the mesh are…
In recent years, data dimensionality has increasingly become a concern, leading to many parameter and dimension reduction techniques being proposed in the literature. A parameter-wise co-clustering model, for data modelled via continuous…