Related papers: Modularizing Deep Learning via Pairwise Learning W…
Intermediate features at different layers of a deep neural network are known to be discriminative for visual patterns of different complexities. However, most existing works ignore such cross-layer heterogeneities when classifying samples…
Transfer learning has recently become the dominant paradigm of machine learning. Pre-trained models fine-tuned for downstream tasks achieve better performance with fewer labelled examples. Nonetheless, it remains unclear how to develop…
An approach to improve network interpretability is via clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We find pretrained models to be highly unclusterable and thus train models to be more…
State-of-the-art, high capacity deep neural networks not only require large amounts of labelled training data, they are also highly susceptible to label errors in this data, typically resulting in large efforts and costs and therefore…
We introduce collaborative learning in which multiple classifier heads of the same network are simultaneously trained on the same training data to improve generalization and robustness to label noise with no extra inference cost. It…
Scaling model capacity has been vital in the success of deep learning. For a typical network, necessary compute resources and training time grow dramatically with model size. Conditional computation is a promising way to increase the number…
Deep learning has shown promising results in many machine learning applications. The hierarchical feature representation built by deep networks enable compact and precise encoding of the data. A kernel analysis of the trained deep networks…
How can neural networks such as ResNet efficiently learn CIFAR-10 with test accuracy more than 96%, while other methods, especially kernel methods, fall relatively behind? Can we more provide theoretical justifications for this gap?…
We propose a novel family of connectionist models based on kernel machines and consider the problem of learning layer-by-layer a compositional hypothesis class, i.e., a feedforward, multilayer architecture, in a supervised setting. In terms…
We consider supervised learning with $n$ labels and show that layerwise SGD on residual networks can efficiently learn a class of hierarchical models. This model class assumes the existence of an (unknown) label hierarchy $L_1 \subseteq L_2…
Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this…
Traditional deep network training methods optimize a monolithic objective function jointly for all the components. This can lead to various inefficiencies in terms of potential parallelization. Local learning is an approach to…
The task of labeling data for training deep neural networks is daunting and tedious, requiring millions of labels to achieve the current state-of-the-art results. Such reliance on large amounts of labeled data can be relaxed by exploiting…
Multi-label recognition is a fundamental, and yet is a challenging task in computer vision. Recently, deep learning models have achieved great progress towards learning discriminative features from input images. However, conventional…
In multi-label classification, the main focus has been to develop ways of learning the underlying dependencies between labels, and to take advantage of this at classification time. Developing better feature-space representations has been…
The empirical success of deep learning is often attributed to deep networks' ability to exploit hierarchical structure in data, constructing increasingly complex features across layers. Yet despite substantial progress in deep learning…
Deep kernel learning provides an elegant and principled framework for combining the structural properties of deep learning algorithms with the flexibility of kernel methods. By means of a deep neural network, we learn a parametrized kernel…
Deep convolutional neural networks (CNNs) have demonstrated remarkable success in computer vision by supervisedly learning strong visual feature representations. However, training CNNs relies heavily on the availability of exhaustive…
Complex image processing and computer vision systems often consist of a processing pipeline of functional modules. We intend to replace parts or all of a target pipeline with deep neural networks to achieve benefits such as increased…
This work presents a new strategy for multi-class classification that requires no class-specific labels, but instead leverages pairwise similarity between examples, which is a weaker form of annotation. The proposed method, meta…