Related papers: Distilling Visual Priors from Self-Supervised Lear…

Distilling Vision Transformers for Distortion-Robust Representation Learning

Self-supervised learning has achieved remarkable success in learning visual representations from clean data, yet remains challenging when clean observations are sparse or not available at all. In this paper, we demonstrate that pretrained…

Computer Vision and Pattern Recognition · Computer Science 2026-04-27 Konstantinos Alexis, Giorgos Giannopoulos, Dimitrios Gunopulos

Self-Distilled Self-Supervised Representation Learning

State-of-the-art frameworks in self-supervised learning have recently shown that fully utilizing transformer-based models can lead to a performance boost compared to conventional CNN models. Striving to maximize the mutual information of two…

Computer Vision and Pattern Recognition · Computer Science 2022-11-29 Jiho Jang, Seonhoon Kim, Kiyoon Yoo, Chaerin Kong, Jangho Kim, Nojun Kwak

Self-supervised Knowledge Distillation for Few-shot Learning

The real world contains an overwhelmingly large number of object classes, and learning all of them at once is infeasible. Few-shot learning is a promising learning paradigm due to its ability to learn out of order distributions quickly with only a…

Computer Vision and Pattern Recognition · Computer Science 2020-08-05 Jathushan Rajasegaran, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Mubarak Shah

Fast and Accurate Single Image Super-Resolution via Information Distillation Network

Recently, deep convolutional neural networks (CNNs) have demonstrated remarkable progress on single image super-resolution. However, as the depth and width of the networks increase, CNN-based super-resolution methods have been faced…

Computer Vision and Pattern Recognition · Computer Science 2018-03-28 Zheng Hui, Xiumei Wang, Xinbo Gao

Training convolutional neural networks with cheap convolutions and online distillation

The large memory and computation consumption in convolutional neural networks (CNNs) has been one of the main barriers for deploying them on resource-limited systems. To this end, most cheap convolutions (e.g., group convolution, depth-wise…

Computer Vision and Pattern Recognition · Computer Science 2019-10-11 Jiao Xie, Shaohui Lin, Yichen Zhang, Linkai Luo

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation

Convolutional neural networks have been widely deployed in various application scenarios. In order to extend the applications' boundaries to some accuracy-crucial domains, researchers have been investigating approaches to boost accuracy…

Machine Learning · Computer Science 2019-05-21 Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma

Learning with Privileged Information for Efficient Image Super-Resolution

Convolutional neural networks (CNNs) have allowed remarkable advances in single image super-resolution (SISR) over the last decade. Most SR methods based on CNNs have focused on achieving performance gains in terms of quality metrics, such…

Computer Vision and Pattern Recognition · Computer Science 2020-07-16 Wonkyung Lee, Junghyup Lee, Dohyung Kim, Bumsub Ham

Better and Faster: Knowledge Transfer from Multiple Self-supervised Learning Tasks via Graph Distillation for Video Classification

Video representation learning is a vital problem for classification tasks. Recently, a promising unsupervised paradigm termed self-supervised learning has emerged, which explores inherent supervisory signals implied in massive data for…

Computer Vision and Pattern Recognition · Computer Science 2018-04-27 Chenrui Zhang, Yuxin Peng

SimReg: Regression as a Simple Yet Effective Tool for Self-supervised Knowledge Distillation

Feature regression is a simple way to distill large neural network models to smaller ones. We show that with simple changes to the network architecture, regression can outperform more complex state-of-the-art approaches for knowledge…

Computer Vision and Pattern Recognition · Computer Science 2022-01-14 K L Navaneet, Soroush Abbasi Koohpayegani, Ajinkya Tejankar, Hamed Pirsiavash

Regularizing Class-wise Predictions via Self-knowledge Distillation

Deep neural networks with millions of parameters may suffer from poor generalization due to overfitting. To mitigate the issue, we propose a new regularization method that penalizes the predictive distribution between similar samples. In…

Machine Learning · Computer Science 2020-04-08 Sukmin Yun, Jongjin Park, Kimin Lee, Jinwoo Shin

Self-Supervised Models are Continual Learners

Self-supervised models have been shown to produce comparable or better visual representations than their supervised counterparts when trained offline on unlabeled data at scale. However, their efficacy is catastrophically reduced in a…

Computer Vision and Pattern Recognition · Computer Science 2022-04-04 Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

Competitive Distillation: A Simple Learning Strategy for Improving Visual Classification

Deep Neural Networks (DNNs) have significantly advanced the field of computer vision. To improve the DNN training process, knowledge distillation methods demonstrate their effectiveness in accelerating network training by introducing a fixed…

Computer Vision and Pattern Recognition · Computer Science 2025-07-01 Daqian Shi, Xiaolei Diao, Xu Chen, Cédric M. John

Knowledge Distillation Circumvents Nonlinearity for Optical Convolutional Neural Networks

In recent years, Convolutional Neural Networks (CNNs) have enabled ubiquitous image processing applications. As such, CNNs require fast runtime (forward propagation) to process high-resolution visual streams in real time. This is still a…

Computer Vision and Pattern Recognition · Computer Science 2022-03-23 Jinlin Xiang, Shane Colburn, Arka Majumdar, Eli Shlizerman

Distilling Efficient Vision Transformers from CNNs for Semantic Segmentation

In this paper, we tackle a new problem: how to transfer knowledge from a pre-trained, cumbersome yet well-performing CNN-based model to learn a compact Vision Transformer (ViT)-based model while maintaining its learning capacity? Due to the…

Computer Vision and Pattern Recognition · Computer Science 2023-10-12 Xu Zheng, Yunhao Luo, Pengyuan Zhou, Lin Wang

Self-Distilled Representation Learning for Time Series

Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a…

Machine Learning · Computer Science 2023-11-21 Felix Pieper, Konstantin Ditschuneit, Martin Genzel, Alexandra Lindt, Johannes Otterbach

Randomly Initialized Networks Can Learn from Peer-to-Peer Consensus

In self-supervised learning, self-distilled methods have shown impressive performance, learning representations useful for downstream tasks and even displaying emergent properties. However, state-of-the-art methods usually rely on ensembles…

Machine Learning · Computer Science 2026-05-01 Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

Low-resolution Face Recognition in the Wild via Selective Knowledge Distillation

Typically, the deployment of face recognition models in the wild needs to identify low-resolution faces with extremely low computational cost. To address this problem, a feasible solution is compressing a complex face model to achieve…

Computer Vision and Pattern Recognition · Computer Science 2019-03-14 Shiming Ge, Shengwei Zhao, Chenyu Li, Jia Li

Dataset Distillation for Pre-Trained Self-Supervised Vision Models

The task of dataset distillation aims to find a small set of synthetic images such that training a model on them reproduces the performance of the same model trained on a much larger dataset of real samples. Existing distillation methods…

Computer Vision and Pattern Recognition · Computer Science 2025-11-21 George Cazenavette, Antonio Torralba, Vincent Sitzmann

Knowledge Distillation in Iterative Generative Models for Improved Sampling Speed

Iterative generative models, such as noise conditional score networks and denoising diffusion probabilistic models, produce high quality samples by gradually denoising an initial noise vector. However, their denoising process has many…

Machine Learning · Computer Science 2021-01-08 Eric Luhman, Troy Luhman

Class-Discriminative CNN Compression

Compressing convolutional neural networks (CNNs) by pruning and distillation has received ever-increasing focus in the community. In particular, designing a class-discrimination based approach would be desired as it fits seamlessly with the…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Yuchen Liu, David Wentzlaff, S. Y. Kung