Related papers: Online Hyperparameter Meta-Learning with Hypergrad…

Implicit differentiation for fast hyperparameter selection in non-smooth convex learning

Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques. In this work we study first-order methods when the inner optimization problem is convex but…

Machine Learning · Statistics 2022-08-10 Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Gradient-based Hyperparameter Optimization Over Long Horizons

Gradient-based hyperparameter optimization has earned widespread popularity in the context of few-shot meta-learning, but remains broadly impractical for tasks with long horizons (many gradient steps), due to memory scaling and gradient…

Machine Learning · Computer Science 2021-10-01 Paul Micaelli, Amos Storkey

Hierarchical Features Matter: A Deep Exploration of Progressive Parameterization Method for Dataset Distillation

Dataset distillation is an emerging dataset reduction method, which condenses large-scale datasets while maintaining task accuracy. Current parameterization methods achieve enhanced performance under extremely high compression ratios by…

Computer Vision and Pattern Recognition · Computer Science 2025-03-20 Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, Shu-Tao Xia

Data-Efficient Ranking Distillation for Image Retrieval

Recent advances in deep learning have led to rapid developments in the field of image retrieval. However, the best-performing architectures incur significant computational cost. Recent approaches tackle this issue using knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Zakaria Laskar, Juho Kannala

DistPro: Searching A Fast Knowledge Distillation Process via Meta Optimization

Recent knowledge distillation (KD) studies show that different manually designed schemes impact the learned results significantly. Yet, in KD, automatically searching for an optimal distillation scheme has not yet been well explored. In this…

Computer Vision and Pattern Recognition · Computer Science 2022-04-13 Xueqing Deng, Dawei Sun, Shawn Newsam, Peng Wang

Online Deep Metric Learning via Mutual Distillation

Deep metric learning aims to transform input data into an embedding space, where similar samples are close while dissimilar samples are far apart from each other. In practice, samples of new categories arrive incrementally, which requires…

Computer Vision and Pattern Recognition · Computer Science 2022-03-11 Gao-Dong Liu, Wan-Lei Zhao, Jie Zhao

Online hyperparameter optimization by real-time recurrent learning

Conventional hyperparameter optimization methods are computationally intensive and hard to generalize to scenarios that require dynamically adapting hyperparameters, such as life-long learning. Here, we propose an online hyperparameter…

Machine Learning · Computer Science 2021-04-09 Daniel Jiwoong Im, Cristina Savin, Kyunghyun Cho

Incremental Object Detection via Meta-Learning

In a real-world setting, object instances from new classes can be continuously encountered by object detectors. When existing object detectors are applied to such scenarios, their performance on old classes deteriorates significantly. A few…

Computer Vision and Pattern Recognition · Computer Science 2021-12-16 K J Joseph, Jathushan Rajasegaran, Salman Khan, Fahad Shahbaz Khan, Vineeth N Balasubramanian

Knowledge Distillation via Route Constrained Optimization

Distillation-based learning boosts the performance of the miniaturized neural network based on the hypothesis that the representation of a teacher model can be used as structured and relatively weak supervision, and thus would be easily…

Machine Learning · Computer Science 2019-04-22 Xiao Jin, Baoyun Peng, Yichao Wu, Yu Liu, Jiaheng Liu, Ding Liang, Junjie Yan, Xiaolin Hu

$V_kD$: Improving Knowledge Distillation using Orthogonal Projections

Knowledge distillation is an effective method for training small and efficient deep learning models. However, the efficacy of a single method can degenerate when transferring to other tasks, modalities, or even other architectures. To…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Roy Miles, Ismail Elezi, Jiankang Deng

Distill on the Go: Online knowledge distillation in self-supervised learning

Self-supervised learning solves pretext prediction tasks that do not require annotations to learn feature representations. For vision tasks, pretext tasks such as predicting rotation and solving jigsaw puzzles are created solely from the input data.…

Computer Vision and Pattern Recognition · Computer Science 2021-07-01 Prashant Bhat, Elahe Arani, Bahram Zonooz

Dataset Distillation as Pushforward Optimal Quantization

Dataset distillation aims to find a synthetic training set such that training on the synthetic data achieves similar performance to training on real data, with orders of magnitude less computational requirements. Existing methods can be…

Machine Learning · Computer Science 2026-02-09 Hong Ye Tan, Emma Slade

Distill2Vec: Dynamic Graph Representation Learning with Knowledge Distillation

Dynamic graph representation learning strategies are based on different neural architectures to capture the graph evolution over time. However, the underlying neural architectures require a large number of parameters to train and suffer…

Machine Learning · Computer Science 2020-11-12 Stefanos Antaris, Dimitrios Rafailidis

UNDO: Understanding Distillation as Optimization

Knowledge distillation has emerged as an effective strategy for compressing large language models' (LLMs) knowledge into smaller, more efficient student models. However, standard one-shot distillation methods often produce suboptimal…

Computation and Language · Computer Science 2025-04-04 Kushal Jain, Piyushi Goyal, Kumar Shridhar

Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization

Semi-supervised learning (SSL) has emerged as a practical solution for addressing data scarcity challenges by leveraging unlabeled data. Recently, vision-language models (VLMs), pre-trained on massive image-text pairs, have demonstrated…

Machine Learning · Computer Science 2025-10-01 Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang

Understanding the Generalization of Bilevel Programming in Hyperparameter Optimization: A Tale of Bias-Variance Decomposition

Gradient-based hyperparameter optimization (HPO) has emerged recently, leveraging bilevel programming techniques to optimize hyperparameters by estimating hypergradients w.r.t. the validation loss. Nevertheless, previous theoretical works mainly…

Machine Learning · Computer Science 2026-02-23 Yubo Zhou, Jun Shu, Junmin Liu, Deyu Meng

BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach

Bilevel optimization (BO) is useful for solving a variety of important machine learning problems including but not limited to hyperparameter optimization, meta-learning, continual learning, and reinforcement learning. Conventional BO…

Machine Learning · Computer Science 2022-09-20 Mao Ye, Bo Liu, Stephen Wright, Peter Stone, Qiang Liu

Large scale distributed neural network training through online distillation

Techniques such as ensembling and distillation promise model quality improvements when paired with almost any base model. However, due to increased test-time cost (for ensembles) and increased complexity of the training pipeline (for…

Machine Learning · Computer Science 2020-08-24 Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton

Multi-objective hybrid knowledge distillation for efficient deep learning in smart agriculture

Deploying deep learning models on resource-constrained edge devices remains a major challenge in smart agriculture due to the trade-off between computational efficiency and recognition accuracy. To address this challenge, this study…

Computer Vision and Pattern Recognition · Computer Science 2025-12-30 Phi-Hung Hoang, Nam-Thuan Trinh, Van-Manh Tran, Thi-Thu-Hong Phan

Hands-on Guidance for Distilling Object Detectors

Knowledge distillation can yield deployment-friendly networks that ease the burden of high computational complexity, but previous methods neglect the feature hierarchy in detectors. Motivated by this, we propose a general framework for…

Computer Vision and Pattern Recognition · Computer Science 2021-05-13 Yangyang Qin, Hefei Ling, Zhenghai He, Yuxuan Shi, Lei Wu