Related papers: Simplifying DINO via Coding Rate Regularization

DinoTwins: Combining DINO and Barlow Twins for Robust, Label-Efficient Vision Transformers

Training AI models to understand images without costly labeled data remains a challenge. We combine two techniques--DINO (teacher-student learning) and Barlow Twins (redundancy reduction)--to create a model that learns better with fewer…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Michael Podsiadly, Brendon K Lay

FastDINOv2: Frequency Based Curriculum Learning Improves Robustness and Training Speed

Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

Learning Invariant Visual Representations for Planning with Joint-Embedding Predictive World Models

World models learned from high-dimensional visual observations allow agents to make decisions and plan directly in latent space, avoiding pixel-level reconstruction. However, recent latent predictive architectures (JEPAs), including the…

Machine Learning · Computer Science 2026-02-25 Leonardo F. Toso, Davit Shadunts, Yunyang Lu, Nihal Sharma, Donglin Zhan, Nam H. Nguyen, James Anderson

DINO-QPM: Adapting Visual Foundation Models for Globally Interpretable Image Classification

Although visual foundation models like DINOv2 provide state-of-the-art performance as feature extractors, their complex, high-dimensional representations create substantial hurdles for interpretability. This work proposes DINO-QPM, which…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Robert Zimmermann, Thomas Norrenbrock, Bodo Rosenhahn

DINO as a von Mises-Fisher mixture model

Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between $K$-dimensional probability vectors, obtained by applying a softmax function to the…

Machine Learning · Computer Science 2024-05-20 Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Derivative-Informed Neural Operator: An Efficient Framework for High-Dimensional Parametric Derivative Learning

We propose derivative-informed neural operators (DINOs), a general family of neural networks to approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest. After…

Numerical Analysis · Mathematics 2023-10-18 Thomas O'Leary-Roseberry, Peng Chen, Umberto Villa, Omar Ghattas

Oh-A-DINO: Understanding and Enhancing Attribute-Level Information in Self-Supervised Object-Centric Representations

Object-centric understanding is fundamental to human vision and required for complex reasoning. Traditional methods define slot-based bottlenecks to learn object properties explicitly, while recent self-supervised vision models like DINO…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Stefan Sylvius Wagner, Stefan Harmeling

Towards Efficient and Data Agnostic Image Classification Training Pipeline for Embedded Systems

Nowadays, deep learning-based methods have achieved remarkable progress on the image classification task across a wide range of commonly used datasets (ImageNet, CIFAR, SVHN, Caltech 101, SUN397, etc.). SOTA performance on each of the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Kirill Prokofiev, Vladislav Sovrasov

Data-informed Deep Optimization

Complex design problems are common in the scientific and industrial fields. In practice, objective functions or constraints of these problems often do not have explicit formulas, and can be estimated only at a set of sampling points through…

Optimization and Control · Mathematics 2022-10-12 Lulu Zhang, Zhi-Qin John Xu, Yaoyu Zhang

Comparative Analysis of ImageNet Pre-Trained Deep Learning Models and DINOv2 in Medical Imaging Classification

Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundation models like DINOv2, which…

Image and Video Processing · Electrical Eng. & Systems 2024-02-14 Yuning Huang, Jingchen Zou, Lanxi Meng, Xin Yue, Qing Zhao, Jianqiang Li, Changwei Song, Gabriel Jimenez, Shaowu Li, Guanghui Fu

DINOv3

Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

DINO-MX: A Modular & Flexible Framework for Self-Supervised Learning

Vision Foundation Models (VFMs) have advanced representation learning through self-supervised methods. However, existing training pipelines are often inflexible, domain-specific, or computationally expensive, which limits their usability…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Mahmut Selman Gokmen, Cody Bumgardner

Surgical-DINO: Adapter Learning of Foundation Models for Depth Estimation in Endoscopic Surgery

Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

Evaluating General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks

The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and the inability of these systems to generalize to different data distributions. Foundation models, which are…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu, Abdulrhman Aljouie, Wei Peng

On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods

A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations and fitting the mixture model simultaneously leads…

Machine Learning · Computer Science 2024-10-21 Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Unsupervised Transformer Pre-Training for Images: Self-Distillation, Mean Teachers, and Random Crops

Recent advances in self-supervised learning (SSL) have made it possible to learn general-purpose visual features that capture both the high-level semantics and the fine-grained spatial structure of images. Most notably, the recent DINOv2…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Mattia Scardecchia

DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models

The recent explosive interest in the reasoning capabilities of large language models, such as DeepSeek-R1, has demonstrated remarkable success through reinforcement learning-based fine-tuning frameworks, exemplified by methods like Group…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Chenbin Pan, Wenbin He, Zhengzhong Tu, Liu Ren

Effective Feature Learning for 3D Medical Registration via Domain-Specialized DINO Pretraining

Medical image registration is a critical component of clinical imaging workflows, enabling accurate longitudinal assessment, multi-modal data fusion, and image-guided interventions. Intensity-based approaches often struggle with…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Eytan Kats, Mattias P. Heinrich

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks. However, unlike vision-language models such as CLIP, self-supervised visual features are not…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason…

Robotics · Computer Science 2024-11-01 Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto