Related papers: Simplifying DINO via Coding Rate Regularization

200 papers

Training AI models to understand images without costly labeled data remains a challenge. We combine two techniques--DINO (teacher-student learning) and Barlow Twins (redundancy reduction)--to create a model that learns better with fewer…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Michael Podsiadly, Brendon K Lay

Large-scale vision foundation models such as DINOv2 boast impressive performances by leveraging massive architectures and training datasets. But numerous scenarios require practitioners to reproduce those pre-training solutions, such as on…

Computer Vision and Pattern Recognition · Computer Science 2026-01-30 Jiaqi Zhang, Juntuo Wang, Zhixin Sun, John Zou, Randall Balestriero

World models learned from high-dimensional visual observations allow agents to make decisions and plan directly in latent space, avoiding pixel-level reconstruction. However, recent latent predictive architectures (JEPAs), including the…

Machine Learning · Computer Science 2026-02-25 Leonardo F. Toso, Davit Shadunts, Yunyang Lu, Nihal Sharma, Donglin Zhan, Nam H. Nguyen, James Anderson

Although visual foundation models like DINOv2 provide state-of-the-art performance as feature extractors, their complex, high-dimensional representations create substantial hurdles for interpretability. This work proposes DINO-QPM, which…

Computer Vision and Pattern Recognition · Computer Science 2026-04-09 Robert Zimmermann, Thomas Norrenbrock, Bodo Rosenhahn

Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between $K$-dimensional probability vectors, obtained by applying a softmax function to the…

Machine Learning · Computer Science 2024-05-20 Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

We propose derivative-informed neural operators (DINOs), a general family of neural networks to approximate operators as infinite-dimensional mappings from input function spaces to output function spaces or quantities of interest. After…

Numerical Analysis · Mathematics 2023-10-18 Thomas O'Leary-Roseberry, Peng Chen, Umberto Villa, Omar Ghattas

Object-centric understanding is fundamental to human vision and required for complex reasoning. Traditional methods define slot-based bottlenecks to learn object properties explicitly, while recent self-supervised vision models like DINO…

Computer Vision and Pattern Recognition · Computer Science 2025-10-03 Stefan Sylvius Wagner, Stefan Harmeling

Nowadays, deep learning-based methods have achieved remarkable progress on the image classification task across a wide range of commonly used datasets (ImageNet, CIFAR, SVHN, Caltech 101, SUN397, etc.). SOTA performance on each of the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-17 Kirill Prokofiev, Vladislav Sovrasov

Complex design problems are common in the scientific and industrial fields. In practice, objective functions or constraints of these problems often do not have explicit formulas, and can be estimated only at a set of sampling points through…

Optimization and Control · Mathematics 2022-10-12 Lulu Zhang, Zhi-Qin John Xu, Yaoyu Zhang

Medical image analysis frequently encounters data scarcity challenges. Transfer learning has been effective in addressing this issue while conserving computational resources. The recent advent of foundational models like the DINOv2, which…

Image and Video Processing · Electrical Eng. & Systems 2024-02-14 Yuning Huang, Jingchen Zou, Lanxi Meng, Xin Yue, Qing Zhao, Jianqiang Li, Changwei Song, Gabriel Jimenez, Shaowu Li, Guanghui Fu

Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this…

Vision Foundation Models (VFMs) have advanced representation learning through self-supervised methods. However, existing training pipelines are often inflexible, domain-specific, or computationally expensive, which limits their usability…

Computer Vision and Pattern Recognition · Computer Science 2025-11-04 Mahmut Selman Gokmen, Cody Bumgardner

Purpose: Depth estimation in robotic surgery is vital in 3D reconstruction, surgical navigation and augmented reality visualization. Although the foundation model exhibits outstanding performance in many vision tasks, including depth…

Computer Vision and Pattern Recognition · Computer Science 2024-01-15 Beilei Cui, Mobarakol Islam, Long Bai, Hongliang Ren

The integration of deep learning systems into healthcare has been hindered by the resource-intensive process of data annotation and the inability of these systems to generalize to different data distributions. Foundation models, which are…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Mohammed Baharoon, Waseem Qureshi, Jiahong Ouyang, Yanwu Xu, Abdulrhman Aljouie, Wei Peng

A prominent self-supervised learning paradigm is to model the representations as clusters, or more generally as a mixture model. Learning to map the data samples to compact representations and fitting the mixture model simultaneously leads…

Machine Learning · Computer Science 2024-10-21 Hariprasath Govindarajan, Per Sidén, Jacob Roll, Fredrik Lindsten

Recent advances in self-supervised learning (SSL) have made it possible to learn general-purpose visual features that capture both the high-level semantics and the fine-grained spatial structure of images. Most notably, the recent DINOv2…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Mattia Scardecchia

The recent explosive interest in the reasoning capabilities of large language models, such as DeepSeek-R1, has demonstrated remarkable success through reinforcement learning-based fine-tuning frameworks, exemplified by methods like Group…

Computer Vision and Pattern Recognition · Computer Science 2025-08-04 Chenbin Pan, Wenbin He, Zhengzhong Tu, Liu Ren

Medical image registration is a critical component of clinical imaging workflows, enabling accurate longitudinal assessment, multi-modal data fusion, and image-guided interventions. Intensity-based approaches often struggle with…

Computer Vision and Pattern Recognition · Computer Science 2026-03-17 Eytan Kats, Mattias P. Heinrich

Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks. However, unlike vision-language models such as CLIP, self-supervised visual features are not…

Imitation learning has proven to be a powerful tool for training complex visuomotor policies. However, current methods often require hundreds to thousands of expert demonstrations to handle high-dimensional visual observations. A key reason…

Robotics · Computer Science 2024-11-01 Zichen Jeff Cui, Hengkai Pan, Aadhithya Iyer, Siddhant Haldar, Lerrel Pinto