Patrick Labatut — Scifaro

Misalignment Between Backpropagation and the Hierarchy of Brain Responses to Images

Backpropagation is the core learning mechanism underlying deep learning. However, whether and how this algorithm is implemented in the brain remains highly debated. In particular, while forward activations of pretrained models reliably map…

Neurons and Cognition · Quantitative Biology · 2026-05-28 · Joséphine Raugel, Maximilian Seitzer, Marc Szafraniec, Huy V. Vo, Jérémy Rapin, Patrick Labatut, Piotr Bojanowski, Valentin Wyart, Jean-Rémi King

VGGT-Ω

Recent feed-forward reconstruction models, such as VGGT, have proven competitive with traditional optimization-based reconstructors while also providing geometry-aware features useful for other tasks. Here, we show that the quality of these…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-15 · Jianyuan Wang, Minghao Chen, Shangzhan Zhang, Nikita Karaev, Johannes Schönberger, Patrick Labatut, Piotr Bojanowski, David Novotny, Andrea Vedaldi, Christian Rupprecht

AdaVFM: Adaptive Vision Foundation Models for Edge Intelligence via LLM-Guided Execution

Language-aligned vision foundation models (VFMs) enable versatile visual understanding for always-on contextual AI, but their deployment on edge devices is hindered by strict latency and power constraints. We present AdaVFM, an adaptive…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-05 · Yiwei Zhao, Yi Zheng, Huapeng Su, Jieyu Lin, Stefano Ambrogio, Cijo Jose, Michael Ramamonjisoa, Patrick Labatut, Barbara De Salvo, Chiao Liu, Phillip B. Gibbons, Ziyun Li

Efficient Universal Perception Encoder

Running AI models on smart edge devices can unlock versatile user experiences, but presents challenges due to limited compute and the need to handle multiple tasks simultaneously. This requires a vision encoder with small size but powerful…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-01 · Chenchen Zhu, Saksham Suri, Cijo Jose, Maxime Oquab, Marc Szafraniec, Wei Wen, Yunyang Xiong, Patrick Labatut, Piotr Bojanowski, Raghuraman Krishnamoorthi, Vikas Chandra

CHMv2: Improvements in Global Canopy Height Mapping using DINOv3

Accurate canopy height information is essential for quantifying forest carbon, monitoring restoration and degradation, and assessing habitat structure, yet high-fidelity measurements from airborne laser scanning (ALS) remain unevenly…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-09 · John Brandt, Seungeun Yi, Jamie Tolan, Xinyuan Li, Peter Potapov, Jessica Ertel, Justine Spore, Huy V. Vo, Michaël Ramamonjisoa, Patrick Labatut, Piotr Bojanowski, Camille Couprie

Disentangling the Factors of Convergence between Brains and Computer Vision Models

Many AI models trained on natural images develop representations that resemble those of the human brain. However, the factors that drive this brain-model similarity remain poorly understood. To disentangle how the model, training and data…

Artificial Intelligence · Computer Science · 2025-08-26 · Joséphine Raugel, Marc Szafraniec, Huy V. Vo, Camille Couprie, Patrick Labatut, Piotr Bojanowski, Valentin Wyart, Jean-Rémi King

DINOv3

Self-supervised learning holds the promise of eliminating the need for manual data annotation, enabling models to scale effortlessly to massive datasets and larger architectures. By not being tailored to specific tasks or domains, this…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-15 · Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, Julien Mairal, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

Back to the Features: DINO as a Foundation for Video World Models

We present DINO-world, a powerful generalist video world model trained to predict future frames in the latent space of DINOv2. By leveraging a pre-trained image encoder and training a future predictor on a large-scale uncurated video…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-28 · Federico Baldassarre, Marc Szafraniec, Basile Terver, Vasil Khalidov, Francisco Massa, Yann LeCun, Patrick Labatut, Maximilian Seitzer, Piotr Bojanowski

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data…

Artificial Intelligence · Computer Science · 2025-06-12 · Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Mojtaba Komeili, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, Sergio Arnaud, Abha Gejji, Ada Martin, Francois Robert Hogan, Daniel Dugas, Piotr Bojanowski, Vasil Khalidov, Patrick Labatut, Francisco Massa, Marc Szafraniec, Kapil Krishnakumar, Yong Li, Xiaodong Ma, Sarath Chandar, Franziska Meier, Yann LeCun, Michael Rabbat, Nicolas Ballas

Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

In this paper, we demonstrate how to apply 2:4 sparsity, a popular hardware-accelerated GPU sparsity pattern, to activations to accelerate large language model training and inference. Crucially, we exploit the intrinsic sparsity found in…

Machine Learning · Computer Science · 2025-03-24 · Daniel Haziza, Timothy Chou, Dhruv Choudhary, Luca Wehrstedt, Francisco Massa, Jiecao Yu, Geonhwa Jeong, Supriya Rao, Patrick Labatut, Jesse Cai

DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment

Self-supervised visual foundation models produce powerful embeddings that achieve remarkable performance on a wide range of downstream tasks. However, unlike vision-language models such as CLIP, self-supervised visual features are not…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-24 · Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michaël Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski

Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach

Self-supervised features are the cornerstone of modern machine learning systems. They are typically pre-trained on data collections whose construction and curation require extensive human effort. This manual process has some…

Machine Learning · Computer Science · 2024-07-01 · Huy V. Vo, Vasil Khalidov, Timothée Darcet, Théo Moutakanni, Nikita Smetanin, Marc Szafraniec, Hugo Touvron, Camille Couprie, Maxime Oquab, Armand Joulin, Hervé Jégou, Patrick Labatut, Piotr Bojanowski

DINOv2: Learning Robust Visual Features without Supervision

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-05 · Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick Labatut, Armand Joulin, Piotr Bojanowski

Code Translation with Compiler Representations

In this paper, we leverage low-level compiler intermediate representations (IR) to improve code translation. Traditional transpilers rely on syntactic information and handcrafted rules, which limits their applicability and produces…

Programming Languages · Computer Science · 2023-04-25 · Marc Szafraniec, Baptiste Roziere, Hugh Leather, Francois Charton, Patrick Labatut, Gabriel Synnaeve

Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction

Traditional approaches for learning 3D object categories have been predominantly trained and evaluated on synthetic datasets due to the unavailability of real 3D-annotated category-centric data. Our main goal is to facilitate advances in…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-02 · Jeremy Reizenstein, Roman Shapovalov, Philipp Henzler, Luca Sbordone, Patrick Labatut, David Novotny

DensePose 3D: Lifting Canonical Surface Maps of Articulated Objects to the Third Dimension

We tackle the problem of monocular 3D reconstruction of articulated objects like humans and animals. We contribute DensePose 3D, a method that can learn such reconstructions in a weakly supervised fashion from 2D image annotations only.…

Computer Vision and Pattern Recognition · Computer Science · 2021-09-02 · Roman Shapovalov, David Novotny, Benjamin Graham, Patrick Labatut, Andrea Vedaldi

Discovering Relationships between Object Categories via Universal Canonical Maps

We tackle the problem of learning the geometry of multiple categories of deformable objects jointly. Recent work has shown that it is possible to learn a unified dense pose predictor for several categories of related objects. However,…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-21 · Natalia Neverova, Artsiom Sanakoyeu, Patrick Labatut, David Novotny, Andrea Vedaldi

NeuroMorph: Unsupervised Shape Interpolation and Correspondence in One Go

We present NeuroMorph, a new neural network architecture that takes as input two 3D shapes and produces in one go, i.e., in a single feed-forward pass, a smooth interpolation and point-to-point correspondences between them. The…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-18 · Marvin Eisenberger, David Novotny, Gael Kerchenbaum, Patrick Labatut, Natalia Neverova, Daniel Cremers, Andrea Vedaldi

Unsupervised Learning of 3D Object Categories from Videos in the Wild

Our goal is to learn a deep network that, given a small number of images of an object of a given category, reconstructs it in 3D. While several recent works have obtained analogous results using synthetic data or assuming the availability…

Computer Vision and Pattern Recognition · Computer Science · 2021-03-31 · Philipp Henzler, Jeremy Reizenstein, Patrick Labatut, Roman Shapovalov, Tobias Ritschel, Andrea Vedaldi, David Novotny

Low Bandwidth Video-Chat Compression using Deep Generative Models

To unlock video chat for hundreds of millions of people hindered by poor connectivity or unaffordable data costs, we propose to authentically reconstruct faces on the receiver's device using facial landmarks extracted at the sender's side…

Computer Vision and Pattern Recognition · Computer Science · 2020-12-02 · Maxime Oquab, Pierre Stock, Oran Gafni, Daniel Haziza, Tao Xu, Peizhao Zhang, Onur Celebi, Yana Hasson, Patrick Labatut, Bobo Bose-Kolanu, Thibault Peyronel, Camille Couprie