Author

Russell Howes

results may include different authors with the same name

6 papers

Contrastive Language-Image Pre-training (CLIP) is an approach that has advanced research and applications in computer vision, fueling modern recognition systems and generative models. We believe that the main ingredient to the success of…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Hu Xu , Saining Xie , Xiaoqing Ellen Tan , Po-Yao Huang , Russell Howes , Vasu Sharma , Shang-Wen Li , Gargi Ghosh , Luke Zettlemoyer , Christoph Feichtenhofer

V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

A major challenge for modern AI is to learn to understand the world and learn to act largely by observation. This paper explores a self-supervised approach that combines internet-scale video data with a small amount of interaction data…

Artificial Intelligence · Computer Science 2025-06-12 Mido Assran , Adrien Bardes , David Fan , Quentin Garrido , Russell Howes , Mojtaba Komeili , Matthew Muckley , Ammar Rizvi , Claire Roberts , Koustuv Sinha , Artem Zholus , Sergio Arnaud , Abha Gejji , Ada Martin , Francois Robert Hogan , Daniel Dugas , Piotr Bojanowski , Vasil Khalidov , Patrick Labatut , Francisco Massa , Marc Szafraniec , Kapil Krishnakumar , Yong Li , Xiaodong Ma , Sarath Chandar , Franziska Meier , Yann LeCun , Michael Rabbat , Nicolas Ballas

Text Quality-Based Pruning for Efficient Training of Language Models

In recent times training Language Models (LMs) have relied on computationally heavy training over massive datasets which makes this training process extremely laborious. In this paper we propose a novel method for numerically evaluating…

Computation and Language · Computer Science 2024-05-14 Vasu Sharma , Karthik Padthe , Newsha Ardalani , Kushal Tirumala , Russell Howes , Hu Xu , Po-Yao Huang , Shang-Wen Li , Armen Aghajanyan , Gargi Ghosh , Luke Zettlemoyer

DINOv2: Learning Robust Visual Features without Supervision

The recent breakthroughs in natural language processing for model pretraining on large quantities of data have opened the way for similar foundation models in computer vision. These models could greatly simplify the use of images in any…

Computer Vision and Pattern Recognition · Computer Science 2024-02-05 Maxime Oquab , Timothée Darcet , Théo Moutakanni , Huy Vo , Marc Szafraniec , Vasil Khalidov , Pierre Fernandez , Daniel Haziza , Francisco Massa , Alaaeldin El-Nouby , Mahmoud Assran , Nicolas Ballas , Wojciech Galuba , Russell Howes , Po-Yao Huang , Shang-Wen Li , Ishan Misra , Michael Rabbat , Vasu Sharma , Gabriel Synnaeve , Hu Xu , Hervé Jegou , Julien Mairal , Patrick Labatut , Armand Joulin , Piotr Bojanowski

Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning

We present CM3Leon (pronounced "Chameleon"), a retrieval-augmented, token-based, decoder-only multi-modal language model capable of generating and infilling both text and images. CM3Leon uses the CM3 multi-modal architecture but…

Machine Learning · Computer Science 2023-09-07 Lili Yu , Bowen Shi , Ramakanth Pasunuru , Benjamin Muller , Olga Golovneva , Tianlu Wang , Arun Babu , Binh Tang , Brian Karrer , Shelly Sheynin , Candace Ross , Adam Polyak , Russell Howes , Vasu Sharma , Puxin Xu , Hovhannes Tamoyan , Oron Ashual , Uriel Singer , Shang-Wen Li , Susan Zhang , Richard James , Gargi Ghosh , Yaniv Taigman , Maryam Fazel-Zarandi , Asli Celikyilmaz , Luke Zettlemoyer , Armen Aghajanyan

CiT: Curation in Training for Effective Vision-Language Data

Large vision-language models are generally applicable to many downstream tasks, but come at an exorbitant training cost that only large institutions can afford. This paper trades generality for efficiency and presents Curation in Training…

Computer Vision and Pattern Recognition · Computer Science 2023-01-06 Hu Xu , Saining Xie , Po-Yao Huang , Licheng Yu , Russell Howes , Gargi Ghosh , Luke Zettlemoyer , Christoph Feichtenhofer