Devendra Singh Chaplot

Voxtral

We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks, while…

Sound · Computer Science · 2025-07-18 · Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devendra Singh Chaplot, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute, Jean-Hadrien Chabran, Jessica Chudnovsky, Joachim Studnia, Joep Barmentlo, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Karmesh Yadav, Kartik Khandelwal, Kush Jain, Lélio Renard Lavaud, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Matthieu Dinot, Maxime Darrin, Maximilian Augustin, Mickaël Seznec, Neha Gupta, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Rémi Delacourt, Romain Sauvestre, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Shashwat Dalal, Siddharth Gandhi, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Robert, Thomas Wang, Timothée Lacroix, Tom Bewley, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xuanyu Zhang, Yihan Wan, Yunhao Tang

Situated Instruction Following

Language is never spoken in a vacuum. It is expressed, comprehended, and contextualized within the holistic backdrop of the speaker's history, actions, and environment. Since humans are used to communicating efficiently with situated…

Human-Computer Interaction · Computer Science · 2024-07-18 · So Yeon Min, Xavi Puig, Devendra Singh Chaplot, Tsung-Yen Yang, Akshara Rai, Priyam Parashar, Ruslan Salakhutdinov, Yonatan Bisk, Roozbeh Mottaghi

Towards Open-World Mobile Manipulation in Homes: Lessons from the Neurips 2023 HomeRobot Open Vocabulary Mobile Manipulation Challenge

In order to develop robots that can effectively serve as versatile and capable home assistants, it is crucial for them to reliably perceive and interact with a wide variety of objects across diverse environments. To this end, we proposed…

Robotics · Computer Science · 2024-07-10 · Sriram Yenamandra, Arun Ramachandran, Mukul Khanna, Karmesh Yadav, Jay Vakil, Andrew Melnik, Michael Büttner, Leon Harz, Lyon Brown, Gora Chand Nandi, Arjun PS, Gaurav Kumar Yadav, Rahul Kala, Robert Haschke, Yang Luo, Jinxin Zhu, Yansen Han, Bingyi Lu, Xuan Gu, Qinyuan Liu, Yaping Zhao, Qiting Ye, Chenxiao Dou, Yansong Chua, Volodymyr Kuzma, Vladyslav Humennyy, Ruslan Partsey, Jonathan Francis, Devendra Singh Chaplot, Gunjan Chhablani, Alexander Clegg, Theophile Gervet, Vidhi Jain, Ram Ramrakhya, Andrew Szot, Austin Wang, Tsung-Yen Yang, Aaron Edsinger, Charlie Kemp, Binit Shah, Zsolt Kira, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

GOAT-Bench: A Benchmark for Multi-Modal Lifelong Navigation

The Embodied AI community has made significant strides in visual navigation tasks, exploring targets from 3D coordinates, objects, language descriptions, and images. However, these navigation models often handle only a single input modality…

Artificial Intelligence · Computer Science · 2024-04-11 · Mukul Khanna, Ram Ramrakhya, Gunjan Chhablani, Sriram Yenamandra, Theophile Gervet, Matthew Chang, Zsolt Kira, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi

HomeRobot: Open-Vocabulary Mobile Manipulation

HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile Manipulation (OVMM) is the problem of picking any object in any unseen…

Robotics · Computer Science · 2024-01-11 · Sriram Yenamandra, Arun Ramachandran, Karmesh Yadav, Austin Wang, Mukul Khanna, Theophile Gervet, Tsung-Yen Yang, Vidhi Jain, Alexander William Clegg, John Turner, Zsolt Kira, Manolis Savva, Angel Chang, Devendra Singh Chaplot, Dhruv Batra, Roozbeh Mottaghi, Yonatan Bisk, Chris Paxton

Mixtral of Experts

We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts). For every token, at each…

Machine Learning · Computer Science · 2024-01-09 · Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

AutoNeRF: Training Implicit Scene Representations with Autonomous Agents

Implicit representations such as Neural Radiance Fields (NeRF) have been shown to be very effective at novel view synthesis. However, these models typically require manual and careful human data collection for training. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-25 · Pierre Marza, Laetitia Matignon, Olivier Simonin, Dhruv Batra, Christian Wolf, Devendra Singh Chaplot

GOAT: GO to Any Thing

In deployment scenarios such as homes and warehouses, mobile robots are expected to autonomously navigate for extended periods, seamlessly executing tasks articulated in terms that are intuitively understandable by human operators. We…

Robotics · Computer Science · 2023-11-14 · Matthew Chang, Theophile Gervet, Mukul Khanna, Sriram Yenamandra, Dhruv Shah, So Yeon Min, Kavit Shah, Chris Paxton, Saurabh Gupta, Dhruv Batra, Roozbeh Mottaghi, Jitendra Malik, Devendra Singh Chaplot

Habitat 3.0: A Co-Habitat for Humans, Avatars and Robots

We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate humanoid simulation: addressing challenges in modeling…

Human-Computer Interaction · Computer Science · 2023-10-24 · Xavier Puig, Eric Undersander, Andrew Szot, Mikael Dallaire Cote, Tsung-Yen Yang, Ruslan Partsey, Ruta Desai, Alexander William Clegg, Michal Hlavac, So Yeon Min, Vladimír Vondruš, Theophile Gervet, Vincent-Pierre Berges, John M. Turner, Oleksandr Maksymets, Zsolt Kira, Mrinal Kalakrishnan, Jitendra Malik, Devendra Singh Chaplot, Unnat Jain, Dhruv Batra, Akshara Rai, Roozbeh Mottaghi

Habitat-Matterport 3D Semantics Dataset

We present the Habitat-Matterport 3D Semantics (HM3DSEM) dataset. HM3DSEM is the largest dataset of 3D real-world spaces with densely annotated semantics that is currently available to the academic community. It consists of 142,646 object…

Computer Vision and Pattern Recognition · Computer Science · 2023-10-16 · Karmesh Yadav, Ram Ramrakhya, Santhosh Kumar Ramakrishnan, Theo Gervet, John Turner, Aaron Gokaslan, Noah Maestre, Angel Xuan Chang, Dhruv Batra, Manolis Savva, Alexander William Clegg, Devendra Singh Chaplot

Mistral 7B

We introduce Mistral 7B v0.1, a 7-billion-parameter language model engineered for superior performance and efficiency. Mistral 7B outperforms Llama 2 13B across all evaluated benchmarks, and Llama 1 34B in reasoning, mathematics, and code…

Computation and Language · Computer Science · 2023-10-11 · Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

Galactic: Scaling End-to-End Reinforcement Learning for Rearrangement at 100k Steps-Per-Second

We present Galactic, a large-scale simulation and reinforcement-learning (RL) framework for robotic mobile manipulation in indoor environments. Specifically, a Fetch robot (equipped with a mobile base, 7DoF arm, RGBD camera, egomotion, and…

Machine Learning · Computer Science · 2023-06-14 · Vincent-Pierre Berges, Andrew Szot, Devendra Singh Chaplot, Aaron Gokaslan, Roozbeh Mottaghi, Dhruv Batra, Eric Undersander

Navigating to Objects Specified by Images

Images are a convenient way to specify which particular object instance an embodied agent should navigate to. Solving this task requires semantic visual reasoning and exploration of unknown environments. We present a system that can perform…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-04 · Jacob Krantz, Theophile Gervet, Karmesh Yadav, Austin Wang, Chris Paxton, Roozbeh Mottaghi, Dhruv Batra, Jitendra Malik, Stefan Lee, Devendra Singh Chaplot

Retrospectives on the Embodied AI Workshop

We present a retrospective on the state of Embodied AI research. Our analysis focuses on 13 challenges presented at the Embodied AI Workshop at CVPR. These challenges are grouped into three themes: (1) visual navigation, (2) rearrangement,…

Computer Vision and Pattern Recognition · Computer Science · 2022-12-06 · Matt Deitke, Dhruv Batra, Yonatan Bisk, Tommaso Campari, Angel X. Chang, Devendra Singh Chaplot, Changan Chen, Claudia Pérez D'Arpino, Kiana Ehsani, Ali Farhadi, Li Fei-Fei, Anthony Francis, Chuang Gan, Kristen Grauman, David Hall, Winson Han, Unnat Jain, Aniruddha Kembhavi, Jacob Krantz, Stefan Lee, Chengshu Li, Sagnik Majumder, Oleksandr Maksymets, Roberto Martín-Martín, Roozbeh Mottaghi, Sonia Raychaudhuri, Mike Roberts, Silvio Savarese, Manolis Savva, Mohit Shridhar, Niko Sünderhauf, Andrew Szot, Ben Talbot, Joshua B. Tenenbaum, Jesse Thomason, Alexander Toshev, Joanne Truong, Luca Weihs, Jiajun Wu

Navigating to Objects in the Real World

Semantic navigation is necessary to deploy mobile robots in uncontrolled environments like our homes, schools, and hospitals. Many learning-based approaches have been proposed in response to the lack of semantic understanding of the…

Robotics · Computer Science · 2022-12-05 · Theophile Gervet, Soumith Chintala, Dhruv Batra, Jitendra Malik, Devendra Singh Chaplot

Instance-Specific Image Goal Navigation: Training Embodied Agents to Find Object Instances

We consider the problem of embodied visual navigation given an image-goal (ImageNav) where an agent is initialized in an unfamiliar environment and tasked with navigating to a location 'described' by an image. Unlike related navigation…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-30 · Jacob Krantz, Stefan Lee, Jitendra Malik, Dhruv Batra, Devendra Singh Chaplot

Multi-skill Mobile Manipulation for Object Rearrangement

We study a modular approach to tackle long-horizon mobile manipulation tasks for object rearrangement, which decomposes a full task into a sequence of subtasks. To tackle the entire task, prior work chains multiple stationary manipulation…

Robotics · Computer Science · 2022-09-08 · Jiayuan Gu, Devendra Singh Chaplot, Hao Su, Jitendra Malik

PONI: Potential Functions for ObjectGoal Navigation with Interaction-free Learning

State-of-the-art approaches to ObjectGoal navigation rely on reinforcement learning and typically require significant computational resources and time for learning. We propose Potential functions for ObjectGoal Navigation with…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-20 · Santhosh Kumar Ramakrishnan, Devendra Singh Chaplot, Ziad Al-Halah, Jitendra Malik, Kristen Grauman

FILM: Following Instructions in Language with Modular Methods

Recent methods for embodied instruction following are typically trained end-to-end using imitation learning. This often requires the use of expert trajectories and low-level language instructions. Such approaches assume that neural states…

Computation and Language · Computer Science · 2022-03-18 · So Yeon Min, Devendra Singh Chaplot, Pradeep Ravikumar, Yonatan Bisk, Ruslan Salakhutdinov

Recognizing Scenes from Novel Viewpoints

Humans can perceive scenes in 3D from a handful of 2D views. For AI agents, the ability to recognize a scene from any viewpoint given only a few images enables them to efficiently interact with the scene and its objects. In this work, we…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-03 · Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari