Voxtral Realtime

Artificial Intelligence 2026-04-07 v3

Abstract

We introduce Voxtral Realtime, a natively streaming automatic speech recognition model that matches offline transcription quality at sub-second latency. Unlike approaches that adapt offline models through chunking or sliding windows, Voxtral Realtime is trained end-to-end for streaming, with explicit alignment between audio and text streams. Our architecture builds on the Delayed Streams Modeling framework, introducing a new causal audio encoder and Ada RMS-Norm for improved delay conditioning. We scale pretraining to a large dataset spanning 13 languages. At a delay of 480ms, Voxtral Realtime achieves performance on par with Whisper, the most widely deployed offline transcription system. We release the model weights under the Apache 2.0 license.

Cite

@article{arxiv.2602.11298,
  title  = {Voxtral Realtime},
  author = {Mistral-AI and Alexander H. Liu and Andy Ehrenberg and Andy Lo and Chen-Yo Sun and Guillaume Lample and Jean-Malo Delignon and Khyathi Raghavi Chandu and Patrick von Platen and Pavankumar Reddy Muddireddy and Rohin Arora and Sanchit Gandhi and Sandeep Subramanian and Soham Ghosh and Srijan Mishra and Abhinav Rastogi and Adrien Sadé and Alan Jeffares and Albert Jiang and Alexandre Cahill and Alexandre Gavaudan and Alexandre Sablayrolles and Amélie Héliou and Amos You and Andrew Bai and Angele Lenglemetz and Anmol Agarwal and Anton Eliseev and Antonia Calvi and Arjun Majumdar and Avi Sooriyarachchi and Baptiste Bout and Baptiste Rozière and Baudouin De Monicault and Benjamin Tibi and Charlotte Cronjäger and Clémence Lanfranchi and Connor Chen and Corentin Barreau and Corentin Sautier and Cyprien Courtot and Darius Dabert and Diego de las Casas and Elizaveta Demyanenko and Elliot Chane-Sane and Enguerrand Paquin and Etienne Goffinet and Fabien Niel and Faruk Ahmed and Federico Baldassarre and Gabrielle Berrada and Gaëtan Ecrepont and Gauthier Guinet and Genevieve Hayes and Georgii Novikov and Giada Pistilli and Guillaume Kunsch and Guillaume Martin and Guillaume Raille and Gunjan Dhanuka and Gunshi Gupta and Han Zhou and Harshil Shah and Hope McGovern and Hugo Thimonier and Indraneel Mukherjee and Irene Zhang and Jaeyoung Kim and Jan Ludziejewski and Jason Rute and Joachim Studnia and John Harvill and Jonas Amar and Joséphine Delas and Josselin Somerville Roberts and Julien Tauran and Karmesh Yadav and Kartik Khandelwal and Kilian Tep and Kush Jain and Laurence Aitchison and Laurent Fainsin and Léonard Blier and Lingxiao Zhao and Louis Martin and Lucile Saulnier and Luyu Gao and Maarten Buyl and Manan Sharma and Margaret Jennings and Marie Pellat and Mark Prins and Martin Alexandre and Mathieu Poirée and Mathilde Guillaumin and Matthieu Dinot and Matthieu Futeral and Maxime Darrin and Maximilian Augustin and Mert Unsal and Mia Chiquier and Minh-Quang Pham and Nathan Grinsztajn and Neha Gupta and Olivier Bousquet and Olivier Duchenne and Patricia Wang and Paul Jacob and Paul Wambergue and Paula Kurylowicz and Philippe Pinel and Philomène Chagniot and Pierre Stock and Piotr Miłoś and Prateek Gupta and Pravesh Agrawal and Quentin Torroba and Ram Ramrakhya and Rishi Shah and Romain Sauvestre and Roman Soletskyi and Rosalie Millner and Rupert Menneer and Sagar Vaze and Samuel Barry and Samuel Humeau and Sean Cha and Shashwat Verma and Siddhant Waghjale and Siddharth Gandhi and Simon Lepage and Sumukh Aithal and Szymon Antoniak and Teven Le Scao and Théo Cachet and Theo Simon Sorg and Thibaut Lavril and Thomas Chabal and Thomas Foubert and Thomas Robert and Thomas Wang and Tim Lawson and Tom Bewley and Tom Edwards and Tyler Wang and Umar Jamil and Umberto Tomasini and Valeriia Nemychnikova and Van Phung and Vedant Nanda and Victor Jouault and Vincent Maladière and Virgile Richard and Vladislav Bataev and Wassim Bouaziz and Wen-Ding Li and William Havard and William Marshall and Xinghui Li and Xingran Guo and Xinyu Yang and Yannic Neuhaus and Yassine El Ouahidi and Yassir Bendou and Yihan Wang and Yimu Pan and Zaccharie Ramzi and Zhenlin Xu},
  journal= {arXiv preprint arXiv:2602.11298},
  year   = {2026}
}