Voxtral



Subjects: Sound; Artificial Intelligence; Audio and Speech Processing
Date: 2025-07-18 (v1)

Authors: Alexander H. Liu, Andy Ehrenberg, Andy Lo, Clément Denoix, Corentin Barreau, Guillaume Lample, Jean-Malo Delignon, Khyathi Raghavi Chandu, Patrick von Platen, Pavankumar Reddy Muddireddy, Sanchit Gandhi, Soham Ghosh, Srijan Mishra, Thomas Foubert, Abhinav Rastogi, Adam Yang, Albert Q. Jiang, Alexandre Sablayrolles, Amélie Héliou, Amélie Martin, Anmol Agarwal, Antoine Roux, Arthur Darcet, Arthur Mensch, Baptiste Bout, Baptiste Rozière, Baudouin De Monicault, Chris Bamford, Christian Wallenwein, Christophe Renaudin, Clémence Lanfranchi, Darius Dabert, Devendra Singh Chaplot, Devon Mizelle, Diego de las Casas, Elliot Chane-Sane, Emilien Fugier, Emma Bou Hanna, Gabrielle Berrada, Gauthier Delerce, Gauthier Guinet, Georgii Novikov, Guillaume Martin, Himanshu Jaju, Jan Ludziejewski, Jason Rute, Jean-Hadrien Chabran, Jessica Chudnovsky, Joachim Studnia, Joep Barmentlo, Jonas Amar, Josselin Somerville Roberts, Julien Denize, Karan Saxena, Karmesh Yadav, Kartik Khandelwal, Kush Jain, Lélio Renard Lavaud, Léonard Blier, Lingxiao Zhao, Louis Martin, Lucile Saulnier, Luyu Gao, Marie Pellat, Mathilde Guillaumin, Mathis Felardos, Matthieu Dinot, Maxime Darrin, Maximilian Augustin, Mickaël Seznec, Neha Gupta, Nikhil Raghuraman, Olivier Duchenne, Patricia Wang, Patryk Saffer, Paul Jacob, Paul Wambergue, Paula Kurylowicz, Philomène Chagniot, Pierre Stock, Pravesh Agrawal, Rémi Delacourt, Romain Sauvestre, Roman Soletskyi, Sagar Vaze, Sandeep Subramanian, Saurabh Garg, Shashwat Dalal, Siddharth Gandhi, Sumukh Aithal, Szymon Antoniak, Teven Le Scao, Thibault Schueller, Thibaut Lavril, Thomas Robert, Thomas Wang, Timothée Lacroix, Tom Bewley, Valeriia Nemychnikova, Victor Paltz, Virgile Richard, Wen-Ding Li, William Marshall, Xuanyu Zhang, Yihan Wan, Yunhao Tang


Abstract

We present Voxtral Mini and Voxtral Small, two multimodal audio chat models. Voxtral is trained to comprehend both spoken audio and text documents, achieving state-of-the-art performance across a diverse range of audio benchmarks while preserving strong text capabilities. Voxtral Small outperforms a number of closed-source models, while being small enough to run locally. A 32K context window enables the model to handle audio files up to 40 minutes in duration and long multi-turn conversations. We also contribute three benchmarks for evaluating speech understanding models on knowledge and trivia. Both Voxtral models are released under the Apache 2.0 license.
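As a rough illustration of how a 32K context window relates to the 40-minute figure, the sketch below converts audio duration into tokens. It assumes, for illustration only, that audio is encoded at 12.5 tokens per second and that "32K" means 32,768 tokens; the abstract states neither, and the constant and function names are hypothetical.

```python
# Back-of-the-envelope check: how much audio fits in a 32K-token context?
# ASSUMPTIONS (not stated in the abstract): "32K" = 32,768 tokens, and audio
# is represented at a hypothetical rate of 12.5 tokens per second.
CONTEXT_TOKENS = 32 * 1024        # assumed context size in tokens
AUDIO_TOKENS_PER_SECOND = 12.5    # hypothetical audio token rate


def audio_tokens(minutes: float, rate: float = AUDIO_TOKENS_PER_SECOND) -> int:
    """Tokens needed to represent `minutes` of audio at `rate` tokens/second."""
    return round(minutes * 60 * rate)


def max_audio_minutes(context: int = CONTEXT_TOKENS,
                      rate: float = AUDIO_TOKENS_PER_SECOND) -> float:
    """Longest clip (in minutes) whose audio tokens alone fill the context."""
    return context / rate / 60


needed = audio_tokens(40)              # 40 min * 60 s * 12.5 tokens/s
remaining = CONTEXT_TOKENS - needed    # tokens left for prompt and answer
print(needed, remaining, round(max_audio_minutes(), 1))  # 30000 2768 43.7
```

Under these assumptions, 40 minutes of audio occupies 30,000 tokens, leaving roughly 2,768 tokens for the text prompt and the response; a different token rate would shift the maximum duration proportionally.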

Keywords

speech recognition

Cite

@article{arxiv.2507.13264,
  title  = {Voxtral},
  author = {Alexander H. Liu and Andy Ehrenberg and Andy Lo and Clément Denoix and Corentin Barreau and Guillaume Lample and Jean-Malo Delignon and Khyathi Raghavi Chandu and Patrick von Platen and Pavankumar Reddy Muddireddy and Sanchit Gandhi and Soham Ghosh and Srijan Mishra and Thomas Foubert and Abhinav Rastogi and Adam Yang and Albert Q. Jiang and Alexandre Sablayrolles and Amélie Héliou and Amélie Martin and Anmol Agarwal and Antoine Roux and Arthur Darcet and Arthur Mensch and Baptiste Bout and Baptiste Rozière and Baudouin De Monicault and Chris Bamford and Christian Wallenwein and Christophe Renaudin and Clémence Lanfranchi and Darius Dabert and Devendra Singh Chaplot and Devon Mizelle and Diego de las Casas and Elliot Chane-Sane and Emilien Fugier and Emma Bou Hanna and Gabrielle Berrada and Gauthier Delerce and Gauthier Guinet and Georgii Novikov and Guillaume Martin and Himanshu Jaju and Jan Ludziejewski and Jason Rute and Jean-Hadrien Chabran and Jessica Chudnovsky and Joachim Studnia and Joep Barmentlo and Jonas Amar and Josselin Somerville Roberts and Julien Denize and Karan Saxena and Karmesh Yadav and Kartik Khandelwal and Kush Jain and Lélio Renard Lavaud and Léonard Blier and Lingxiao Zhao and Louis Martin and Lucile Saulnier and Luyu Gao and Marie Pellat and Mathilde Guillaumin and Mathis Felardos and Matthieu Dinot and Maxime Darrin and Maximilian Augustin and Mickaël Seznec and Neha Gupta and Nikhil Raghuraman and Olivier Duchenne and Patricia Wang and Patryk Saffer and Paul Jacob and Paul Wambergue and Paula Kurylowicz and Philomène Chagniot and Pierre Stock and Pravesh Agrawal and Rémi Delacourt and Romain Sauvestre and Roman Soletskyi and Sagar Vaze and Sandeep Subramanian and Saurabh Garg and Shashwat Dalal and Siddharth Gandhi and Sumukh Aithal and Szymon Antoniak and Teven Le Scao and Thibault Schueller and Thibaut Lavril and Thomas Robert and Thomas Wang and Timothée Lacroix and Tom Bewley and Valeriia Nemychnikova and Victor Paltz and Virgile Richard and Wen-Ding Li and William Marshall and Xuanyu Zhang and Yihan Wan and Yunhao Tang},
  journal= {arXiv preprint arXiv:2507.13264},
  year   = {2025}
}

Comments

17 pages
