Related papers: INTELLECT-3: Technical Report

200 papers

We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully…

In this report, we introduce INTELLECT-1, the first 10 billion parameter language model collaboratively trained across the globe, demonstrating that large-scale model training is no longer confined to large corporations but can be achieved…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-03 Sami Jaghouar , Jack Min Ong , Manveer Basra , Fares Obeid , Jannik Straube , Michael Keiblinger , Elie Bakouch , Lucas Atkins , Maziyar Panahi , Charles Goddard , Max Ryabinin , Johannes Hagemann

Current Large Language Models (LLMs) exhibit significant limitations, notably in structured, interpretable, and verifiable medical reasoning, alongside practical deployment challenges related to computational resources and data privacy.…

Computation and Language · Computer Science 2025-06-06 Boqin Zhuang , Chenxiao Song , Huitong Lu , Jiacheng Qiao , Mingqian Liu , Mingxing Yu , Ping Hong , Rui Li , Xiaoxia Song , Xiangjun Xu , Xu Chen , Yaoyao Ma , Yujie Gao

Recent advances in reinforcement learning (RL) have substantially improved the training of large-scale language models, leading to significant gains in generation quality and reasoning ability. However, most existing research focuses on…

Machine Learning · Computer Science 2026-01-13 Di Zhang , Xun Wu , Shaohan Huang , Lingjie Jiang , Yaru Hao , Li Dong , Zewen Chi , Zhifang Sui , Furu Wei

Scaling large language models (LLMs) significantly improves performance but comes with prohibitive computational costs. Mixture-of-Experts (MoE) models offer an efficient alternative, increasing capacity without a proportional rise in…

Machine Learning · Computer Science 2024-12-16 Aditya Vavre , Ethan He , Dennis Liu , Zijie Yan , June Yang , Nima Tajbakhsh , Ashwath Aithal

Large language models (LLMs) have recently achieved significant advances in reasoning and demonstrated their advantages in solving challenging problems. Yet, their effectiveness in the semiconductor display industry remains limited due to a…

Reinforcement learning (RL) has emerged as a critical paradigm for post-training Vision-Language-Action (VLA) models, enabling embodied agents to adapt and improve through environmental interaction. However, existing RL frameworks for VLAs…

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an…

We present Ring-1T, the first open-source, state-of-the-art thinking model with trillion-scale parameters. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a…

Enhancing the reasoning capabilities of large language models (LLMs) typically relies on massive computational resources and extensive datasets, limiting accessibility for resource-constrained settings. Our study investigates the potential…

Machine Learning · Computer Science 2026-01-21 Quy-Anh Dang , Chris Ngo

Recent advancements in large language models (LLMs) have demonstrated impressive chain-of-thought reasoning capabilities, with reinforcement learning (RL) playing a crucial role in this progress. While "aha moment" patterns--where models…

Computation and Language · Computer Science 2025-07-24 Lai Wei , Yuting Li , Kaipeng Zheng , Chen Wang , Yue Wang , Linghe Kong , Lichao Sun , Weiran Huang

Reinforcement learning (RL) has emerged as a promising approach to improve large language model (LLM) reasoning, yet most open efforts focus narrowly on math and code, limiting our understanding of its broader applicability to general…

Reinforcement learning (RL) has emerged as the de facto paradigm for improving the reasoning capabilities of large language models (LLMs). We have developed RLAX, a scalable RL framework on TPUs. RLAX employs a parameter-server…

Reinforcement learning (RL) with large language models shows promise in complex reasoning. However, its progress is hindered by the lack of large-scale training data that is sufficiently challenging, contamination-free and verifiable. To…

We introduce Magistral, Mistral's first reasoning model and our own scalable reinforcement learning (RL) pipeline. Instead of relying on existing implementations and RL traces distilled from prior models, we follow a ground-up approach,…

Computation and Language · Computer Science 2025-06-13 Mistral-AI , Abhinav Rastogi , Albert Q. Jiang , Andy Lo , Gabrielle Berrada , Guillaume Lample , Jason Rute , Joep Barmentlo , Karmesh Yadav , Kartik Khandelwal , Khyathi Raghavi Chandu , Léonard Blier , Lucile Saulnier , Matthieu Dinot , Maxime Darrin , Neha Gupta , Roman Soletskyi , Sagar Vaze , Teven Le Scao , Yihan Wang , Adam Yang , Alexander H. Liu , Alexandre Sablayrolles , Amélie Héliou , Amélie Martin , Andy Ehrenberg , Anmol Agarwal , Antoine Roux , Arthur Darcet , Arthur Mensch , Baptiste Bout , Baptiste Rozière , Baudouin De Monicault , Chris Bamford , Christian Wallenwein , Christophe Renaudin , Clémence Lanfranchi , Darius Dabert , Devon Mizelle , Diego de las Casas , Elliot Chane-Sane , Emilien Fugier , Emma Bou Hanna , Gauthier Delerce , Gauthier Guinet , Georgii Novikov , Guillaume Martin , Himanshu Jaju , Jan Ludziejewski , Jean-Hadrien Chabran , Jean-Malo Delignon , Joachim Studnia , Jonas Amar , Josselin Somerville Roberts , Julien Denize , Karan Saxena , Kush Jain , Lingxiao Zhao , Louis Martin , Luyu Gao , Lélio Renard Lavaud , Marie Pellat , Mathilde Guillaumin , Mathis Felardos , Maximilian Augustin , Mickaël Seznec , Nikhil Raghuraman , Olivier Duchenne , Patricia Wang , Patrick von Platen , Patryk Saffer , Paul Jacob , Paul Wambergue , Paula Kurylowicz , Pavankumar Reddy Muddireddy , Philomène Chagniot , Pierre Stock , Pravesh Agrawal , Romain Sauvestre , Rémi Delacourt , Sanchit Gandhi , Sandeep Subramanian , Shashwat Dalal , Siddharth Gandhi , Soham Ghosh , Srijan Mishra , Sumukh Aithal , Szymon Antoniak , Thibault Schueller , Thibaut Lavril , Thomas Robert , Thomas Wang , Timothée Lacroix , Valeriia Nemychnikova , Victor Paltz , Virgile Richard , Wen-Ding Li , William Marshall , Xuanyu Zhang , Yunhao Tang

In this paper, we investigate code-integrated reasoning, where models generate code when necessary and integrate feedback by executing it through a code interpreter. To acquire this capability, models must learn when and how to use external…

Computation and Language · Computer Science 2025-06-02 Fei Bai , Yingqian Min , Beichen Zhang , Zhipeng Chen , Wayne Xin Zhao , Lei Fang , Zheng Liu , Zhongyuan Wang , Ji-Rong Wen

The deployment of intelligent reinforcement learning (RL) agents on resource-constrained edge devices remains a fundamental challenge due to the substantial memory, computational, and energy requirements of modern deep learning systems.…

Mathematical reasoning is a cornerstone of artificial general intelligence and a primary benchmark for evaluating the capabilities of Large Language Models (LLMs). While state-of-the-art models show promise, they often falter when faced…

Computation and Language · Computer Science 2025-07-29 Yifan Hao , Fangning Chao , Yaqian Hao , Zhaojun Cui , Huan Bai , Haiyu Zhang , Yankai Liu , Chao Deng , Junlan Feng

While Large Language Models (LLMs) have demonstrated significant potential in Tool-Integrated Reasoning (TIR), existing training paradigms face significant limitations: Zero-RL suffers from inefficient exploration and mode degradation due…

Artificial Intelligence · Computer Science 2026-04-13 Weiyang Guo , Zesheng Shi , Liye Zhao , Jiayuan Ma , Zeen Zhu , Junxian He , Min Zhang , Jing Li

We introduce Step 3.5 Flash, a sparse Mixture-of-Experts (MoE) model that bridges frontier-level agentic intelligence and computational efficiency. We focus on what matters most when building agents: sharp reasoning and fast, reliable…

Computation and Language · Computer Science 2026-02-24 Ailin Huang , Ang Li , Aobo Kong , Bin Wang , Binxing Jiao , Bo Dong , Bojun Wang , Boyu Chen , Brian Li , Buyun Ma , Chang Su , Changxin Miao , Changyi Wan , Chao Lou , Chen Hu , Chen Xu , Chenfeng Yu , Chengting Feng , Chengyuan Yao , Chunrui Han , Dan Ma , Dapeng Shi , Daxin Jiang , Dehua Ma , Deshan Sun , Di Qi , Enle Liu , Fajie Zhang , Fanqi Wan , Guanzhe Huang , Gulin Yan , Guoliang Cao , Guopeng Li , Han Cheng , Hangyu Guo , Hanshan Zhang , Hao Nie , Haonan Jia , Haoran Lv , Hebin Zhou , Hekun Lv , Heng Wang , Heung-Yeung Shum , Hongbo Huang , Hongbo Peng , Hongyu Zhou , Hongyuan Wang , Houyong Chen , Huangxi Zhu , Huimin Wu , Huiyong Guo , Jia Wang , Jian Zhou , Jianjian Sun , Jiaoren Wu , Jiaran Zhang , Jiashu Lv , Jiashuo Liu , Jiayi Fu , Jiayu Liu , Jie Cheng , Jie Luo , Jie Yang , Jie Zhou , Jieyi Hou , Jing Bai , Jingcheng Hu , Jingjing Xie , Jingwei Wu , Jingyang Zhang , Jishi Zhou , Junfeng Liu , Junzhe Lin , Ka Man Lo , Kai Liang , Kaibo Liu , Kaijun Tan , Kaiwen Yan , Kaixiang Li , Kang An , Kangheng Lin , Lei Yang , Liang Lv , Liang Zhao , Liangyu Chen , Lieyu Shi , Liguo Tan , Lin Lin , Lina Chen , Luck Ma , Mengqiang Ren , Michael Li , Ming Li , Mingliang Li , Mingming Zhang , Mingrui Chen , Mitt Huang , Na Wang , Peng Liu , Qi Han , Qian Zhao , Qinglin He , Qinxin Du , Qiuping Wu , Quan Sun , Rongqiu Yang , Ruihang Miao , Ruixin Han , Ruosi Wan , Ruyan Guo , Shan Wang , Shaoliang Pang , Shaowen Yang , Shengjie Fan , Shijie Shang , Shiliang Yang , Shiwei Li , Shuangshuang Tian , Siqi Liu , Siye Wu , Siyu Chen , Song Yuan , Tiancheng Cao , Tianchi Yue , Tianhao Cheng , Tianning Li , Tingdan Luo , Wang You , Wei Ji , Wei Yuan , Wei Zhang , Weibo Wu , Weihao Xie , Wen Sun , Wenjin Deng , Wenzhen Zheng , Wuxun Xie , Xiangfeng Wang , Xiangwen Kong , Xiangyu Liu , Xiangyu Zhang , Xiaobo Yang , Xiaojia Liu , Xiaolan Yuan , Xiaoran Jiao , Xiaoxiao Ren , Xiaoyun Zhang , Xin Li , Xin Liu , Xin Wu , 
Xing Chen , Xingping Yang , Xinran Wang , Xu Zhao , Xuan He , Xuanti Feng , Xuedan Cai , Xuqiang Zhou , Yanbo Yu , Yang Li , Yang Xu , Yanlin Lai , Yanming Xu , Yaoyu Wang , Yeqing Shen , Yibo Zhu , Yichen Lv , Yicheng Cao , Yifeng Gong , Yijing Yang , Yikun Yang , Yin Zhao , Yingxiu Zhao , Yinmin Zhang , Yitong Zhang , Yixuan Zhang , Yiyang Chen , Yongchi Zhao , Yongshen Long , Yongyao Wang , Yousong Guan , Yu Zhou , Yuang Peng , Yuanhao Ding , Yuantao Fan , Yuanwei Lu , Yuanzhen Yang , Yuchu Luo , Yudi Zhao , Yue Peng , Yueqiang Lin , Yufan Lu , Yuling Zhao , Yunzhou Ju , Yurong Zhang , Yusheng Li , Yuxiang Yang , Yuyang Chen , Yuzhu Cai , Zejia Weng , Zetao Hong , Zexi Li , Zhe Xie , Zheng Ge , Zheng Gong , Zheng Zeng , Zhenyi Lu , Zhewei Huang , Zhichao Chang , Zhiguo Huang , Zhiheng Hu , Zidong Yang , Zili Wang , Ziqi Ren , Zixin Zhang , Zixuan Wang