Related papers: INTELLECT-1 Technical Report

200 papers

We introduce INTELLECT-2, the first globally distributed reinforcement learning (RL) training run of a 32 billion parameter language model. Unlike traditional centralized training efforts, INTELLECT-2 trains a reasoning model using fully…

We present INTELLECT-3, a 106B-parameter Mixture-of-Experts model (12B active) trained with large-scale reinforcement learning on our end-to-end RL infrastructure stack. INTELLECT-3 achieves state of the art performance for its size across…

We present Ring-1T, the first open-source, state-of-the-art thinking model with trillion-scale parameters. It features 1 trillion total parameters and activates approximately 50 billion per token. Training such models at a…

Large language models (LLMs) have demonstrated remarkable success as foundational models, benefiting various downstream applications through fine-tuning. Recent studies on loss scaling have demonstrated the superior performance of larger…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-12-25 Sajal Dash , Isaac Lyngaas , Junqi Yin , Xiao Wang , Romain Egele , Guojing Cong , Feiyi Wang , Prasanna Balaprakash

Distributed training has become a pervasive and effective approach for training a large neural network (NN) model on massive data. However, it is very challenging to satisfy requirements from various NN models, diverse…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-07 Yulong Ao , Zhihua Wu , Dianhai Yu , Weibao Gong , Zhiqing Kui , Minxu Zhang , Zilingfeng Ye , Liang Shen , Yanjun Ma , Tian Wu , Haifeng Wang , Wei Zeng , Chao Yang

The distributed training of foundation models, particularly large language models (LLMs), demands a high level of communication. Consequently, it is highly dependent on a centralized cluster with fast and reliable interconnects. Can we…

Machine Learning · Computer Science 2025-06-27 Ji Qi , WenPeng Zhu , Li Li , Ming Wu , YingJun Wu , Wu He , Xun Gao , Jason Zeng , Michael Heinrich

Mixture of Experts (MoE) models have emerged as a promising paradigm for scaling language models efficiently by activating only a subset of parameters for each input token. In this report, we present dots.llm1, a large-scale MoE model that…

Large-scale AI model training divides work across thousands of GPUs, then synchronizes gradients across them at each step. This incurs a significant network burden that only centralized, monolithic clusters can support, driving up…

Computer Vision and Pattern Recognition · Computer Science 2025-01-13 David McAllister , Matthew Tancik , Jiaming Song , Angjoo Kanazawa

We present Fox-1, a series of small language models (SLMs) consisting of Fox-1-1.6B and Fox-1-1.6B-Instruct-v0.1. These models are pre-trained on 3 trillion tokens of web-scraped document data and fine-tuned with 5 billion tokens of…

RAPID-LLM is a unified performance modeling framework for large language model (LLM) training and inference on GPU clusters. It couples a DeepFlow-based frontend that generates hardware-aware, operator-level Chakra execution traces from an…

Large language models (LLMs) have achieved near-human performance across diverse reasoning tasks, yet their deployment on resource-constrained Internet-of-Things (IoT) devices remains impractical due to massive parameter footprints and…

Machine Learning · Computer Science 2025-11-07 Mingyu Sung , Vikas Palakonda , Suhwan Im , Sunghwan Moon , Il-Min Kim , Sangseok Yun , Jae-Mo Kang

The explosive growth of Large Language Models (LLMs), such as GPT-4 with 1.8 trillion parameters, demands a fundamental rethinking of data center architecture to ensure scalability, efficiency, and cost-effectiveness. Our work provides a…

Hardware Architecture · Computer Science 2025-09-09 Jesmin Jahan Tithi , Hanjiang Wu , Avishaii Abuhatzera , Fabrizio Petrini

TeleChat3-MoE is the latest series of TeleChat large language models, featuring a Mixture-of-Experts (MoE) architecture with parameter counts ranging from 105 billion to over one trillion, trained end-to-end on an Ascend NPU cluster. This…

We introduce MiniMax-M1, the world's first open-weight, large-scale hybrid-attention reasoning model. MiniMax-M1 is powered by a hybrid Mixture-of-Experts (MoE) architecture combined with a lightning attention mechanism. The model is…

Computation and Language · Computer Science 2025-06-17 MiniMax , Aili Chen , Aonian Li , Bangwei Gong , Binyang Jiang , Bo Fei , Bo Yang , Boji Shan , Changqing Yu , Chao Wang , Cheng Zhu , Chengjun Xiao , Chengyu Du , Chi Zhang , Chu Qiao , Chunhao Zhang , Chunhui Du , Congchao Guo , Da Chen , Deming Ding , Dianjun Sun , Dong Li , Enwei Jiao , Haigang Zhou , Haimo Zhang , Han Ding , Haohai Sun , Haoyu Feng , Huaiguang Cai , Haichao Zhu , Jian Sun , Jiaqi Zhuang , Jiaren Cai , Jiayuan Song , Jin Zhu , Jingyang Li , Jinhao Tian , Jinli Liu , Junhao Xu , Junjie Yan , Junteng Liu , Junxian He , Kaiyi Feng , Ke Yang , Kecheng Xiao , Le Han , Leyang Wang , Lianfei Yu , Liheng Feng , Lin Li , Lin Zheng , Linge Du , Lingyu Yang , Lunbin Zeng , Minghui Yu , Mingliang Tao , Mingyuan Chi , Mozhi Zhang , Mujie Lin , Nan Hu , Nongyu Di , Peng Gao , Pengfei Li , Pengyu Zhao , Qibing Ren , Qidi Xu , Qile Li , Qin Wang , Rong Tian , Ruitao Leng , Shaoxiang Chen , Shaoyu Chen , Shengmin Shi , Shitong Weng , Shuchang Guan , Shuqi Yu , Sichen Li , Songquan Zhu , Tengfei Li , Tianchi Cai , Tianrun Liang , Weiyu Cheng , Weize Kong , Wenkai Li , Xiancai Chen , Xiangjun Song , Xiao Luo , Xiao Su , Xiaobo Li , Xiaodong Han , Xinzhu Hou , Xuan Lu , Xun Zou , Xuyang Shen , Yan Gong , Yan Ma , Yang Wang , Yiqi Shi , Yiran Zhong , Yonghong Duan , Yongxiang Fu , Yongyi Hu , Yu Gao , Yuanxiang Fan , Yufeng Yang , Yuhao Li , Yulin Hu , Yunan Huang , Yunji Li , Yunzhi Xu , Yuxin Mao , Yuxuan Shi , Yuze Wenren , Zehan Li , Zelin Li , Zhanxu Tian , Zhengmao Zhu , Zhenhua Fan , Zhenzhen Wu , Zhichao Xu , Zhihang Yu , Zhiheng Lyu , Zhuo Jiang , Zibo Gao , Zijia Wu , Zijian Song , Zijun Sun

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks, yet the majority of high-performing models remain closed-source or partially open, limiting transparency and reproducibility. In this work,…

More than 70% of cloud computing is paid for but sits idle. A large fraction of this idle compute consists of cheap CPUs with few cores that are not utilized during the less busy hours. This paper aims to enable those CPU cycles to train…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-02-01 Minghao Yan , Nicholas Meisburger , Tharun Medini , Anshumali Shrivastava

In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective…

The widespread adoption of cloud computing, edge, and IoT has increased the attack surface for cyber threats. This is due to the large-scale deployment of often unsecured, heterogeneous devices with varying hardware and software…

Cryptography and Security · Computer Science 2024-07-23 Simone Magnani , Liubov Nedoshivina , Roberto Doriguzzi-Corin , Stefano Braghin , Domenico Siracusa

The scaling of large language models has greatly improved natural language understanding, generation, and reasoning. In this work, we develop a system that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors…

State-of-the-art language and vision models are routinely trained across thousands of GPUs, often spanning multiple data-centers, yet today's distributed frameworks still assume reliable connections (e.g., InfiniBand or RoCE). The resulting…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-11 Erez Weintraub , Ron Banner , Ariel Orda