Related papers: Xmodel-2 Technical Report

Xmodel-LM Technical Report

We introduce Xmodel-LM, a compact and efficient 1.1B language model pre-trained on around 2 trillion tokens. Trained on our self-built dataset (Xdata), which balances Chinese and English corpora based on downstream task optimization,…

Computation and Language · Computer Science 2024-11-20 Yichuan Wang, Yang Liu, Yu Yan, Qun Wang, Xucheng Huang, Ling Jiang

Xmodel-2.5: 1.3B Data-Efficient Reasoning SLM

Large language models deliver strong reasoning and tool-use skills, yet their computational demands make them impractical for edge or cost-sensitive deployments. We present Xmodel-2.5, a 1.3-billion-parameter small language model…

Machine Learning · Computer Science 2025-11-26 Yang Liu, Xiaolong Zhong, Ling Jiang

SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model

While large language models have facilitated breakthroughs in many applications of artificial intelligence, their inherent largeness makes them computationally expensive and challenging to deploy in resource-constrained settings. In this…

Computation and Language · Computer Science 2025-02-06 Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf

Unlocking Reasoning Potential in Large Language Models by Scaling Code-form Planning

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science 2024-10-07 Jiaxin Wen, Jian Guan, Hongning Wang, Wei Wu, Minlie Huang

MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing…

Computation and Language · Computer Science 2025-06-06 LLM-Core Xiaomi, Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, Kai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue

Tele-FLM Technical Report

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on…

Computation and Language · Computer Science 2024-04-26 Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang, Shuangyong Song, Yongxiang Li, Zheng Zhang, Bo Zhao, Aixin Sun, Yequan Wang, Zhongjiang He, Zhongyuan Wang, Xuelong Li, Tiejun Huang

Xmodel-1.5: An 1B-scale Multilingual LLM

We introduce Xmodel-1.5, a 1-billion-parameter multilingual large language model pretrained on 2 trillion tokens, designed for balanced performance and scalability. Unlike most large models that use the BPE tokenizer, Xmodel-1.5 employs a…

Computation and Language · Computer Science 2024-12-05 Wang Qun, Liu Yang, Lin Qingquan, Jiang Ling

Baichuan 2: Open Large-scale Language Models

Large language models (LLMs) have demonstrated remarkable performance on a variety of natural language tasks based on just a few examples of natural language instructions, reducing the need for extensive feature engineering. However, most…

Computation and Language · Computer Science 2025-04-18 Aiyuan Yang, Bin Xiao, Bingning Wang, Borong Zhang, Ce Bian, Chao Yin, Chenxu Lv, Da Pan, Dian Wang, Dong Yan, Fan Yang, Fei Deng, Feng Wang, Feng Liu, Guangwei Ai, Guosheng Dong, Haizhou Zhao, Hang Xu, Haoze Sun, Hongda Zhang, Hui Liu, Jiaming Ji, Jian Xie, JunTao Dai, Kun Fang, Lei Su, Liang Song, Lifeng Liu, Liyun Ru, Luyao Ma, Mang Wang, Mickel Liu, MingAn Lin, Nuolan Nie, Peidong Guo, Ruiyang Sun, Tao Zhang, Tianpeng Li, Tianyu Li, Wei Cheng, Weipeng Chen, Xiangrong Zeng, Xiaochuan Wang, Xiaoxi Chen, Xin Men, Xin Yu, Xuehai Pan, Yanjun Shen, Yiding Wang, Yiyu Li, Youxin Jiang, Yuchen Gao, Yupeng Zhang, Zenan Zhou, Zhiying Wu

LLM2: Let Large Language Models Harness System 2 Reasoning

Large language models (LLMs) have exhibited impressive capabilities across a myriad of tasks, yet they occasionally yield undesirable outputs. We posit that these limitations are rooted in the foundational autoregressive architecture of…

Computation and Language · Computer Science 2025-03-03 Cheng Yang, Chufan Shi, Siheng Li, Bo Shui, Yujiu Yang, Wai Lam

MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code

Code has been shown to be effective in enhancing the mathematical reasoning abilities of large language models due to its precision and accuracy. Previous works involving continued mathematical pretraining often include code that utilizes…

Computation and Language · Computer Science 2024-10-11 Zimu Lu, Aojun Zhou, Ke Wang, Houxing Ren, Weikang Shi, Junting Pan, Mingjie Zhan, Hongsheng Li

Training Language Models to Reason Efficiently

Scaling model size and training data has led to great advances in the performance of Large Language Models (LLMs). However, the diminishing returns of this approach necessitate alternative methods to improve model capabilities, particularly…

Machine Learning · Computer Science 2025-11-05 Daman Arora, Andrea Zanette

Specializing Smaller Language Models towards Multi-Step Reasoning

The surprising ability of Large Language Models (LLMs) to perform well on complex reasoning with only few-shot chain-of-thought prompts is believed to emerge only in very large-scale models (100+ billion parameters). We show that such…

Computation and Language · Computer Science 2023-01-31 Yao Fu, Hao Peng, Litu Ou, Ashish Sabharwal, Tushar Khot

OpenCodeReasoning-II: A Simple Test Time Scaling Approach via Self-Critique

Recent advancements in reasoning-based Large Language Models (LLMs), particularly their potential through test-time scaling, have created significant opportunities for distillation in code generation and critique. However, progress in both…

Computation and Language · Computer Science 2025-07-15 Wasi Uddin Ahmad, Somshubra Majumdar, Aleksander Ficek, Sean Narenthiran, Mehrzad Samadi, Jocelyn Huang, Siddhartha Jain, Vahid Noroozi, Boris Ginsburg

K2-Think: A Parameter-Efficient Reasoning System

K2-Think is a reasoning system that achieves state-of-the-art performance with a 32B parameter model, matching or surpassing much larger models like GPT-OSS 120B and DeepSeek v3.1. Built on the Qwen2.5 base model, our system shows that…

Machine Learning · Computer Science 2025-09-16 Zhoujun Cheng, Richard Fan, Shibo Hao, Taylor W. Killian, Haonan Li, Suqi Sun, Hector Ren, Alexander Moreno, Daqian Zhang, Tianjun Zhong, Yuxin Xiong, Yuanzhe Hu, Yutao Xie, Xudong Han, Yuqi Wang, Varad Pimpalkhute, Yonghao Zhuang, Aaryamonvikram Singh, Xuezhi Liang, Anze Xie, Jianshu She, Desai Fan, Chengqian Gao, Liqun Ma, Mikhail Yurochkin, John Maggs, Xuezhe Ma, Guowei He, Zhiting Hu, Zhengzhong Liu, Eric P. Xing

CPM-2: Large-scale Cost-effective Pre-trained Language Models

In recent years, the size of pre-trained language models (PLMs) has grown by leaps and bounds. However, efficiency issues of these large-scale PLMs limit their utilization in real-world scenarios. We present a suite of cost-effective…

Computation and Language · Computer Science 2021-06-25 Zhengyan Zhang, Yuxian Gu, Xu Han, Shengqi Chen, Chaojun Xiao, Zhenbo Sun, Yuan Yao, Fanchao Qi, Jian Guan, Pei Ke, Yanzheng Cai, Guoyang Zeng, Zhixing Tan, Zhiyuan Liu, Minlie Huang, Wentao Han, Yang Liu, Xiaoyan Zhu, Maosong Sun

Prompt2Model: Generating Deployable Models from Natural Language Instructions

Large language models (LLMs) enable system builders today to create competent NLP systems through prompting, where they only need to describe the task in natural language and provide a few examples. However, in other ways, LLMs are a step…

Computation and Language · Computer Science 2023-08-24 Vijay Viswanathan, Chenyang Zhao, Amanda Bertsch, Tongshuang Wu, Graham Neubig

LLaMAX2: Your Translation-Enhanced Model also Performs Well in Reasoning

General Large Language Models (LLMs) excel in reasoning, but those enhanced for translation struggle with reasoning tasks. To address this, we propose a novel translation-enhanced recipe that begins with instruct models and applies…

Computation and Language · Computer Science 2025-10-13 Changjiang Gao, Zixian Huang, Jingyang Gong, Shujian Huang, Lei Li, Fei Yuan

Large Language Models Meet NL2Code: A Survey

The task of generating code from a natural language description, or NL2Code, is considered a pressing and significant challenge in code intelligence. Thanks to the rapid development of pre-training techniques, surging large language models…

Software Engineering · Computer Science 2023-05-09 Daoguang Zan, Bei Chen, Fengji Zhang, Dianjie Lu, Bingchao Wu, Bei Guan, Yongji Wang, Jian-Guang Lou

SEED-Bench-2: Benchmarking Multimodal Large Language Models

Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal…

Computer Vision and Pattern Recognition · Computer Science 2023-11-30 Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan

From System 1 to System 2: A Survey of Reasoning Large Language Models

Achieving human-level intelligence requires refining the transition from the fast, intuitive System 1 to the slower, more deliberate System 2 reasoning. While System 1 excels in quick, heuristic decisions, System 2 relies on logical…

Artificial Intelligence · Computer Science 2025-06-26 Zhong-Zhi Li, Duzhen Zhang, Ming-Liang Zhang, Jiaxin Zhang, Zengyan Liu, Yuxuan Yao, Haotian Xu, Junhao Zheng, Pei-Jie Wang, Xiuyi Chen, Yingying Zhang, Fei Yin, Jiahua Dong, Zhiwei Li, Bao-Long Bi, Ling-Rui Mei, Junfeng Fang, Xiao Liang, Zhijiang Guo, Le Song, Cheng-Lin Liu