GLM-OCR Technical Report

Shuaiqi Duan; Yadong Xue; Weihan Wang; Zhe Su; Huan Liu; Sheng Yang; Guobing Gan; Guo Wang; Zihan Wang; Shengdong Yan; Dexin Jin; Yuxuan Zhang; Guohong Wen; Yanfeng Wang; Yutao Zhang; Xiaohan Zhang; Wenyi Hong; Yukuo Cen; Da Yin; Bin Chen; Wenmeng Yu; Xiaotao Gu; Jie Tang

Computation and Language 2026-03-17 v2

Abstract

GLM-OCR is a compact, efficient 0.9B-parameter multimodal model designed for real-world document understanding. It combines a 0.4B-parameter CogViT visual encoder with a 0.5B-parameter GLM language decoder, balancing computational efficiency against recognition performance. Because standard autoregressive decoding is inefficient for deterministic OCR tasks, GLM-OCR introduces a Multi-Token Prediction (MTP) mechanism that predicts multiple tokens per step, significantly improving decoding throughput while keeping memory overhead low through shared parameters. At the system level, it adopts a two-stage pipeline: PP-DocLayout-V3 first performs layout analysis, and the detected regions are then recognized in parallel. Extensive evaluations on public benchmarks and industrial scenarios show that GLM-OCR achieves competitive or state-of-the-art performance in document parsing, text and formula transcription, table structure recovery, and key information extraction. Its compact architecture and structured generation make it suitable for both resource-constrained edge deployment and large-scale production systems.
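
The two-stage pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Region` type, `parse_page`, and the two stub functions are hypothetical names introduced here, and the stubs merely stand in for the layout model (PP-DocLayout-V3) and the region recognizer (GLM-OCR), whose real interfaces are not given in this listing. A thread pool is assumed as one plausible way to run region-level recognition in parallel.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) in page pixels


@dataclass
class Region:
    box: Box
    kind: str  # e.g. "text", "table", "formula"


def parse_page(
    page: object,
    detect_layout: Callable[[object], List[Region]],
    recognize_region: Callable[[object, Region], str],
    max_workers: int = 4,
) -> List[Tuple[Region, str]]:
    """Stage 1: layout analysis. Stage 2: recognize all regions in parallel."""
    regions = detect_layout(page)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order, so texts stay aligned with regions.
        texts = list(pool.map(lambda r: recognize_region(page, r), regions))
    return list(zip(regions, texts))


# Stand-in stubs (hypothetical; not the real PP-DocLayout-V3 or GLM-OCR APIs).
def fake_layout(page: object) -> List[Region]:
    return [Region((0, 0, 100, 20), "text"), Region((0, 30, 100, 90), "table")]


def fake_recognizer(page: object, region: Region) -> str:
    return f"<{region.kind} at {region.box}>"


result = parse_page("page-1", fake_layout, fake_recognizer)
```

Passing the two stages in as callables keeps the orchestration independent of any particular model, so the same skeleton would apply whether recognition runs locally or behind a batched inference service.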

Keywords

code generation; large language model; handwritten character recognition

Cite

@article{arxiv.2603.10910,
  title  = {GLM-OCR Technical Report},
  author = {Shuaiqi Duan and Yadong Xue and Weihan Wang and Zhe Su and Huan Liu and Sheng Yang and Guobing Gan and Guo Wang and Zihan Wang and Shengdong Yan and Dexin Jin and Yuxuan Zhang and Guohong Wen and Yanfeng Wang and Yutao Zhang and Xiaohan Zhang and Wenyi Hong and Yukuo Cen and Da Yin and Bin Chen and Wenmeng Yu and Xiaotao Gu and Jie Tang},
  journal= {arXiv preprint arXiv:2603.10910},
  year   = {2026}
}
