Efficient Reasoning with Hidden Thinking

Computation and Language 2026-05-05 v2 Artificial Intelligence Machine Learning

Abstract

Chain-of-Thought (CoT) reasoning has become a powerful framework for improving complex problem-solving capabilities in Multimodal Large Language Models (MLLMs). However, the verbose nature of textual reasoning introduces significant inefficiencies. In this work, we propose Heima (as hidden llama), an effective CoT compression framework that condenses lengthy CoTs into a small set of abstract thinking tokens, preserving essential reasoning while removing redundancy. We then conduct a theoretical analysis from an information-theoretic perspective, quantifying the information gap induced by compression and showing that reasoning capability is preserved when non-trivial mutual information is retained. To further explore and quantify this information gap, we design an adaptive interpreter that maps thinking tokens back to variable-length textual sequences, thereby reconstructing the reasoning process. Experiments across diverse reasoning benchmarks demonstrate that Heima improves reasoning efficiency while maintaining or even improving zero-shot accuracy. Moreover, the interpreter reconstructs coherent reasoning processes from compressed thinking tokens, revealing that the information gap is minimal and validating the effectiveness of the proposed framework. This work paves the way for scalable latent reasoning models and advances our understanding of efficient reasoning processes in large models. Code: https://github.com/shawnricecake/Heima

Cite

@article{arxiv.2501.19201,
  title  = {Efficient Reasoning with Hidden Thinking},
  author = {Xuan Shen and Yizhou Wang and Yufa Zhou and Xiangxi Shi and Pu Zhao and Yanzhi Wang and Jiuxiang Gu},
  journal= {arXiv preprint arXiv:2501.19201},
  year   = {2026}
}

Comments

ICML 2026