ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction

Pengze Li; Jiaqi Liu; Junchi Yu; Lihao Liu; Mingyu Ding; Wanli Ouyang; Shixiang Tang; Xi Chen

Artificial Intelligence 2025-11-18 v1

Abstract

Large language models (LLMs) are increasingly used in scientific domains. While they can produce reasoning-like content via methods such as chain-of-thought prompting, these outputs are typically unstructured and informal, which obscures whether the models truly understand the fundamental reasoning paradigms that underpin scientific inference. To address this, we introduce a novel task named Latent Reasoning Chain Extraction (ARCHE), in which models must decompose complex reasoning arguments into combinations of standard reasoning paradigms in the form of a Reasoning Logic Tree (RLT). In an RLT, every reasoning step is explicitly categorized as one of three variants of Peirce's fundamental inference modes: deduction, induction, or abduction. To facilitate this task, we release ARCHE Bench, a new benchmark derived from 70 Nature Communications articles, including more than 1,900 references and 38,000 viewpoints. We propose two logic-aware evaluation metrics: Entity Coverage (EC) for content completeness and Reasoning Edge Accuracy (REA) for step-by-step logical validity. Evaluations of 10 leading LLMs on ARCHE Bench reveal that models exhibit a trade-off between REA and EC, and that none can yet extract a complete and standard reasoning chain. These findings highlight a substantial gap between the abilities of current reasoning models and the rigor required for scientific argumentation.

Keywords

automated reasoning; large language model reasoning; large language model evaluation

Cite

@article{arxiv.2511.12485,
  title  = {ARCHE: A Novel Task to Evaluate LLMs on Latent Reasoning Chain Extraction},
  author = {Pengze Li and Jiaqi Liu and Junchi Yu and Lihao Liu and Mingyu Ding and Wanli Ouyang and Shixiang Tang and Xi Chen},
  journal= {arXiv preprint arXiv:2511.12485},
  year   = {2025}
}

Comments

Accepted to AAAI 2026
