Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures

Yu He; Yingxi Li; Colin White; Ellen Vitercik

Machine Learning 2026-02-12 v3 Artificial Intelligence

Abstract

Large language models (LLMs) are deployed on increasingly complex tasks that require multi-step decision-making. Understanding their algorithmic reasoning abilities is therefore crucial. However, we lack a diagnostic benchmark for evaluating this capability. We propose data structures as a principled lens: as fundamental building blocks of algorithms, they naturally probe structural reasoning, that is, the ability to understand and manipulate relationships such as order, hierarchy, and connectivity that underpin algorithmic reasoning. We introduce DSR-Bench, spanning 20 data structures, 35 operations, and 4,140 problem instances. DSR-Bench features hierarchical task organization, fully automated generation and evaluation, and fine-grained diagnostics. Evaluating 13 state-of-the-art LLMs reveals critical limitations: the top-performing model scores only 0.46 out of 1 on challenging instances. Three auxiliary probes targeting more realistic usage expose further weaknesses: models perform poorly on spatial data and context-rich scenarios, and they struggle to reason over their own code.
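
To make "fully automated generation and evaluation" concrete, the following is a minimal sketch of what a single data-structure task could look like. It is an illustration only and is not taken from DSR-Bench: the function names (make_queue_instance, score), the prompt wording, the choice of a queue, and the exact-match scoring rule are all assumptions made for this example.

```python
import random
from collections import deque


def make_queue_instance(seed, n_ops=6):
    """Build one hypothetical queue task: a prompt plus its ground-truth final state.

    Illustrative only; this is not the DSR-Bench generator.
    """
    rng = random.Random(seed)  # seeded so the same instance can be regenerated
    queue = deque()
    steps = []
    for _ in range(n_ops):
        # Dequeue only when the queue is non-empty, so every operation is valid.
        if queue and rng.random() < 0.4:
            queue.popleft()
            steps.append("dequeue")
        else:
            value = rng.randint(0, 99)
            queue.append(value)
            steps.append(f"enqueue {value}")
    prompt = (
        "Start with an empty queue. Apply: "
        + ", ".join(steps)
        + ". Report the final queue from front to back."
    )
    return prompt, list(queue)


def score(model_answer, expected):
    """Assumed scoring rule: 1.0 for an exact match with the reference state, else 0.0."""
    return 1.0 if list(model_answer) == expected else 0.0
```

Because the reference state comes from executing the operations directly, no human labeling is needed in this sketch; a parsed model answer would be passed to score together with the expected list.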

Keywords

large language model reasoning; large language model evaluation; automated reasoning

Cite

@article{arxiv.2505.24069,
  title  = {Can LLMs Reason Structurally? Benchmarking via the Lens of Data Structures},
  author = {Yu He and Yingxi Li and Colin White and Ellen Vitercik},
  journal= {arXiv preprint arXiv:2505.24069},
  year   = {2026}
}
