In this paper, we propose a new data synthesis method called \textbf{LogicPro}, which leverages LeetCode-style algorithm \underline{Pro}blems and their corresponding \underline{Pro}gram solutions to synthesize Complex \underline{Logic}al Reasoning data in text format. First, we synthesize complex reasoning problems from source algorithm problems and their test cases. Then, we obtain the standard answer and the intermediate variable outputs for each problem by running the standard Python solutions on the test cases. Finally, guided by these code intermediate variables, we synthesize a textual reasoning process for each reasoning problem. With this method, we can synthesize data that is difficult, scalable, and effective, and that comes with gold-standard answers and high-quality reasoning processes. As a result, using our synthesized dataset of 540K examples constructed solely from 2,360 algorithm problems, our approach\footnote{Code and data are publicly available at https://github.com/jiangjin1999/LogicPro} achieves significant improvements across multiple models on datasets including \textit{BBH27}, \textit{LogicBench}, \textit{DROP}, \textit{AR-LSAT}, and \textit{GSM8K}, outperforming a wide range of existing reasoning datasets.
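The second and third steps above (running a standard Python solution on a test case to collect intermediate variable outputs, then using them to guide the textual reasoning) can be illustrated with a minimal Python sketch. It is not the LogicPro implementation: capturing locals with `sys.settrace`, the helper names `run_with_trace`, `format_trace`, and `build_prompt`, the example solution `max_subarray`, and the prompt wording are all hypothetical choices made for illustration.

```python
# Hypothetical sketch, NOT the LogicPro code: execute a reference solution,
# record its intermediate variables, and turn them into reasoning guidance.
import copy
import sys
from typing import Any, Callable, Dict, List, Sequence, Tuple


def run_with_trace(func: Callable[..., Any], *args: Any) -> Tuple[Any, List[Dict[str, Any]]]:
    """Call func(*args) and record snapshots of func's local variables.

    A snapshot is taken at every 'line' event (just before a line of func
    runs) and at the 'return' event; consecutive identical snapshots are
    dropped. Frames of other functions are not traced.
    """
    snapshots: List[Dict[str, Any]] = []
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # do not trace helper or library frames
        if event in ("line", "return"):
            # Deep-copy values so later mutations do not rewrite history.
            snap = {k: copy.deepcopy(v) for k, v in frame.f_locals.items()}
            if not snapshots or snapshots[-1] != snap:
                snapshots.append(snap)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)  # restore any debugger/coverage tracer
    return result, snapshots


def max_subarray(nums: List[int]) -> int:
    """Hypothetical LeetCode-style reference solution (Kadane's algorithm)."""
    best = cur = nums[0]
    for x in nums[1:]:
        cur = max(x, cur + x)  # best sum of a subarray ending at x
        best = max(best, cur)  # best sum seen so far
    return best


def format_trace(snapshots: List[Dict[str, Any]], keys: Sequence[str]) -> List[str]:
    """Render the selected variables as numbered steps, skipping repeats."""
    states: List[str] = []
    for snap in snapshots:
        if all(k in snap for k in keys):
            state = ", ".join(f"{k}={snap[k]}" for k in keys)
            if not states or states[-1] != state:
                states.append(state)
    return [f"Step {i}: {s}" for i, s in enumerate(states, start=1)]


def build_prompt(question: str, steps: List[str], answer: Any) -> str:
    """Hypothetical prompt asking an LLM to verbalize the trace as reasoning."""
    return (
        f"Question: {question}\n"
        "Intermediate states from executing a reference solution:\n"
        + "\n".join(steps)
        + f"\nFinal answer: {answer}\n"
        "Write a step-by-step natural-language reasoning process "
        "consistent with these states and ending in the final answer."
    )


if __name__ == "__main__":
    nums = [-2, 1, -3, 4]
    answer, snaps = run_with_trace(max_subarray, nums)
    steps = format_trace(snaps, ("x", "cur", "best"))
    print(build_prompt(f"What is the maximum subarray sum of {nums}?", steps, answer))
```

Recording every local at every line keeps the sketch generic; a real pipeline would presumably decide which variables to expose and how to summarize long loops, which this sketch does not attempt.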
@article{arxiv.2409.12929,
title = {LogicPro: Improving Complex Logical Reasoning via Program-Guided Learning},
author = {Jin Jiang and Yuchen Yan and Yang Liu and Jianing Wang and Shuai Peng and Xunliang Cai and Yixin Cao and Mengdi Zhang and Liangcai Gao},
journal = {arXiv preprint arXiv:2409.12929},
year = {2025}
}
Comments: 19 pages, ACL 2025 (Volume 1: Long Papers), pages 26200-26218