English
Related papers

Related papers: CodeComplex: Dataset for Worst-Case Time Complexit…

200 papers

Code reasoning tasks are becoming prevalent in large language model (LLM) assessments. Yet, there is a dearth of studies on the impact of real-world complexities on code reasoning, e.g., inter- or intra-procedural dependencies, API calls,…

Software Engineering · Computer Science 2026-04-27 Changshu Liu , Alireza Ghazanfari , Yang Chen , Reyhaneh Jabbarvand

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which…

Software Engineering · Computer Science 2024-09-17 Jia Feng , Jiachen Liu , Cuiyun Gao , Chun Yong Chong , Chaozheng Wang , Shan Gao , Xin Xia

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Large Language Models (LLMs) have recently demonstrated strong capabilities in code-related tasks, but their robustness in code reasoning under perturbations remains underexplored. We introduce CodeCrash, a stress-testing framework with…

Artificial Intelligence · Computer Science 2025-10-14 Man Ho Lam , Chaozheng Wang , Jen-tse Huang , Michael R. Lyu

Despite the remarkable success of large language models (LLMs) on traditional natural language processing tasks, their planning ability remains a critical bottleneck in tackling complex multi-step reasoning tasks. Existing approaches mainly…

Computation and Language · Computer Science 2024-10-07 Jiaxin Wen , Jian Guan , Hongning Wang , Wei Wu , Minlie Huang

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science 2025-10-28 Amal Abed , Ivan Lukic , Jörg K. H. Franke , Frank Hutter

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman , Mohammad Mahoor

Recent advances in Code Large Language Models (CodeLLMs) have primarily focused on open-ended code generation, often overlooking the crucial aspect of code understanding and reasoning. To bridge this gap, we introduce CodeMMLU, a…

Software Engineering · Computer Science 2025-04-10 Dung Nguyen Manh , Thang Phan Chau , Nam Le Hai , Thong T. Doan , Nam V. Nguyen , Quang Pham , Nghi D. Q. Bui

Large Language Models (LLMs) have been widely used to automate programming tasks. Their capabilities have been evaluated by assessing the quality of generated code through tests or proofs. The extent to which they can reason about code is a…

Software Engineering · Computer Science 2026-04-08 Changshu Liu , Yang Chen , Reyhaneh Jabbarvand

Task automation has been greatly empowered by the recent advances in Large Language Models (LLMs) via Python code, where the tasks ranging from software engineering development to general-purpose reasoning. While current benchmarks have…

With the significant progress of large reasoning models in complex coding and reasoning tasks, existing benchmarks, like LiveCodeBench and CodeElo, are insufficient to evaluate the coding capabilities of large language models (LLMs) in real…

Computation and Language · Computer Science 2025-06-06 Shiyi Xu , Yiwen Hu , Yingqian Min , Zhipeng Chen , Wayne Xin Zhao , Ji-Rong Wen

We introduce LeetCodeDataset, a high-quality benchmark for evaluating and training code-generation models, addressing two key challenges in LLM research: the lack of reasoning-focused coding benchmarks and self-contained training testbeds.…

Machine Learning · Computer Science 2025-04-22 Yunhui Xia , Wei Shen , Yan Wang , Jason Klein Liu , Huifeng Sun , Siyue Wu , Jian Hu , Xiaolong Xu

Code reasoning tasks are increasingly crucial to evaluating large language models (LLMs). Yet most existing benchmarks rely on simplistic, LLM-generated snippets or human-written solutions to code challenges and often restrict inputs and…

Software Engineering · Computer Science 2026-04-15 Changshu Liu

Predicting the complexity of source code is essential for software development and algorithm analysis. Recently, Baik et al. (2025) introduced CodeComplex for code time complexity prediction. The paper shows that LLMs without fine-tuning…

Artificial Intelligence · Computer Science 2025-10-13 Joonghyuk Hahn , Soohan Lim , Yo-Sub Han

Large language models (LLMs) are increasingly explored for their reasoning capabilities, yet their ability to perform structured, constraint-based optimization from natural language remains insufficiently understood. This study evaluates…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-18 Aasish Kumar Sharma , Julian Kunkel

Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these…

Machine Learning · Computer Science 2026-01-30 Lukas Twist , Shu Yang , Hanqi Yan , Jingzhi Gong , Di Wang , Helen Yannakoudakis , Jie M. Zhang

The critique capacity of Large Language Models (LLMs) is essential for reasoning abilities, which can provide necessary suggestions (e.g., detailed analysis and constructive feedback). Therefore, how to evaluate the critique capacity of…

Large language models (LLMs) have mastered abundant simple and explicit commonsense knowledge through pre-training, enabling them to achieve human-like performance in simple commonsense reasoning. Nevertheless, LLMs struggle to reason with…

Computation and Language · Computer Science 2025-06-10 Kai Xiong , Xiao Ding , Yixin Cao , Yuxiong Yan , Li Du , Yufei Zhang , Jinglong Gao , Jiaqian Liu , Bing Qin , Ting Liu

Complex reasoning ability is one of the most important features of current LLMs, which has also been leveraged to play an integral role in complex decision-making tasks. Therefore, the investigation into the reasoning capabilities of Large…

Artificial Intelligence · Computer Science 2024-02-13 Lizhou Fan , Wenyue Hua , Lingyao Li , Haoyang Ling , Yongfeng Zhang

Code data has been shown to enhance the reasoning capabilities of large language models (LLMs), but it remains unclear which aspects of code are most responsible. We investigate this question with a systematic, data-centric framework. We…

Computation and Language · Computer Science 2025-10-03 Abdul Waheed , Zhen Wu , Carolyn Rosé , Daphne Ippolito
‹ Prev 1 2 3 10 Next ›