Related papers: MEC$^3$O: Multi-Expert Consensus for Code Time Com…

CodeComplex: Dataset for Worst-Case Time Complexity Prediction

Reasoning ability of Large Language Models (LLMs) is a crucial ability, especially in complex decision-making tasks. One significant task to show LLMs' reasoning capability is code time complexity prediction, which involves various…

Software Engineering · Computer Science 2024-12-25 Seung-Yeop Baik , Joonghyuk Hahn , Jungin Kim , Mingi Jeon , Aditi , Yo-Sub Han , Sang-Ki Ko

Learning based Methods for Code Runtime Complexity Prediction

Predicting the runtime complexity of a programming code is an arduous task. In fact, even for humans, it requires a subtle analysis and comprehensive knowledge of algorithms to predict time complexity with high fidelity, given any code. As…

Machine Learning · Computer Science 2019-11-05 Jagriti Sikka , Kushal Satya , Yaman Kumar , Shagun Uppal , Rajiv Ratn Shah , Roger Zimmermann

Rethinking Code Complexity Through the Lens of Large Language Models

Code complexity metrics such as cyclomatic complexity have long been used to assess software quality and maintainability. With the rapid advancement of large language models (LLMs) on coding tasks, an important yet underexplored question…

Software Engineering · Computer Science 2026-05-28 Chen Xie , Xiaodong Gu , Yuling Shi , Beijun Shen

Compute-Accuracy Pareto Frontiers for Open-Source Reasoning Large Language Models

Large Language Models (LLMs) are demonstrating rapid improvements on complex reasoning benchmarks, particularly when allowed to utilize intermediate reasoning steps before converging on a final solution. However, current literature often…

Computation and Language · Computer Science 2026-01-01 Ákos Prucs , Márton Csutora , Mátyás Antal , Márk Marosi

CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings

With the increasing code reasoning capabilities of existing large language models (LLMs) and breakthroughs in reasoning models like OpenAI o1 and o3, there is a growing need to develop more challenging and comprehensive benchmarks that…

Computation and Language · Computer Science 2025-01-06 Shanghaoran Quan , Jiaxi Yang , Bowen Yu , Bo Zheng , Dayiheng Liu , An Yang , Xuancheng Ren , Bofei Gao , Yibo Miao , Yunlong Feng , Zekun Wang , Jian Yang , Zeyu Cui , Yang Fan , Yichang Zhang , Binyuan Hui , Junyang Lin

Enhancing LLM Code Generation with Ensembles: A Similarity-Based Selection Approach

Ensemble learning has been widely used in machine learning to improve model robustness, accuracy, and generalization, but has not yet been applied to code generation tasks with large language models (LLMs). We propose an ensemble approach…

Software Engineering · Computer Science 2025-07-22 Tarek Mahmud , Bin Duan , Corina Pasareanu , Guowei Yang

BigO(Bench) -- Can LLMs Generate Code with Controlled Time and Space Complexity?

We introduce BigO(Bench), a novel coding benchmark designed to evaluate the capabilities of generative language models in understanding and generating code with specified time and space complexities. This benchmark addresses the gap in…

Computation and Language · Computer Science 2025-03-21 Pierre Chambon , Baptiste Roziere , Benoit Sagot , Gabriel Synnaeve

Error Understanding in Program Code With LLM-DL for Multi-label Classification

Programming is a core skill in computer science and software engineering (SE), yet identifying and resolving code errors remains challenging for both novice and experienced developers. While Large Language Models (LLMs) have shown…

Software Engineering · Computer Science 2026-03-27 Md Faizul Ibne Amin , Yutaka Watanobe , Md. Mostafizer Rahman , Daniel M. Muepu , Md. Shahajada Mia

Wisdom and Delusion of LLM Ensembles for Code Generation and Repair

Today's pursuit of a single Large Language Model (LLM) for all software engineering tasks is resource-intensive and overlooks the potential benefits of complementarity, where different models contribute unique strengths. However, the degree…

Software Engineering · Computer Science 2025-10-31 Fernando Vallecillos-Ruiz , Max Hort , Leon Moonen

ACCORD: Closing the Commonsense Measurability Gap

We present ACCORD, a framework and benchmark suite for disentangling the commonsense grounding and reasoning abilities of large language models (LLMs) through controlled, multi-hop counterfactuals. ACCORD introduces formal elements to…

Artificial Intelligence · Computer Science 2025-02-10 François Roewer-Després , Jinyue Feng , Zining Zhu , Frank Rudzicz

Compositional Hardness of Code in Large Language Models -- A Probabilistic Perspective

A common practice in large language model (LLM) usage for complex analytical tasks such as code generation, is to sample a solution for the entire task within the model's context window. Previous works have shown that subtask decomposition…

Artificial Intelligence · Computer Science 2025-02-03 Yotam Wolf , Binyamin Rothberg , Dorin Shteyman , Amnon Shashua

Not All Code Is Equal: A Data-Centric Study of Code Complexity and LLM Reasoning

Large Language Models (LLMs) increasingly exhibit strong reasoning abilities, often attributed to their capacity to generate chain-of-thought-style intermediate reasoning. Recent work suggests that exposure to code can further enhance these…

Machine Learning · Computer Science 2026-01-30 Lukas Twist , Shu Yang , Hanqi Yan , Jingzhi Gong , Di Wang , Helen Yannakoudakis , Jie M. Zhang

MeCo: Enhancing LLM-Empowered Multi-Robot Collaboration via Similar Task Memoization

Multi-robot systems have been widely deployed in real-world applications, providing significant improvements in efficiency and reductions in labor costs. However, most existing multi-robot collaboration methods rely on extensive…

Robotics · Computer Science 2026-02-16 Baiqing Wang , Helei Cui , Bo Zhang , Xiaolong Zheng , Bin Guo , Zhiwen Yu

Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency.…

Machine Learning · Computer Science 2025-03-25 Codefuse , Ling Team , Wenting Cai , Yuchen Cao , Chaoyu Chen , Chen Chen , Siba Chen , Qing Cui , Peng Di , Junpeng Fang , Zi Gong , Ting Guo , Zhengyu He , Yang Huang , Cong Li , Jianguo Li , Zheng Li , Shijie Lian , BingChang Liu , Songshan Luo , Shuo Mao , Min Shen , Jian Wu , Jiaolong Yang , Wenjie Yang , Tong Ye , Hang Yu , Wei Zhang , Zhenduo Zhang , Hailin Zhao , Xunjin Zheng , Jun Zhou

Evaluating and Achieving Controllable Code Completion in Code LLM

Code completion has become a central task, gaining significant attention with the rise of large language model (LLM)-based tools in software engineering. Although recent advances have greatly improved LLMs' code completion abilities,…

Software Engineering · Computer Science 2026-01-23 Jiajun Zhang , Zeyu Cui , Lei Zhang , Jian Yang , Jiaxi Yang , Qiang Liu , Zilei Wang , Binyuan Hui , Liang Wang , Junyang Lin

Using LLMs for Knowledge Component-level Correctness Labeling in Open-ended Coding Problems

Fine-grained skill representations, commonly referred to as knowledge components (KCs), are fundamental to many approaches in student modeling and learning analytics. However, KC-level correctness labels are rarely available in real-world…

Computation and Language · Computer Science 2026-03-31 Zhangqi Duan , Arnav Kankaria , Dhruv Kartik , Andrew Lan

AdaptiveLLM: A Framework for Selecting Optimal Cost-Efficient LLM for Code-Generation Based on CoT Length

While Large Language Models (LLMs) have significantly advanced code generation efficiency, they face inherent challenges in balancing performance and inference costs across diverse programming tasks. Dynamically selecting the optimal LLM…

Software Engineering · Computer Science 2025-06-13 Junhang Cheng , Fang Liu , Chengru Wu , Li Zhang

Style Over Substance: Evaluation Biases for Large Language Models

As large language models (LLMs) continue to advance, accurately and comprehensively evaluating their performance becomes increasingly challenging. Ranking the relative performance of LLMs based on Elo ratings, according to human judgment,…

Computation and Language · Computer Science 2023-11-14 Minghao Wu , Alham Fikri Aji

Multi-Turn Reasoning LLMs for Task Offloading in Mobile Edge Computing

Emerging computation-intensive applications impose stringent latency requirements on resource-constrained mobile devices. Mobile Edge Computing (MEC) addresses this challenge through task offloading. However, designing effective policies…

Machine Learning · Computer Science 2026-04-09 Ning Yang , Chuangxin Cheng , Haijun Zhang

Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective

There is a growing interest in leveraging multiple large language models (LLMs) for automated code optimization. However, industrial platforms deploying multiple LLMs face a critical challenge: prompts optimized for one LLM often fail with…

Software Engineering · Computer Science 2025-10-06 Jingzhi Gong , Rafail Giavrimis , Paul Brookes , Vardan Voskanyan , Fan Wu , Mari Ashiga , Matthew Truscott , Mike Basios , Leslie Kanthan , Jie Xu , Zheng Wang