Related papers: OOP: Object-Oriented Programming Evaluation Benchm…

200 papers

Establishing fair and robust benchmarks is essential for evaluating intelligent code generation by large language models (LLMs). Our survey of 35 existing benchmarks uncovers three major imbalances: 85.7% focus on a single programming…

Software Engineering · Computer Science 2025-10-01 Shuai Wang, Liang Ding, Li Shen, Yong Luo, Han Hu, Lefei Zhang, Fu Lin

Code generation benchmarks such as HumanEval are widely adopted to evaluate LLMs' capabilities. However, after consolidating the latest 24 benchmarks, we noticed three significant imbalances. First, imbalanced programming language. 95.8% of…

Machine Learning · Computer Science 2024-10-14 Jialun Cao, Zhiyong Chen, Jiarong Wu, Shing-chi Cheung, Chang Xu

Recent advances in large language models (LLMs) have driven extensive evaluations in software engineering. However, most prior work concentrates on code-level tasks, leaving software design capabilities underexplored. To fill this gap, we…

Software Engineering · Computer Science 2026-03-12 Bingxu Xiao, Yunwei Dong, Yiqi Tang, Manqing Zhang, Yifan Zhou, Chunyan Ma, Yepang Liu

We find ourselves in the midst of an explosion in artificial intelligence research, particularly with large language models (LLMs). These models have diverse applications spanning finance, commonsense knowledge graphs, medicine, and visual…

Software Engineering · Computer Science 2025-08-08 Gang Xu, Airong Wang, Yushan Pan

Large Language Models (LLMs) have emerged as promising tools to assist students while solving programming assignments. However, object-oriented programming (OOP), with its inherent complexity involving the identification of entities,…

Software Engineering · Computer Science 2024-03-12 Bruno Pereira Cipriano, Pedro Alves

Recent advancements in large language models (LLMs) have greatly improved code generation, specifically at the function level. For instance, GPT-4o has achieved a 91.0% pass rate on HumanEval. However, this draws into question the adequacy…

Computation and Language · Computer Science 2025-08-19 Jianbo Dai, Jianqiao Lu, Yunlong Feng, Guangtao Zeng, Rongju Ruan, Ming Cheng, Dong Huang, Haochen Tan, Zhijiang Guo

In the area of code generation research, the emphasis has transitioned from crafting individual functions to developing class-level method code that integrates contextual information. This shift has brought several benchmarks such as…

Software Engineering · Computer Science 2024-08-28 Zinan Wang

Evaluating the programming robustness of large language models (LLMs) is paramount for ensuring their reliability in AI-based software development. However, adversarial attacks exhibit fundamental limitations that compromise fair robustness…

Software Engineering · Computer Science 2026-02-17 Sen Fang, Weiyuan Ding, Mengshi Zhang, Zihao Chen, Bowen Xu

Object-Oriented Programming (OOP) has become a crucial paradigm for managing the growing complexity of modern software systems, particularly in fields like machine learning, deep learning, large language models (LLM), and data analytics.…

Computation and Language · Computer Science 2025-12-24 Tianyang Wang, Ziqian Bi, Keyu Chen, Jiawei Xu, Qian Niu, Junyu Liu, Benji Peng, Ming Li, Sen Zhang, Xuanhe Pan, Jinlang Wang, Pohsun Feng, Yizhu Wen, Xinyuan Song, Ming Liu

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman, Mohammad Mahoor

Growing renewable penetration introduces substantial uncertainty into power system operations, necessitating frequent adaptation of dispatch objectives and constraints and challenging expertise-intensive, near-real-time modeling workflows.…

Systems and Control · Electrical Eng. & Systems 2026-05-25 Chao Shen, Zihan Guo, Xu Wan, Zhenghao Yang, Yifan Zhang, Wengi Huang, Jie Song, Zongyan Zhang, Mingyang Sun

Information Technology (IT) Operations (Ops), particularly Artificial Intelligence for IT Operations (AIOps), is the guarantee for maintaining the orderly and stable operation of existing information systems. According to Gartner's…

The rapid progress of artificial intelligence increasingly relies on efficient integrated circuit (IC) design. Recent studies have explored the use of large language models (LLMs) for generating Register Transfer Level (RTL) code, but…

Artificial Intelligence · Computer Science 2026-01-06 Yao Lu, Shang Liu, Hangan Zhou, Wenji Fang, Qijun Zhang, Zhiyao Xie

Large language models (LLMs) have increasingly been applied to automatic programming code generation. This task can be viewed as a language generation task that bridges natural language, human knowledge, and programming logic. However, it…

Finetuning large language models (LLMs) on instructions leads to vast performance improvements on natural language tasks. We apply instruction tuning using code, leveraging the natural structure of Git commits, which pair code changes with…

We introduce self-invoking code generation, a new task designed to evaluate the progressive reasoning and problem-solving capabilities of LLMs. In this task, models are presented with a base problem and a related, more complex problem. They…

Software Engineering · Computer Science 2025-01-03 Zhaojian Yu, Yilun Zhao, Arman Cohan, Xiao-Ping Zhang

Driven by the surge in code generation using large language models (LLMs), numerous benchmarks have emerged to evaluate these LLMs' capabilities. We conducted a large-scale human evaluation of HumanEval and MBPP, two popular benchmarks for…

Computation and Language · Computer Science 2024-07-08 Ankit Yadav, Himanshu Beniwal, Mayank Singh

Recently, a number of repository-level code generation benchmarks, such as CoderEval, DevEval, RepoEval, RepoBench, and LongCodeArena, have emerged to evaluate the capabilities of large language models (LLMs) beyond standalone benchmarks like…

Software Engineering · Computer Science 2025-06-26 Shanchao Liang, Yiran Hu, Nan Jiang, Lin Tan

Multi-objective optimization problems (MOPs) are ubiquitous in real-world applications, presenting a complex challenge of balancing multiple conflicting objectives. Traditional evolutionary algorithms (EAs), though effective, often rely on…

Neural and Evolutionary Computing · Computer Science 2024-07-29 Yuxiao Huang, Shenghao Wu, Wenjie Zhang, Jibin Wu, Liang Feng, Kay Chen Tan

The application of large language models (LLMs) in the field of coding is evolving rapidly: from code assistants, to autonomous coding agents, and then to generating complete projects through natural language. Early LLM code benchmarks…

Artificial Intelligence · Computer Science 2025-05-13 Kai Xu, YiWei Mao, XinYi Guan, ZiLong Feng