Related papers: Benchmarking Educational Program Repair

200 papers

This paper describes our approach to automated program repair. We combine various techniques from the literature to achieve this. Our experiments show that our approach performs better than other techniques on standard benchmarks. However,…

Software Engineering · Computer Science 2025-08-25 Mahinthan Chandramohan, Jovan Jancic, Yuntong Zhang, Padmanabhan Krishnan

With the rapid development of Large Language Models (LLMs), a large number of machine learning models have been developed to assist programming tasks including the generation of program code from natural language input. However, how to…

Artificial Intelligence · Computer Science 2024-06-19 Debalina Ghosh Paul, Hong Zhu, Ian Bayley

We present a new approach for benchmarking Large Language Model (LLM) capabilities on research-level mathematics. Existing benchmarks largely rely on static, hand-curated sets of contest or textbook-style problems as proxies for…

Artificial Intelligence · Computer Science 2026-03-02 Antoine Peyronnet, Fabian Gloeckle, Amaury Hayat

Large language models (LLMs) are gaining increasing popularity in software engineering (SE) due to their unprecedented performance across various applications. These models are increasingly being utilized for a range of SE tasks, including…

Software Engineering · Computer Science 2025-11-05 Xing Hu, Feifei Niu, Junkai Chen, Xin Zhou, Junwei Zhang, Junda He, Xin Xia, David Lo

Existing benchmarks for evaluating mathematical reasoning in large language models (LLMs) rely primarily on competition problems, formal proofs, or artificially challenging questions -- failing to capture the nature of mathematics…

Artificial Intelligence · Computer Science 2025-10-21 Jie Zhang, Cezara Petrui, Kristina Nikolić, Florian Tramèr

During migration across instruction set architectures (ISAs), software package build repair is a critical task for ensuring the reliability of software deployment and the stability of modern operating systems. While Large Language Models…

Machine learning (ML) now pervades the field of Automated Program Repair (APR). Algorithms deploy neural machine translation and large language models (LLMs) to generate software patches, among other tasks. But, there are important…

Software Engineering · Computer Science 2024-05-10 Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest

Large language models (LLMs) have greatly advanced the frontiers of artificial intelligence, attaining remarkable improvement in model capacity. To assess the model performance, a typical approach is to construct evaluation benchmarks for…

Computation and Language · Computer Science 2023-11-06 Kun Zhou, Yutao Zhu, Zhipeng Chen, Wentong Chen, Wayne Xin Zhao, Xu Chen, Yankai Lin, Ji-Rong Wen, Jiawei Han

The rapid proliferation of benchmarks for evaluating large language models (LLMs) has created an urgent need for systematic methods to assess benchmark quality itself. We propose Benchmark^2, a comprehensive framework comprising three…

The pursuit of leaderboard rankings in Large Language Models (LLMs) has created a fundamental paradox: models excel at standardized tests while failing to demonstrate genuine language understanding and adaptability. Our systematic analysis…

Computation and Language · Computer Science 2024-12-06 Sourav Banerjee, Ayushi Agarwal, Eishkaran Singh

Large language models (LLMs) are becoming increasingly better at a wide range of Natural Language Processing (NLP) tasks, such as text generation and understanding. Recently, these models have extended their capabilities to coding tasks,…

Machine Learning · Computer Science 2024-10-23 Nishat Raihan, Mohammed Latif Siddiq, Joanna C. S. Santos, Marcos Zampieri

Novice programmers benefit from timely, personalized support that addresses individual learning gaps, yet the availability of instructors and teaching assistants is inherently limited. Large language models (LLMs) present opportunities to…

Computers and Society · Computer Science 2025-10-07 Griffin Pitts, Anurata Prabha Hridi, Arun-Balajiee Lekshmi-Narayanan

Large language models (LLMs) are powerful tools capable of handling diverse tasks. Comparing and selecting appropriate LLMs for specific tasks requires systematic evaluation methods, as models exhibit varying capabilities across different…

Computation and Language · Computer Science 2025-06-04 Anna Sokol, Elizabeth Daly, Michael Hind, David Piorkowski, Xiangliang Zhang, Nuno Moniz, Nitesh Chawla

This paper investigates supervised fine-tuning of large language models (LLMs) to improve their pedagogical alignment in computing education, addressing concerns that LLMs may hinder learning outcomes. The project utilised a proprietary…

Computation and Language · Computer Science 2024-11-05 Alexandra Vassar, Jake Renzella, Emily Ross, Andrew Taylor

The increasing prevalence of software bugs has made automated program repair (APR) a key research focus. Large language models (LLMs) offer new opportunities for APR, but existing studies mostly rely on smaller, earlier-generation models…

Software Engineering · Computer Science 2025-06-17 Jiajun Sun, Fengjie Li, Xinzhu Qi, Hongyu Zhang, Jiajun Jiang

Excel is a pervasive yet often complex tool, particularly for novice users, where runtime errors arising from logical mistakes or misinterpretations of functions pose a significant challenge. While large language models (LLMs) offer…

Use cases are widely employed to specify functional requirements, yet existing benchmarks are scarce and face the risk of being misaligned with actual system behavior, similarly limiting the rigorous evaluation of large language models…

Software Engineering · Computer Science 2025-12-16 Shuyuan Xiao, Yiran Zhang, Weisong Sun, Xiaohong Chen, Yang Liu, Zhi Jin

The era of large language models (LLM) raises questions not only about how to train models, but also about how to evaluate them. Despite numerous existing benchmarks, insufficient attention is often given to creating assessments that test…

Large Language Models (LLMs) often produce code with subtle implementation-level bugs despite strong benchmark performance. These errors are hard for LLMs to spot and can have large behavioural effects; yet when asked to summarise code,…

Software Engineering · Computer Science 2025-11-25 Lukas Twist

The advancement of large language models (LLMs) has led to a greater challenge of having a rigorous and systematic evaluation of complex tasks performed, especially in enterprise applications. Therefore, LLMs need to be able to benchmark…

Computation and Language · Computer Science 2024-10-18 Bing Zhang, Mikio Takeuchi, Ryo Kawahara, Shubhi Asthana, Md. Maruf Hossain, Guang-Jie Ren, Kate Soule, Yada Zhu