Related papers: Comparing large language models and human programm…

Generative AI for Programming Education: Benchmarking ChatGPT, GPT-4, and Human Tutors

Generative AI and large language models hold great promise in enhancing computing education by powering next-generation educational technologies for introductory programming. Recent works have studied these models for different scenarios…

Computers and Society · Computer Science 2023-08-02 Tung Phung, Victor-Alexandru Pădurean, José Cambronero, Sumit Gulwani, Tobias Kohn, Rupak Majumdar, Adish Singla, Gustavo Soares

Prompt Engineering or Fine-Tuning: An Empirical Assessment of LLMs for Code

The rapid advancements in large language models (LLMs) have greatly expanded the potential for automated code-related tasks. Two primary methodologies are used in this domain: prompt engineering and fine-tuning. Prompt engineering involves…

Software Engineering · Computer Science 2025-02-21 Jiho Shin, Clark Tang, Tahmineh Mohati, Maleknaz Nayebi, Song Wang, Hadi Hemmati

AI-assisted coding: Experiments with GPT-4

Artificial intelligence (AI) tools based on large language models have achieved human-level performance on some computer programming tasks. We report several experiments using GPT-4 to generate computer code. These experiments demonstrate…

Artificial Intelligence · Computer Science 2023-04-27 Russell A Poldrack, Thomas Lu, Gašper Beguš

Comparing Human and LLM Generated Code: The Jury is Still Out!

Much is promised in relation to AI-supported software development. However, there has been limited evaluation effort in the research domain aimed at validating the true utility of such techniques, especially when compared to human coding…

Software Engineering · Computer Science 2025-01-29 Sherlock A. Licorish, Ansh Bajpai, Chetan Arora, Fanyu Wang, Kla Tantithamthavorn

Assessing the Code Clone Detection Capability of Large Language Models

This study aims to assess the performance of two advanced Large Language Models (LLMs), GPT-3.5 and GPT-4, in the task of code clone detection. The evaluation involves testing the models on a variety of code pairs of different clone types…

Software Engineering · Computer Science 2024-07-03 Zixian Zhang, Takfarinas Saber

OpenAi's GPT4 as coding assistant

Lately, Large Language Models have been widely used in code generation. GPT4 is considered the most potent Large Language Model from OpenAI. In this paper, we examine GPT3.5 and GPT4 as coding assistants. More specifically, we have…

Artificial Intelligence · Computer Science 2023-09-25 Lefteris Moussiades, George Zografos

Holistic Evaluation of State-of-the-Art LLMs for Code Generation

This study presents a comprehensive empirical evaluation of six state-of-the-art large language models (LLMs) for code generation, including both general-purpose and code-specialized models. Using a dataset of 944 real-world LeetCode…

Software Engineering · Computer Science 2025-12-23 Le Zhang, Suresh Kothari

Can large language models replace humans in the systematic review process? Evaluating GPT-4's efficacy in screening and extracting data from peer-reviewed and grey literature in multiple languages

Systematic reviews are vital for guiding practice, research, and policy, yet they are often slow and labour-intensive. Large language models (LLMs) could offer a way to speed up and automate systematic reviews, but their performance in such…

Computation and Language · Computer Science 2024-04-11 Qusai Khraisha, Sophie Put, Johanna Kappenberg, Azza Warraitch, Kristin Hadfield

Evaluating the Application of Large Language Models to Generate Feedback in Programming Education

This study investigates the application of large language models, specifically GPT-4, to enhance programming education. The research outlines the design of a web application that uses GPT-4 to provide feedback on programming tasks, without…

Computation and Language · Computer Science 2024-07-19 Sven Jacobs, Steffen Jaschke

Testing LLMs on Code Generation with Varying Levels of Prompt Specificity

Large language models (LLMs) have demonstrated unparalleled prowess in mimicking human-like text generation and processing. Among the myriad of applications that benefit from LLMs, automated code generation is increasingly promising. The…

Software Engineering · Computer Science 2023-11-15 Lincoln Murr, Morgan Grainger, David Gao

Analyzing the Performance of GPT-3.5 and GPT-4 in Grammatical Error Correction

GPT-3 and GPT-4 models are powerful, achieving high performance on a variety of Natural Language Processing tasks. However, there is a relative lack of detailed published analysis of their performance on the task of grammatical error…

Computation and Language · Computer Science 2023-05-31 Steven Coyne, Keisuke Sakaguchi, Diana Galvan-Sosa, Michael Zock, Kentaro Inui

Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies

Large Language Models (LLMs) have revolutionized the field of Natural Language Processing thanks to their ability to reuse knowledge acquired on massive text corpora on a wide variety of downstream tasks, with minimal (if any) tuning steps.…

Computation and Language · Computer Science 2024-07-12 Flavio Petruzzellis, Alberto Testolin, Alessandro Sperduti

Enhancing Computer Programming Education with LLMs: A Study on Effective Prompt Engineering for Python Code Generation

Large language models (LLMs) and prompt engineering hold significant potential for advancing computer programming education through personalized instruction. This paper explores this potential by investigating three critical research…

Artificial Intelligence · Computer Science 2024-07-09 Tianyu Wang, Nianjun Zhou, Zhixiong Chen

Advancing GenAI Assisted Programming--A Comparative Study on Prompt Efficiency and Code Quality Between GPT-4 and GLM-4

This study aims to explore the best practices for utilizing GenAI as a programming tool, through a comparative analysis between GPT-4 and GLM-4. By evaluating prompting strategies at different levels of complexity, we identify that simplest…

Software Engineering · Computer Science 2024-02-21 Angus Yang, Zehan Li, Jie Li

Assessing the Promise and Pitfalls of ChatGPT for Automated Code Generation

This paper presents a comprehensive evaluation of the code generation capabilities of ChatGPT, a prominent large language model, compared to human programmers. A novel dataset of 131 code-generation prompts across 5 categories was curated…

Software Engineering · Computer Science 2023-11-07 Muhammad Fawad Akbar Khan, Max Ramsdell, Erik Falor, Hamid Karimi

Analyzing Prominent LLMs: An Empirical Study of Performance and Complexity in Solving LeetCode Problems

Large Language Models (LLMs) like ChatGPT, Copilot, Gemini, and DeepSeek are transforming software engineering by automating key tasks, including code generation, testing, and debugging. As these models become integral to development…

Software Engineering · Computer Science 2025-08-07 Everton Guimaraes, Nathalia Nascimento, Chandan Shivalingaiah, Asish Nelapati

Evaluating the Energy-Efficiency of the Code Generated by LLMs

As the quality of code generated by Large Language Models (LLMs) improves, their adoption in the software industry for automated code generation continues to grow. Researchers primarily focus on enhancing the functional correctness of the…

Software Engineering · Computer Science 2025-05-28 Md Arman Islam, Devi Varaprasad Jonnala, Ritika Rekhi, Pratik Pokharel, Siddharth Cilamkoti, Asif Imran, Tevfik Kosar, Bekir Turkkan

Evaluating ChatGPT-3.5 Efficiency in Solving Coding Problems of Different Complexity Levels: An Empirical Analysis

ChatGPT and other large language models (LLMs) promise to revolutionize software development by automatically generating code from program specifications. We assess the performance of ChatGPT's GPT-3.5-turbo model on LeetCode, a popular…

Software Engineering · Computer Science 2024-11-13 Minda Li, Bhaskar Krishnamachari

A Comparison of Human and ChatGPT Classification Performance on Complex Social Media Data

Generative artificial intelligence tools, like ChatGPT, are an increasingly utilized resource among computational social scientists. Nevertheless, there remains space for improved understanding of the performance of ChatGPT in complex tasks…

Computation and Language · Computer Science 2025-12-02 Breanna E. Green, Ashley L. Shea, Pengfei Zhao, Drew B. Margolin

Thrilled by Your Progress! Large Language Models (GPT-4) No Longer Struggle to Pass Assessments in Higher Education Programming Courses

This paper studies recent developments in large language models' (LLM) abilities to pass assessments in introductory and intermediate Python programming courses at the postsecondary level. The emergence of ChatGPT resulted in heated debates…

Computers and Society · Computer Science 2023-10-05 Jaromir Savelka, Arav Agarwal, Marshall An, Chris Bogart, Majd Sakr