Related papers: SpecEval: Evaluating Code Comprehension in Large L…

200 papers

Recent advancements in large language models (LLMs) have automated various software engineering tasks, with benchmarks emerging to evaluate their capabilities. However, for adaptation, a critical activity during code reuse, there is no…

Software Engineering · Computer Science 2026-01-09 Tanghaoran Zhang, Xinjun Mao, Shangwen Wang, Yuxin Zhao, Yao Lu, Jin Zhang, Zhang Zhang, Kang Yang, Yue Yu

Code large language models (LLMs) have shown remarkable advances in code understanding, completion, and generation tasks. Programming benchmarks, comprised of a selection of code challenges and corresponding test cases, serve as a standard…

Large Language Models (LLMs) are predominantly assessed based on their common sense reasoning, language comprehension, and logical reasoning abilities. While models trained in specialized domains like mathematics or coding have demonstrated…

Software Engineering · Computer Science 2026-01-08 Danny Brahman, Mohammad Mahoor

Large language models for code (i.e., code LLMs) have shown strong code understanding and generation capabilities. To evaluate the capabilities of code LLMs in various aspects, many benchmarks have been proposed (e.g., HumanEval and…

Software Engineering · Computer Science 2024-09-24 Junkai Chen, Zhiyuan Pan, Xing Hu, Zhenhao Li, Ge Li, Xin Xia

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, including software development, education, and technical assistance. Among these, software development is one of the key areas where LLMs are…

Computation and Language · Computer Science 2026-01-07 Inpyo Song, Eunji Jeon, Jangwon Lee

Recently, large language models (LLMs), especially those that are pretrained on code, have demonstrated strong capabilities in generating programs from natural language inputs in a few-shot or even zero-shot manner. Despite promising…

Log analysis is crucial for ensuring the orderly and stable operation of information systems, particularly in the field of Artificial Intelligence for IT Operations (AIOps). Large Language Models (LLMs) have demonstrated significant…

Computation and Language · Computer Science 2024-07-03 Tianyu Cui, Shiyu Ma, Ziang Chen, Tong Xiao, Shimin Tao, Yilun Liu, Shenglin Zhang, Duoming Lin, Changchang Liu, Yuzhe Cai, Weibin Meng, Yongqian Sun, Dan Pei

Recent advancements in large language models (LLMs) have showcased impressive code generation capabilities, primarily evaluated through language-to-code benchmarks. However, these benchmarks may not fully capture a model's code…

Software Engineering · Computer Science 2024-09-16 Yuwei Zhao, Ziyang Luo, Yuchen Tian, Hongzhan Lin, Weixiang Yan, Annan Li, Jing Ma

In recent years, the application of large language models (LLMs) to code-related tasks has gained significant attention. However, existing evaluation benchmarks often focus on limited scenarios, such as code generation or completion, which…

Software Engineering · Computer Science 2024-09-17 Jia Feng, Jiachen Liu, Cuiyun Gao, Chun Yong Chong, Chaozheng Wang, Shan Gao, Xin Xia

Recent developments in large language models (LLMs) for code, such as CodeX and CodeT5+, demonstrate tremendous promise in achieving code intelligence. Their ability to synthesize code that completes a program for performing a pre-defined task…

Computation and Language · Computer Science 2023-10-10 Weimin Xiong, Yiwen Guo, Hao Chen

The remarkable reasoning and code generation capabilities of large language models (LLMs) have spurred significant interest in applying LLMs to enable task automation in digital chip design. In particular, recent work has investigated early…

Hardware Architecture · Computer Science 2024-11-01 Minwoo Kang, Mingjie Liu, Ghaith Bany Hamad, Syed Suhaib, Haoxing Ren

Large Language Models (LLMs) have significantly aided developers by generating or assisting in code writing, enhancing productivity across various tasks. While identifying incorrect code is often straightforward, detecting vulnerabilities…

Software Engineering · Computer Science 2025-01-15 Jinjun Peng, Leyi Cui, Kele Huang, Junfeng Yang, Baishakhi Ray

Code readability is crucial for software comprehension and maintenance, yet difficult to assess at scale. Traditional static metrics often fail to capture the subjective, context-sensitive nature of human judgments. Large Language Models…

Evaluating whether large language models (LLMs) can recover execution-relevant program structure, rather than only produce code that passes tests, remains an open problem. Existing code benchmarks emphasize test-passing outputs, from…

Software Engineering · Computer Science 2026-05-13 Yikun Li, Jinfeng Jiang, Ting Zhang, Chengran Yang, Chenxing Zhong, Yin Yide, Leow Wen Bin, Eng Lieh Ouh, Lwin Khin Shar, David Lo

Recent advancements in large language models (LLMs) have significantly enhanced their coding capabilities. However, existing benchmarks predominantly focus on simplified or isolated aspects of coding, such as single-file code generation…

Computation and Language · Computer Science 2024-12-17 Bowen Li, Wenhan Wu, Ziwei Tang, Lin Shi, John Yang, Jinyang Li, Shunyu Yao, Chen Qian, Binyuan Hui, Qicheng Zhang, Zhiyin Yu, He Du, Ping Yang, Dahua Lin, Chao Peng, Kai Chen

Code repair is a fundamental task in software development, facilitating efficient bug resolution and software maintenance. Although large language models (LLMs) have demonstrated considerable potential in automated code repair, their…

Software Engineering · Computer Science 2026-02-27 Dekun Dai, MingWei Liu, Anji Li, Jialun Cao, Yanlin Wang, Chong Wang, Xin Peng, Zibin Zheng

We present two comprehensive benchmarks to evaluate the performance of language models in coding assistance tasks, covering code writing, debugging, code review, and conceptual understanding. Our main contribution includes two curated…

Software Engineering · Computer Science 2024-12-10 Nidhish Shah, Zulkuf Genc, Dogu Araci

While large language models (LLMs) exhibit state-of-the-art performance in various tasks, recent studies have revealed their struggles with code translation. This is because they have not been extensively pre-trained with parallel multilingual…

Software Engineering · Computer Science 2024-10-15 Qingxiao Tao, Tingrui Yu, Xiaodong Gu, Beijun Shen

Recently, pre-trained large language models (LLMs) have shown impressive abilities in generating code from natural language descriptions, repairing buggy code, translating code between languages, and retrieving relevant code segments.…

Computation and Language · Computer Science 2023-11-07 Mohammad Abdullah Matin Khan, M Saiful Bari, Xuan Long Do, Weishi Wang, Md Rizwan Parvez, Shafiq Joty

Code coverage is a widely used metric for quantifying the extent to which program elements, such as statements or branches, are executed during testing. Calculating code coverage is resource-intensive, requiring code building and execution…

Software Engineering · Computer Science 2023-07-26 Michele Tufano, Shubham Chandel, Anisha Agarwal, Neel Sundaresan, Colin Clement