Related papers: COMEX: A Tool for Generating Customized Source Cod…

CodeSAM: Source Code Representation Learning by Infusing Self-Attention with Multi-Code-View Graphs

Machine Learning (ML) for software engineering (SE) has gained prominence due to its ability to significantly enhance the performance of various SE applications. This progress is largely attributed to the development of generalizable source…

Software Engineering · Computer Science 2024-11-25 Alex Mathai , Kranthi Sedamaki , Debeshee Das , Noble Saji Mathews , Srikanth Tamilselvam , Sridhar Chimalakonda , Atul Kumar

Learning to Reason via Program Generation, Emulation, and Search

Program synthesis with language models (LMs) has unlocked a large set of reasoning abilities; code-tuned LMs have proven adept at generating programs that solve a wide variety of algorithmic symbolic manipulation tasks (e.g. word…

Computation and Language · Computer Science 2024-11-05 Nathaniel Weir , Muhammad Khalifa , Linlu Qiu , Orion Weller , Peter Clark

CodeLens: An Interactive Tool for Visualizing Code Representations

Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to…

Software Engineering · Computer Science 2023-07-28 Yuejun Guo , Seifeddine Bettaieb , Qiang Hu , Yves Le Traon , Qiang Tang

CodeScore: Evaluating Code Generation by Learning Code Execution

A proper code evaluation metric (CEM) profoundly impacts the evolution of code generation, which is an important research field in NLP and software engineering. Prevailing match-based CEMs (e.g., BLEU, Accuracy, and CodeBLEU) suffer from…

Software Engineering · Computer Science 2024-09-06 Yihong Dong , Jiazheng Ding , Xue Jiang , Ge Li , Zhuo Li , Zhi Jin

Automated Code Review Using Large Language Models with Symbolic Reasoning

Code review is one of the key processes in the software development lifecycle and is essential to maintain code quality. However, manual code review is subjective and time consuming. Given its rule-based nature, code review is well suited…

Software Engineering · Computer Science 2025-07-25 Busra Icoz , Goksel Biricik

Large Language Models for Code Generation: The Practitioners Perspective

Large Language Models (LLMs) have emerged as coding assistants, capable of generating source code from natural language prompts. With the increasing adoption of LLMs in software development, academic research and industry based projects are…

Software Engineering · Computer Science 2025-01-29 Zeeshan Rasheed , Muhammad Waseem , Kai Kristian Kemell , Aakash Ahmad , Malik Abdul Sami , Jussi Rasku , Kari Systä , Pekka Abrahamsson

COSPEX: A Program Comprehension Tool for Novice Programmers

Developers often encounter unfamiliar code during software maintenance which consumes a significant amount of time for comprehension, especially for novice programmers. Automated techniques that analyze a source code and present key…

Software Engineering · Computer Science 2021-07-07 Ashutosh Rajput , Nakshatra Gupta , Sridhar Chimalakonda

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian , Chenhui Cui , Rubing Huang , Chunrong Fang , Zhenyu Chen

LLMSecEval: A Dataset of Natural Language Prompts for Security Evaluations

Large Language Models (LLMs) like Codex are powerful tools for performing code completion and code generation tasks as they are trained on billions of lines of code from publicly available sources. Moreover, these models are capable of…

Software Engineering · Computer Science 2023-03-17 Catherine Tony , Markus Mutas , Nicolás E. Díaz Ferreyra , Riccardo Scandariato

CodeTree: Agent-guided Tree Search for Code Generation with Large Language Models

Pre-trained on massive amounts of code and text data, large language models (LLMs) have demonstrated remarkable achievements in performing code generation tasks. With additional execution-based feedback, these models can act as agents with…

Computation and Language · Computer Science 2024-11-14 Jierui Li , Hung Le , Yingbo Zhou , Caiming Xiong , Silvio Savarese , Doyen Sahoo

CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models

With the emergence of Large Language Models (LLMs), there has been a significant improvement in the programming capabilities of models, attracting growing attention from researchers. Evaluating the programming capabilities of LLMs is…

Computation and Language · Computer Science 2024-03-12 Lingyue Fu , Huacan Chai , Shuang Luo , Kounianhua Du , Weiming Zhang , Longteng Fan , Jiayi Lei , Renting Rui , Jianghao Lin , Yuchen Fang , Yifan Liu , Jingkuan Wang , Siyuan Qi , Kangning Zhang , Weinan Zhang , Yong Yu

CoRe: Benchmarking LLMs Code Reasoning Capabilities through Static Analysis Tasks

Large language models (LLMs) have been widely adopted across diverse domains of software engineering, such as code generation, program repair, and vulnerability detection. These applications require understanding beyond surface-level code…

Software Engineering · Computer Science 2026-01-21 Danning Xie , Mingwei Zheng , Xuwei Liu , Jiannan Wang , Chengpeng Wang , Lin Tan , Xiangyu Zhang

Codexity: Secure AI-assisted Code Generation

Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies show the concern of introducing vulnerabilities into software codebase by AI programming assistants (e.g., Copilot,…

Software Engineering · Computer Science 2024-05-08 Sung Yong Kim , Zhiyu Fan , Yannic Noller , Abhik Roychoudhury

A Systematic Evaluation of Large Language Models of Code

Large language models (LMs) of code have recently shown tremendous promise in completing code and synthesizing code from natural language descriptions. However, the current state-of-the-art code LMs (e.g., Codex (Chen et al., 2021)) are not…

Programming Languages · Computer Science 2022-05-05 Frank F. Xu , Uri Alon , Graham Neubig , Vincent J. Hellendoorn

Frustrated with Code Quality Issues? LLMs can Help!

As software projects progress, quality of code assumes paramount importance as it affects reliability, maintainability and security of software. For this reason, static analysis tools are used in developer workflows to flag code quality…

Artificial Intelligence · Computer Science 2023-09-25 Nalin Wadhwa , Jui Pradhan , Atharv Sonwane , Surya Prakash Sahu , Nagarajan Natarajan , Aditya Kanade , Suresh Parthasarathy , Sriram Rajamani

Evaluating the Impact of Source Code Parsers on ML4SE Models

As researchers and practitioners apply Machine Learning to increasingly more software engineering problems, the approaches they use become more sophisticated. A lot of modern approaches utilize internal code structure in the form of an…

Software Engineering · Computer Science 2022-06-20 Ilya Utkin , Egor Spirin , Egor Bogomolov , Timofey Bryksin

CodeSteer: Symbolic-Augmented Language Models via Code/Text Guidance

Existing methods fail to effectively steer Large Language Models (LLMs) between textual reasoning and code generation, leaving symbolic computing capabilities underutilized. We introduce CodeSteer, an effective method for guiding LLM…

Computation and Language · Computer Science 2025-05-30 Yongchao Chen , Yilun Hao , Yueying Liu , Yang Zhang , Chuchu Fan

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

Large Language Models (LLMs) have achieved remarkable success in source code understanding, yet as software systems grow in scale, computational efficiency has become a critical bottleneck. Currently, these models rely on a text-based…

Computation and Language · Computer Science 2026-04-29 Yuling Shi , Chaoxiang Xie , Zhensu Sun , Yeheng Chen , Chenxu Zhang , Longfei Yun , Chengcheng Wan , Hongyu Zhang , David Lo , Xiaodong Gu

Exploring Code Analysis: Zero-Shot Insights on Syntax and Semantics with LLMs

Code analysis is fundamental in Software Engineering, supporting debugging, optimization, and security assessment. Human developers approach it through syntax parsing, static semantics inference, and dynamic reasoning. Traditional tools are…

Software Engineering · Computer Science 2026-05-22 Wei Ma , Zhihao Lin , Shangqing Liu , Qiang Hu , Ye Liu , Wenhan Wang , Cen Zhang , Liming Nie , Li Li , Yang Liu , Lingxiao Jiang

Exploring the Robustness of Large Language Models for Solving Programming Problems

Using large language models (LLMs) for source code has recently gained attention. LLMs, such as Transformer-based models like Codex and ChatGPT, have been shown to be highly capable of solving a wide range of programming problems. However,…

Computation and Language · Computer Science 2023-06-27 Atsushi Shirafuji , Yutaka Watanobe , Takumi Ito , Makoto Morishita , Yuki Nakamura , Yusuke Oda , Jun Suzuki