Related papers: DeepCodeProbe: Towards Understanding What Models T…

Towards Understanding What Code Language Models Learned

Pre-trained language models are effective in a variety of natural language tasks, but it has been argued their capabilities fall short of fully learning meaning or understanding language. To understand the extent to which language models…

Software Engineering · Computer Science 2024-02-29 Toufique Ahmed, Dian Yu, Chengxuan Huang, Cathy Wang, Prem Devanbu, Kenji Sagae

Probing Pretrained Models of Source Code

Deep learning models are widely used for solving challenging code processing tasks, such as code generation or code summarization. Traditionally, a specific model architecture was carefully built to solve a particular code processing task.…

Software Engineering · Computer Science 2022-11-18 Sergey Troshin, Nadezhda Chirkova

A Survey of Deep Learning Models for Structural Code Understanding

In recent years, the rise of deep learning and automation requirements in the software industry has elevated Intelligent Software Engineering to new heights. The number of approaches and applications in code understanding is growing, with…

Software Engineering · Computer Science 2022-05-04 Ruoting Wu, Yuxin Zhang, Qibiao Peng, Liang Chen, Zibin Zheng

DeepClone: Modeling Clones to Generate Code Predictions

Programmers often reuse code from source code repositories to reduce the development effort. Code clones are candidates for reuse in exploratory or rapid development, as they represent often repeated functionality in software systems. To…

Software Engineering · Computer Science 2020-12-08 Muhammad Hammad, Önder Babur, Hamid Abdul Basit, Mark van den Brand

Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges

Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in…

Software Engineering · Computer Science 2020-06-16 Triet H. M. Le, Hao Chen, M. Ali Babar

Towards Understanding the Capability of Large Language Models on Code Clone Detection: A Survey

Code cloning, the duplication of code fragments, is common in software development. While some reuse aids productivity, excessive cloning hurts maintainability and introduces bugs. Hence, automatic code clone detection is vital. Meanwhile,…

Software Engineering · Computer Science 2023-08-08 Shihan Dou, Junjie Shan, Haoxiang Jia, Wenhao Deng, Zhiheng Xi, Wei He, Yueming Wu, Tao Gui, Yang Liu, Xuanjing Huang

Deep Learning Meets Software Engineering: A Survey on Pre-Trained Models of Source Code

Recent years have seen the successful application of deep learning to software engineering (SE). In particular, the development and use of pre-trained models of source code has enabled state-of-the-art results to be achieved on a wide…

Software Engineering · Computer Science 2022-05-25 Changan Niu, Chuanyi Li, Bin Luo, Vincent Ng

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar, Romain Robbes

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Investigating the Impact of SOLID Design Principles on Machine Learning Code Understanding

[Context] Applying design principles has long been acknowledged as beneficial for understanding and maintainability in traditional software projects. These benefits may similarly hold for Machine Learning (ML) projects, which involve…

Software Engineering · Computer Science 2024-02-09 Raphael Cabral, Marcos Kalinowski, Maria Teresa Baldassarre, Hugo Villamizar, Tatiana Escovedo, Hélio Lopes

A Code Comprehension Benchmark for Large Language Models for Code

Large Language Models have shown impressive capabilities in coding tasks like code generation and code completion, as they have been trained on a large amount of code data. Also, since one of the core pretraining objectives is Next Token…

Software Engineering · Computer Science 2025-07-16 Jayant Havare, Saurav Chaudhary, Ganesh Ramakrishnan, Kaushik Maharajan, Srikanth Tamilselvam

Unveiling Code Pre-Trained Models: Investigating Syntax and Semantics Capacities

Past research has examined how well these models grasp code syntax, yet their understanding of code semantics still needs to be explored. We extensively analyze seven code models to investigate how code models represent code syntax and…

Software Engineering · Computer Science 2024-04-18 Wei Ma, Shangqing Liu, Mengjie Zhao, Xiaofei Xie, Wenhan Wang, Qiang Hu, Jie Zhang, Yang Liu

Metamorphic Testing of Deep Code Models: A Systematic Literature Review

Large language models and deep learning models designed for code intelligence have revolutionized the software engineering field due to their ability to perform various code-related tasks. These models can process source code and software…

Software Engineering · Computer Science 2025-07-31 Ali Asgari, Milan de Koning, Pouria Derakhshanfar, Annibale Panichella

An Empirical Study of Deep Learning Models for Vulnerability Detection

Deep learning (DL) models of code have recently reported great progress for vulnerability detection. In some cases, DL-based models have outperformed static analysis tools. Although many great models have been proposed, we do not yet have a…

Software Engineering · Computer Science 2023-02-14 Benjamin Steenhoek, Md Mahbubur Rahman, Richard Jiles, Wei Le

A Critical Study of What Code-LLMs (Do Not) Learn

Large Language Models trained on code corpora (code-LLMs) have demonstrated impressive performance in various coding assistance tasks. However, despite their increased size and training dataset, code-LLMs still have limitations such as…

Software Engineering · Computer Science 2024-06-19 Abhinav Anand, Shweta Verma, Krishna Narasimhan, Mira Mezini

Deep Learning for Code Intelligence: Survey, Benchmark and Toolkit

Code intelligence leverages machine learning techniques to extract knowledge from extensive code corpora, with the aim of developing intelligent tools to improve the quality and productivity of computer programming. Currently, there is…

Software Engineering · Computer Science 2024-01-02 Yao Wan, Yang He, Zhangqian Bi, Jianguo Zhang, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin, Philip S. Yu

Should Code Models Learn Pedagogically? A Preliminary Evaluation of Curriculum Learning for Real-World Software Engineering Tasks

Learning-based techniques, especially advanced pre-trained models for code have demonstrated capabilities in code understanding and generation, solving diverse software engineering (SE) tasks. Despite the promising results, current training…

Software Engineering · Computer Science 2025-02-07 Kyi Shin Khant, Hong Yi Lin, Patanamon Thongtanunam

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred…

Computation and Language · Computer Science 2024-02-06 Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

Beyond Decodability: Reconstructing Language Model Representations with an Encoding Probe

Probing is widely used to study which features can be decoded from language model representations. However, the common decoding probe approach has two limitations that we aim to solve with our new encoding probe approach: contributions of…

Computation and Language · Computer Science 2026-05-04 Gaofei Shen, Martijn Bentum, Tom Lentz, Afra Alishahi, Grzegorz Chrupała

Model-Agnostic Correctness Assessment for LLM-Generated Code via Dynamic Internal Representation Selection

Large Language Models (LLMs) have demonstrated impressive capabilities in code generation and are increasingly integrated into the software development process. However, ensuring the correctness of LLM-generated code remains a critical…

Software Engineering · Computer Science 2025-10-06 Thanh Trong Vu, Tuan-Dung Bui, Thu-Trang Nguyen, Son Nguyen, Hieu Dinh Vo