Related papers: TRACED: Execution-aware Pre-training for Source Co…

What Do They Capture? -- A Structural Analysis of Pre-Trained Language Models for Source Code

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan , Wei Zhao , Hongyu Zhang , Yulei Sui , Guandong Xu , Hai Jin

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Development Environments (IDEs), which can accelerate software development by suggesting the next probable token based on the contextual code in real-time. Recent studies…

Software Engineering · Computer Science 2021-01-01 Fang Liu , Ge Li , Yunfei Zhao , Zhi Jin

Trex: Learning Execution Semantics from Micro-Traces for Binary Similarity

Detecting semantically similar functions -- a crucial analysis capability with broad real-world security usages including vulnerability detection, malware lineage, and forensics -- requires understanding function behaviors and intentions.…

Cryptography and Security · Computer Science 2021-04-28 Kexin Pei , Zhou Xuan , Junfeng Yang , Suman Jana , Baishakhi Ray

What I cannot execute, I do not understand: Training and Evaluating LLMs on Program Execution Traces

Code generation and understanding are critical capabilities for large language models (LLMs). Thus, most LLMs are pretrained and fine-tuned on code data. However, these datasets typically treat code as static strings and rarely exploit the…

Machine Learning · Computer Science 2025-03-11 Jordi Armengol-Estapé , Quentin Carbonneaux , Tianjun Zhang , Aram H. Markosyan , Volker Seeker , Chris Cummins , Melanie Kambadur , Michael F. P. O'Boyle , Sida Wang , Gabriel Synnaeve , Hugh James Leather

An Empirical Comparison of Pre-Trained Models of Source Code

While a large number of pre-trained models of source code have been successfully developed and applied to a variety of software engineering (SE) tasks in recent years, our understanding of these pre-trained models is arguably fairly…

Software Engineering · Computer Science 2023-02-09 Changan Niu , Chuanyi Li , Vincent Ng , Dongxiao Chen , Jidong Ge , Bin Luo

Code Execution with Pre-trained Language Models

Code execution is a fundamental aspect of programming language semantics that reflects the exact behavior of the code. However, most pre-trained models for code intelligence ignore the execution trace and only rely on source code and…

Programming Languages · Computer Science 2023-05-10 Chenxiao Liu , Shuai Lu , Weizhu Chen , Daxin Jiang , Alexey Svyatkovskiy , Shengyu Fu , Neel Sundaresan , Nan Duan

What do pre-trained code models know about code?

Pre-trained models of code built on the transformer architecture have performed well on software engineering (SE) tasks such as predictive code generation, code summarization, among others. However, whether the vector representations from…

Software Engineering · Computer Science 2021-08-26 Anjan Karmakar , Romain Robbes

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang , Zhouyang Jia , Shanshan Li , Yue Yu , Yun Xiong , Wei Dong , Xiangke Liao

Benchmarking Language Models for Code Syntax Understanding

Pre-trained language models have demonstrated impressive performance in both natural language processing and program understanding, which represent the input as a token sequence without explicitly modeling its structure. Some prior works…

Computation and Language · Computer Science 2022-10-27 Da Shen , Xinyun Chen , Chenguang Wang , Koushik Sen , Dawn Song

Learning to Extend Program Graphs to Work-in-Progress Code

Source code spends most of its time in a broken or incomplete state during software development. This presents a challenge to machine learning for code, since high-performing models typically rely on graph structured representations of…

Machine Learning · Computer Science 2021-06-01 Xuechen Li , Chris J. Maddison , Daniel Tarlow

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal , Yikang Shen , Bailin Wang , Yoon Kim , Jie Chen

GraphCodeBERT: Pre-training Code Representations with Data Flow

Pre-trained models for programming language have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, code summarization, etc. However, existing pre-trained models regard a code…

Software Engineering · Computer Science 2021-09-14 Daya Guo , Shuo Ren , Shuai Lu , Zhangyin Feng , Duyu Tang , Shujie Liu , Long Zhou , Nan Duan , Alexey Svyatkovskiy , Shengyu Fu , Michele Tufano , Shao Kun Deng , Colin Clement , Dawn Drain , Neel Sundaresan , Jian Yin , Daxin Jiang , Ming Zhou

SPT-Code: Sequence-to-Sequence Pre-Training for Learning Source Code Representations

Recent years have seen the successful application of large pre-trained models to code representation learning, resulting in substantial improvements on many code-related downstream tasks. But there are issues surrounding their application…

Software Engineering · Computer Science 2022-05-26 Changan Niu , Chuanyi Li , Vincent Ng , Jidong Ge , Liguo Huang , Bin Luo

TRACED: Transition-aware Regret Approximation with Co-learnability for Environment Design

Generalizing deep reinforcement learning agents to unseen environments remains a significant challenge. One promising solution is Unsupervised Environment Design (UED), a co-evolutionary framework in which a teacher adaptively generates…

Machine Learning · Computer Science 2026-03-17 Geonwoo Cho , Jaegyun Im , Jihwan Lee , Hojun Yi , Sejin Kim , Sundong Kim

On-the-Fly Adaptation of Source Code Models using Meta-Learning

The ability to adapt to unseen, local contexts is an important challenge that successful models of source code must overcome. One of the most popular approaches for the adaptation of such models is dynamic evaluation. With dynamic…

Machine Learning · Computer Science 2023-06-21 Disha Shrivastava , Hugo Larochelle , Daniel Tarlow

TreeCaps: Tree-Structured Capsule Networks for Program Source Code Processing

Program comprehension is a fundamental task in software development and maintenance processes. Software developers often need to understand a large amount of existing code before they can develop new features or fix bugs in existing…

Machine Learning · Computer Science 2019-10-29 Vinoj Jayasundara , Nghi Duy Quoc Bui , Lingxiao Jiang , David Lo

SCALE: Constructing Structured Natural Language Comment Trees for Software Vulnerability Detection

Recently, there has been a growing interest in automatic software vulnerability detection. Pre-trained model-based approaches have demonstrated superior performance than other Deep Learning (DL)-based approaches in detecting…

Software Engineering · Computer Science 2024-03-29 Xin-Cheng Wen , Cuiyun Gao , Shuzheng Gao , Yang Xiao , Michael R. Lyu

Investigating Execution-Aware Language Models for Code Optimization

Code optimization is the process of enhancing code efficiency, while preserving its intended functionality. This process often requires a deep understanding of the code execution behavior at run-time to identify and address inefficiencies…

Software Engineering · Computer Science 2026-04-02 Federico Di Menna , Luca Traini , Gabriele Bavota , Vittorio Cortellessa

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some practical adoption in practice, e.g. for code completion. Yet, we still know very little…

Software Engineering · Computer Science 2023-12-11 Anjan Karmakar , Romain Robbes

DFEPT: Data Flow Embedding for Enhancing Pre-Trained Model Based Vulnerability Detection

Software vulnerabilities represent one of the most pressing threats to computing systems. Identifying vulnerabilities in source code is crucial for protecting user privacy and reducing economic losses. Traditional static analysis tools rely…

Software Engineering · Computer Science 2024-10-25 Zhonghao Jiang , Weifeng Sun , Xiaoyan Gu , Jiaxin Wu , Tao Wen , Haibo Hu , Meng Yan