Related papers: Comparative Code Structure Analysis using Deep Lea…

A Comparison of Code Embeddings and Beyond

Program representation learning is a fundamental task in software engineering applications. With the availability of "big code" and the development of deep learning techniques, various program representation learning models have been…

Software Engineering · Computer Science 2021-09-17 Siqi Han, DongXia Wang, Wanting Li, Xuesong Lu

A deep tree-based model for software defect prediction

Defects are common in software systems and can potentially cause various problems to software users. Different methods have been developed to quickly predict the most likely locations of defects in large code bases. Most of them focus on…

Software Engineering · Computer Science 2018-02-06 Hoa Khanh Dam, Trang Pham, Shien Wee Ng, Truyen Tran, John Grundy, Aditya Ghose, Taeksu Kim, Chul-Joo Kim

Enhancing Source Code Representations for Deep Learning with Static Analysis

Deep learning techniques applied to program analysis tasks such as code classification, summarization, and bug detection have seen widespread interest. Traditional approaches, however, treat programming source code as natural language text,…

Software Engineering · Computer Science 2024-02-16 Xueting Guan, Christoph Treude

Abstract Syntax Tree for Programming Language Understanding and Representation: How Far Are We?

Programming language understanding and representation (a.k.a code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks

Deep learning is being used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the required process of feature engineering, the construction of…

Software Engineering · Computer Science 2021-11-24 Zhehao Zhao, Bo Yang, Ge Li, Huai Liu, Zhi Jin

An AST-based Code Change Representation and its Performance in Just-in-time Vulnerability Prediction

The presence of software vulnerabilities is an ever-growing issue in software development. In most cases, it is desirable to detect vulnerabilities as early as possible, preferably in a just-in-time manner, when the vulnerable piece is…

Software Engineering · Computer Science 2023-03-30 Tamás Aladics, Péter Hegedűs, Rudolf Ferenc

A Unified Active Learning Framework for Annotating Graph Data with Application to Software Source Code Performance Prediction

Most machine learning and data analytics applications, including performance engineering in software systems, require a large number of annotations and labelled data, which might not be available in advance. Acquiring annotations often…

Software Engineering · Computer Science 2023-09-21 Peter Samoaa, Linus Aronsson, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani

Structured Code Representations Enable Data-Efficient Adaptation of Code Language Models

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and…

Artificial Intelligence · Computer Science 2021-11-16 Tenzin Jinpa, Yong Gao

Performance Embeddings: A Similarity-based Approach to Automatic Performance Optimization

Performance optimization is an increasingly challenging but often repetitive task. While each platform has its quirks, the underlying code transformations rely on data movement and computational characteristics that recur across…

Software Engineering · Computer Science 2023-03-16 Lukas Trümper, Tal Ben-Nun, Philipp Schaad, Alexandru Calotoiu, Torsten Hoefler

Playing Psychic: Using Thought Trees to Predict Reasoning Models Accuracy on Coding Tasks

Recent advances in large language models (LLMs) have shown that test-time scaling can substantially improve model performance on complex tasks, particularly in the coding domain. Under this paradigm, models use a larger token budget during…

Artificial Intelligence · Computer Science 2026-04-21 Jiaxin Fang, Runyuan He, Sahil Bhatia, Neel Gajare, Alvin Cheung

Learning to Represent Programs with Heterogeneous Graphs

Program source code contains complex structure information, which can be represented in structured data forms like trees or graphs. To acquire the structural information in source code, most existing researches use abstract syntax trees…

Software Engineering · Computer Science 2022-04-13 Kechi Zhang, Wenhan Wang, Huangzhao Zhang, Ge Li, Zhi Jin

Automatic feature learning for vulnerability prediction

Code flaws or vulnerabilities are prevalent in software systems and can potentially cause a variety of problems including deadlock, information loss, or system failure. A variety of approaches have been developed to try and detect the most…

Software Engineering · Computer Science 2017-08-09 Hoa Khanh Dam, Truyen Tran, Trang Pham, Shien Wee Ng, John Grundy, Aditya Ghose

Analysing the Behaviour of Tree-Based Neural Networks in Regression Tasks

The landscape of deep learning has vastly expanded the frontiers of source code analysis, particularly through the utilization of structural representations such as Abstract Syntax Trees (ASTs). While these methodologies have demonstrated…

Machine Learning · Computer Science 2024-06-18 Peter Samoaa, Mehrdad Farahani, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani

Adding Context to Source Code Representations for Deep Learning

Deep learning models have been successfully applied to a variety of software engineering tasks, such as code classification, summarisation, and bug and vulnerability detection. In order to apply deep learning to these tasks, source code…

Software Engineering · Computer Science 2022-08-02 Fuwei Tian, Christoph Treude

Unified Abstract Syntax Tree Representation Learning for Cross-Language Program Classification

Program classification can be regarded as a high-level abstraction of code, laying a foundation for various tasks related to source code comprehension, and has a very wide range of applications in the field of software engineering, such as…

Software Engineering · Computer Science 2022-05-03 Kesu Wang, Meng Yan, He Zhang, Haibo Hu

Predicting Vulnerability In Large Codebases With Deep Code Representation

Currently, while software engineers write code for various modules, quite often, various types of errors - coding, logic, semantic, and others (most of which are not caught by compilation and other tools) get introduced. Some of these bugs…

Software Engineering · Computer Science 2020-04-28 Anshul Tanwar, Krishna Sundaresan, Parmesh Ashwath, Prasanna Ganesan, Sathish Kumar Chandrasekaran, Sriram Ravi

Interpretable Structure-aware Document Encoders with Hierarchical Attention

We propose a method to create document representations that reflect their internal structure. We modify Tree-LSTMs to hierarchically merge basic elements such as words and sentences into blocks of increasing complexity. Our Structure…

Computation and Language · Computer Science 2019-10-08 Khalil Mrini, Claudiu Musat, Michael Baeriswyl, Martin Jaggi

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science 2021-11-18 Rocìo Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Improving the Robustness to Data Inconsistency between Training and Testing for Code Completion by Hierarchical Language Model

In the field of software engineering, applying language models to the token sequence of source code is the state-of-art approach to build a code recommendation system. The syntax tree of source code has hierarchical structures. Ignoring the…

Software Engineering · Computer Science 2022-11-29 Yixiao Yang