English
Related papers

Related papers: Machine Learning Based Source Code Classification …

200 papers

Determining the programming language of a source code file has been considered in the research community; it has been shown that Machine Learning (ML) and Natural Language Processing (NLP) algorithms can be effective in identifying the…

Software Engineering · Computer Science 2018-09-24 Kamel Alreshedy , Dhanush Dharmaretnam , Daniel M. German , Venkatesh Srinivasan , T. Aaron Gulliver

Motivated by the amount of code that goes unidentified on the web, we introduce a practical method for algorithmically identifying the programming language of source code. Our work is based on supervised learning and intelligent statistical…

Machine Learning · Computer Science 2011-11-10 David Klein , Kyle Murray , Simon Weber

In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug…

Software Engineering · Computer Science 2022-08-17 Otávio Cury , Guilherme Avelino , Pedro Santos Neto , Ricardo Britto , Marco Túlio Valente

Programming language detection is a common need in the analysis of large source code bases. It is supported by a number of existing tools that rely on several features, and most notably file extensions, to determine file types. We consider…

Software Engineering · Computer Science 2021-03-02 Francesca Del Bonifro , Maurizio Gabbrielli , Stefano Zacchiroli

In today's software world with its cornucopia of reusable software libraries, when a programmer is faced with a programming task that they suspect can be completed through the use of a library, they often look for code examples using a…

Software Engineering · Computer Science 2021-10-08 Geert Heyman , Rafael Huysegems , Pascal Justen , Tom Van Cutsem

The automatic generation of source code is one of the long-lasting dreams in software engineering research. Several techniques have been proposed to speed up the writing of new code. For example, code completion techniques can recommend to…

Software Engineering · Computer Science 2023-02-09 Matteo Ciniselli , Luca Pascarella , Emad Aghajani , Simone Scalabrino , Rocco Oliveto , Gabriele Bavota

The advancements in machine learning techniques have encouraged researchers to apply these techniques to a myriad of software engineering tasks that use source code analysis, such as testing and vulnerability detection. Such a large number…

Software Engineering · Computer Science 2022-09-14 Tushar Sharma , Maria Kechagia , Stefanos Georgiou , Rohit Tiwari , Indira Vats , Hadi Moazen , Federica Sarro

Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting…

Software Engineering · Computer Science 2019-07-23 Jacob Dormuth , Ben Gelman , Jessica Moore , David Slater

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian , Chenhui Cui , Rubing Huang , Chunrong Fang , Zhenyu Chen

Data profiling is critical in machine learning for generating descriptive statistics, supporting both deeper understanding and downstream tasks like data valuation and curation. This work addresses profiling specifically in the context of…

Software Engineering · Computer Science 2025-03-21 Pankaj Thorat , Adnan Qidwai , Adrija Dhar , Aishwariya Chakraborty , Anand Eswaran , Hima Patel , Praveen Jayachandran

Artificial Intelligence (AI) techniques, especially Large Language Models (LLMs), have started gaining popularity among researchers and software developers for generating source code. However, LLMs have been shown to generate code with…

Software Engineering · Computer Science 2024-11-08 Hyunjae Suh , Mahan Tafreshipour , Jiawei Li , Adithya Bhattiprolu , Iftekhar Ahmed

In software engineering-related tasks (such as programming language tag prediction based on code snippets from Stack Overflow), the programming language classification for code snippets is a common task. In this study, we propose a novel…

Software Engineering · Computer Science 2021-10-05 Guang Yang , Yanlin Zhou , Chi Yu , Xiang Chen

The Software Naturalness hypothesis argues that programming languages can be understood through the same techniques used in natural language processing. We explore this hypothesis through the use of a pre-trained transformer-based language…

Change-prone classes or modules are defined as software components in the source code which are likely to change in the future. Change-proneness prediction is useful to the maintenance team as they can optimize and focus their testing…

Software Engineering · Computer Science 2017-12-22 Lov Kumar , Ashish Sureka

Recent research has repeatedly shown that machine learning techniques can be applied to either whole files or file fragments to classify them for analysis. We build upon these techniques to show that for samples of un-labeled compiled…

Machine Learning · Statistics 2018-05-08 John Clemens

Researchers have investigated the potential of leveraging pre-trained language models, such as CodeBERT, to enhance source code-related tasks. Previous methodologies have relied on CodeBERT's '[CLS]' token as the embedding representation of…

Computation and Language · Computer Science 2024-09-04 Yong Ma , Senlin Luo , Yu-Ming Shang , Yifei Zhang , Zhengjun Li

Most of open-source software systems become available on the internet today. Thus, we need automatic methods to label software code. Software code can be labeled with a set of keywords. These keywords in this paper referred as software…

Software Engineering · Computer Science 2018-03-02 Ra'Fat Al-Msie'deen

A range of applications for automatic machine learning need the generation process to be controllable. In this work, we propose a way to control the output via a sequence of simple actions, that are called semantic code classes. Finally, we…

As an integral part of source code files, code comments help improve program readability and comprehension. However, developers sometimes do not comment on their program code adequately due to the incurred extra efforts, lack of relevant…

Software Engineering · Computer Science 2019-07-31 Xiaotao Song , Hailong Sun , Xu Wang , Jiafei Yan

In model-driven development (MDD) software emerges by systematically transforming abstract models to concrete source code. Ideally, performing those transformations is to a large extent the task of code generators. One approach for…

Software Engineering · Computer Science 2015-09-09 Pedram Mir Seyed Nazari , Bernhard Rumpe
‹ Prev 1 2 3 10 Next ›