Related papers: SCC: Automatic Classification of Code Snippets
In software engineering-related tasks (such as programming language tag prediction based on code snippets from Stack Overflow), the programming language classification for code snippets is a common task. In this study, we propose a novel…
Stack Overflow is the most popular Q&A website among software developers. As a platform for knowledge sharing and acquisition, the questions posted in Stack Overflow usually contain a code snippet. Stack Overflow relies on users to properly…
As of today the programming language of the vast majority of the published source code is manually specified or programmatically assigned based on the sole file extension. In this paper we show that the source code programming language…
Code search and comprehension have become more difficult in recent years due to the rapid expansion of available source code. Current tools lack a way to label arbitrary code at scale while maintaining up-to-date representations of new…
Two key contributions presented in this paper are: i) A method for building a dataset containing source code features extracted from source files taken from Open Source Software (OSS) and associated bug reports, ii) A predictive model for…
In software development, the identification of source code file experts is an important task. Identifying these experts helps to improve software maintenance and evolution activities, such as developing new features, code reviews, and bug…
Code comments are vital to source code as they help developers with program comprehension tasks. Written in natural language (usually English), code comments convey a variety of different information, which are grouped into specific…
An important goal for programmers is to minimize cost of identifying and correcting defects in source code. Code review is commonly used for identifying programming defects. However, manual code review has some shortcomings: a) it is time…
Enriched by natural language texts, Stack Overflow code snippets are an invaluable code-centric knowledge base of small units of source code. Besides being useful for software developers, these annotated snippets can potentially serve as…
Programming language detection is a common need in the analysis of large source code bases. It is supported by a number of existing tools that rely on several features, and most notably file extensions, to determine file types. We consider…
This paper illustrates an empirical study of the working efficiency of machine learning techniques in classifying code review text by semantic meaning. The code review comments from the source control repository in GitHub were extracted for…
Language Models (LLMs), such as transformer-based neural networks trained on billions of parameters, have become increasingly prevalent in software engineering (SE). These models, trained on extensive datasets that include code…
As software systems grow in scale and complexity, understanding the distribution of programming language topics within source code becomes increasingly important for guiding technical decisions, improving onboarding, and informing tooling…
Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell…
Multilingual programming, which involves using multiple programming languages (PLs) in a single project, is increasingly common due to its benefits. However, it introduces cross-language bugs (CLBs), which arise from interactions between…
We are trying to find source code comments that help programmers understand a nontrivial part of source code. One of such examples would be explaining to assign a zero as a way to "clear" a buffer. Such comments are invaluable to…
Finding the same or similar code snippets in source code is one of fundamental activities in software maintenance. Text-based pattern matching tools such as grep is frequently used for such purpose, but making proper queries for the…
Coding is a fundamental skill required in the engineering discipline, and much work exists exploring better ways of teaching coding in the higher education context. In particular, Code Snippets (CSs) are approved to be an effective way of…
Software clones are beneficial to detect security gaps and software maintenance in one programming language or across multiple languages. The existing work on source clone detection performs well but in a single programming language.…
In recent years, code security has become increasingly important, especially with the rise of interconnected technologies. Detecting vulnerabilities early in the software development process has demonstrated numerous benefits. Consequently,…