Related papers: Code Similarity on High Level Programs
Source code similarity are increasingly used in application development to identify clones, isolate bugs, and find copy-rights violations. Similar code fragments can be very problematic due to the fact that errors in the original code must…
Assessing similarity in source code has gained significant attention in recent years due to its importance in software engineering tasks such as clone detection and code search and recommendation. This work presents a comparative analysis…
Finding the same or similar code snippets in source code is one of fundamental activities in software maintenance. Text-based pattern matching tools such as grep is frequently used for such purpose, but making proper queries for the…
Duplicated code has a negative impact on the quality of software systems and should be detected at least. In this paper, we discuss an approach that improves source code retrieval using the structural information about the programs. We…
Comparing time series is essential in various tasks such as clustering and classification. While elastic distance measures that allow warping provide a robust quantitative comparison, a qualitative comparison on top of them is missing.…
Binary code similarity detection is to detect the similarity of code at binary (assembly) level without source code. Existing works have their limitations when dealing with mutated binary code generated by different compiling options. In…
In recent years, defect prediction has received a great deal of attention in the empirical software engineering world. Predicting software defects before the maintenance phase is very important not only to decrease the maintenance costs but…
Research on time-series similarity measures has emphasized the need for elastic methods which align the indices of pairs of time series and a plethora of non-parametric have been proposed for the task. On the other hand, deep learning…
A time series consists of a series of values or events obtained over repeated measurements in time. Analysis of time series represents and important tool in many application areas, such as stock market analysis, process and quality control,…
We consider the problem of program clone search, i.e. given a target program and a repository of known programs (all in executable format), the goal is to find the program in the repository most similar to the target program - with…
Data coding as a building block of several image processing algorithms has been received great attention recently. Indeed, the importance of the locality assumption in coding approaches is studied in numerous works and several methods are…
A highly comparative, feature-based approach to time series classification is introduced that uses an extensive database of algorithms to extract thousands of interpretable features from time series. These features are derived from across…
To solve time inefficiency issue, only potential pairs are compared in string-matching-based source code plagiarism detection; wherein potentiality is defined through a fast-yet-order-insensitive similarity measurement (adapted from…
Translating source code from one programming language to another is a critical, time-consuming task in modernizing legacy applications and codebases. Recent work in this space has drawn inspiration from the software naturalness hypothesis…
We address the problem of constructing a fast lossless code in the case when the source alphabet is large. The main idea of the new scheme may be described as follows. We group letters with small probabilities in subsets (acting as super…
The capability of accurately determining code similarity is crucial in many tasks related to software development. For example, it might be essential to identify code duplicates for performing software maintenance. This research introduces…
Source code clone detection is the task of finding code fragments that have the same or similar functionality, but may differ in syntax or structure. This task is important for software maintenance, reuse, and quality assurance (Roy et al.…
Many software analysis methods have come to rely on machine learning approaches. Code segmentation - the process of decomposing source code into meaningful blocks - can augment these methods by featurizing code, reducing noise, and limiting…
Code clone detection is involved with detecting duplicated fragments of code within a code base. Detecting these clones is useful for maintenance operations which require editing the clones. The tools developed are expected to be robust…
This paper focuses on pattern matching in the DNA sequence. It was inspired by a previously reported method that proposes encoding both pattern and sequence using prime numbers. Although fast, the method is limited to rather small pattern…