
Related papers: Interactive Duplicate Search in Software Documenta…

200 papers

Contemporary software documentation is as complicated as the software itself. During its lifecycle, the documentation accumulates a lot of near duplicate fragments, i.e. chunks of text that were copied from a single source and were later…

Software Engineering · Computer Science 2018-10-10 D. V. Luciv, D. V. Koznov, G. A. Chernishev, A. N. Terekhov

We describe a system that helps identify manuscripts submitted to multiple journals at the same time. Also, we discuss potential applications of the near-duplicate detection technology when run with manuscript text content, including…

Commercial web search engines employ near-duplicate detection to ensure that users see each relevant result only once, albeit the underlying web crawls typically include (near-)duplicates of many web pages. We revisit the risks and…

Nowadays, digital content is widespread and simply redistributable, either lawfully or unlawfully. For example, after images are posted on the internet, other web users can modify them and then repost their versions, thereby generating…

Computer Vision and Pattern Recognition · Computer Science 2020-09-08 K. K. Thyagharajan, G. Kalaiarasi

Code clone detection is involved with detecting duplicated fragments of code within a code base. Detecting these clones is useful for maintenance operations which require editing the clones. The tools developed are expected to be robust…

Software Engineering · Computer Science 2016-05-10 Ogechi Onuoha

Data deduplication is the task of detecting records in a database that correspond to the same real-world entity. Our goal is to develop a procedure that samples uniformly from the set of entities present in the database in the presence of…

Machine Learning · Computer Science 2020-08-25 Alireza Heidari, Shrinu Kushagra, Ihab F. Ilyas

Detecting near duplicate images is fundamental to the content ecosystem of photo sharing web applications. However, such a task is challenging when involving a web-scale image corpus containing billions of images. In this paper, we present…

Computer Vision and Pattern Recognition · Computer Science 2022-09-20 Andrey Gusev, Jiajing Xu

Job descriptions are posted on many online channels, including company websites, job boards or social media platforms. These descriptions are usually published with varying text for the same job, due to the requirements of each platform or…

Computation and Language · Computer Science 2024-06-11 Matthias Engelbach, Dennis Klau, Maximilien Kintz, Alexander Ulrich

Due to the increasing volume, volatility, and diversity of data in virtually all areas of our lives, the ability to detect duplicates in potentially linked data sources is more important than ever before. However, while research is already…

Databases · Computer Science 2024-01-01 Fabian Panse, Wolfram Wingerath, Benjamin Wollmer

All methodologies for detecting plagiarism to date have focused on the final digital "outcome", such as a document or source code. Our novel approach takes the creation process into account using logged events collected by special software…

Other Computer Science · Computer Science 2017-07-21 Johannes Schneider, Avi Bernstein, Jan Vom Brocke, Kostadin Damevski, David C. Shepherd

More than ever, technical inventions are the symbol of our society's advance. Patents guarantee their creators protection against infringement. For an invention being patentable, its novelty and inventiveness have to be assessed. Therefore,…

Information Retrieval · Computer Science 2019-03-06 Lea Helmers, Franziska Horn, Franziska Biegler, Tim Oppermann, Klaus-Robert Müller

One of the important factors that make a search engine fast and accurate is a concise and duplicate free index. In order to remove duplicate and near-duplicate documents from the index, a search engine needs a swift and reliable duplicate…

Information Retrieval · Computer Science 2019-09-26 Hamid Mohammadi, Seyed Hossein Khasteh

The importance of an efficient and scalable document similarity detection system is undeniable nowadays. Search engines need batch text similarity measures to detect duplicated and near-duplicated web pages in their indexes in order to…

Information Retrieval · Computer Science 2018-10-09 Hamid Mohammadi, Amin Nikoukaran

We present a new method to detect duplicates used to merge different bibliographic record corpora with the help of lexical and social information. As we show, a trivial key is not available to delete useless documents. Merging heterogeneous…

Databases · Computer Science 2015-04-29 Nicolas Turenne

Machine Learning software documentation is different from most of the documentations that were studied in software engineering research. Often, the users of these documentations are not software experts. The increasing interest in using…

Software Engineering · Computer Science 2020-02-03 Yalda Hashemi, Maleknaz Nayebi, Giuliano Antoniol

In the task of automatic program synthesis, one obtains pairs of matching inputs and outputs and generates a computer program, in a particular domain-specific language (DSL), which given each sample input returns the matching output. A key…

Machine Learning · Computer Science 2023-03-14 Aran Carmon, Lior Wolf

Automatic text summarisation has drawn considerable interest in the area of software engineering. It is challenging to summarise the activities related to a software project, (1) because of the volume and heterogeneity of involved software…

Software Engineering · Computer Science 2020-04-30 Mahfouth Alghamdi, Christoph Treude, Markus Wagner

Online learning platforms provide diverse questions to gauge the learners' understanding of different concepts. The repository of questions has to be constantly updated to ensure a diverse pool of questions to conduct assessments for…

Computation and Language · Computer Science 2023-01-13 Maksimjeet Chowdhary, Sanyam Goyal, Venktesh V, Mukesh Mohania, Vikram Goyal

There is a general belief that software must be able to easily do things that humans find difficult. Since finding sources for plagiarism in a text is not an easy task, there is a wide-spread expectation that it must be simple for software…

About 40% of software bug reports are duplicates of one another, which poses a major overhead during software maintenance. Traditional techniques often focus on detecting duplicate bug reports that are textually similar. However, in bug…

Software Engineering · Computer Science 2022-12-21 Sigma Jahan, Mohammad Masudur Rahman