Related papers: Newer method of string comparison: the Modified Mo…

Improving Scalability of Contrast Pattern Mining for Network Traffic Using Closed Patterns

Contrast pattern mining (CPM) aims to discover patterns whose support increases significantly from a background dataset to a target dataset. CPM is particularly useful for characterising changes in evolving systems, e.g., in…

Networking and Internet Architecture · Computer Science 2020-12-01 Elaheh AlipourChavary, Sarah M. Erfani, Christopher Leckie

Faster Algorithm of String Comparison

In many applications, it is necessary to determine string similarity. The edit distance [WF74] approach is a classic method to determine field similarity. A well-known dynamic programming algorithm [GUS97] is used to calculate edit distance…

Data Structures and Algorithms · Computer Science 2007-05-23 Qi Xiao Yang, Sung Sam Yuan, Lu Chun, Li Zhao, Sun Peng
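
The dynamic programming computation of edit distance mentioned in the abstract above can be illustrated with a minimal sketch. This is the textbook Levenshtein recurrence with unit costs, not the faster algorithm the paper proposes; the function name `edit_distance` is a hypothetical choice for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Textbook Levenshtein distance with unit costs (illustrative sketch)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]
```

Keeping only two rows reduces memory to O(len(b)) while the running time stays O(len(a) · len(b)).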

Contrast Pattern Mining: A Survey

Contrast pattern mining (CPM) is an important and popular subfield of data mining. Traditional sequential patterns cannot describe the contrast information between different classes of data, while contrast patterns involving the concept of…

Databases · Computer Science 2022-09-28 Yao Chen, Wensheng Gan, Yongdong Wu, Philip S. Yu

Class Probability Matching Using Kernel Methods for Label Shift Adaptation

In domain adaptation, covariate shift and label shift problems are two distinct and complementary tasks. In covariate shift adaptation where the differences in data distribution arise from variations in feature probabilities, existing…

Machine Learning · Statistics 2023-12-13 Hongwei Wen, Annika Betken, Hanyuan Hang

Algorithms to compute the Burrows-Wheeler Similarity Distribution

The Burrows-Wheeler transform (BWT) is a well studied text transformation widely used in data compression and text indexing. The BWT of two strings can also provide similarity measures between them, based on the observation that the more…

Data Structures and Algorithms · Computer Science 2020-09-10 Felipe A. Louza, Guilherme P. Telles, Simon Gog, Liang Zhao

A Novel Algorithm for String Matching with Mismatches

We present an online algorithm to deal with pattern matching in strings. The problem we investigate is commonly known as string matching with mismatches in which the objective is to report the number of characters that match when a pattern…

Data Structures and Algorithms · Computer Science 2016-03-11 Vinodprasad P
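
The problem statement in the abstract above (report how many characters match when the pattern is aligned at each position of the text) can be made concrete with a naive baseline. This sketch is not the online algorithm the paper presents; `match_counts` is a hypothetical name, and the quadratic scan is used only to show what output is expected.

```python
def match_counts(text: str, pattern: str) -> list[int]:
    """For each alignment of pattern in text, count matching characters.

    Naive O(n * m) baseline, shown only to define the expected output.
    """
    m = len(pattern)
    return [
        sum(t == p for t, p in zip(text[i:i + m], pattern))
        for i in range(len(text) - m + 1)
    ]
```

The number of mismatches at an alignment is the pattern length minus the corresponding count.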

In the realm of patent document analysis, assessing semantic similarity between phrases presents a significant challenge, notably amplifying the inherent complexities of Cooperative Patent Classification (CPC) research. Firstly, this study…

Computation and Language · Computer Science 2024-01-17 Liqiang Yu, Bo Liu, Qunwei Lin, Xinyu Zhao, Chang Che

A new approach for measuring semantic similarity of ontology concepts using dynamic programming

Today, with the emergence of semantic web technologies and the growing quantity of information, searching for information based on the semantic web has become a fertile area of research. For this reason, a large number of studies are…

Computer Vision and Pattern Recognition · Computer Science 2021-10-05 Noreddine Gherabi, Abdelhadi Daoui, Abderrahim Marzouk

String comparison by transposition networks

Computing string or sequence alignments is a classical method of comparing strings and has applications in many areas of computing, such as signal processing and bioinformatics. Semi-local string alignment is a recent generalisation of this…

Data Structures and Algorithms · Computer Science 2009-03-23 Peter Krusche, Alexander Tiskin

Text Categorization via Similarity Search: An Efficient and Effective Novel Algorithm

We present a supervised learning algorithm for text categorization which brought the team of authors 2nd place in the text categorization division of the 2012 Cybersecurity Data Mining Competition (CDMC'2012) and a 3rd prize…

Information Retrieval · Computer Science 2013-07-11 Hubert Haoyang Duan, Vladimir Pestov, Varun Singla

A Matching Technique in Example-Based Machine Translation

This paper addresses an important problem in Example-Based Machine Translation (EBMT), namely how to measure similarity between a sentence fragment and a set of stored examples. A new method is proposed that measures similarity according to…

cmp-lg · Computer Science 2008-02-03 Lambros Cranias, Harris Papageorgiou, Stelios Piperidis

Efficient Feature Matching by Progressive Candidate Search

We present a novel feature matching algorithm that systematically utilizes the geometric properties of features such as position, scale, and orientation, in addition to the conventional descriptor vectors. In challenging scenes with the…

Computer Vision and Pattern Recognition · Computer Science 2017-01-23 Sehyung Lee, Jongwoo Lim, Il Hong Suh

A path-finding algorithm for computing minimal-weight-matching centrosymmetry parameter

In 2020, Peter Larsen reported flaws in the methods for centrosymmetry parameter computation in the existing molecular dynamics and analysis packages. He proposed an intuitive and mathematically rigorous formulation for centrosymmetry…

Computational Physics · Physics 2026-05-22 Vasily V. Pisarev

$LCSk$++: Practical similarity metric for long strings

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With the ever-increasing size of strings occurring in practice, e.g. large genomes of plants…

Data Structures and Algorithms · Computer Science 2019-08-27 Filip Pavetić, Goran Žužić, Mile Šikić

Bridging Classical and Quantum String Matching: A Computational Reformulation of Bit-Parallelism

String matching is a fundamental problem in computer science, with critical applications in text retrieval, bioinformatics, and data analysis. Among the numerous solutions that have emerged for this problem in recent decades,…

Data Structures and Algorithms · Computer Science 2025-03-10 Simone Faro, Arianna Pavone, Caterina Viola

Contrastive prediction strategies for unsupervised segmentation and categorization of phonemes and words

We investigate the performance on phoneme categorization and phoneme and word segmentation of several self-supervised learning (SSL) methods based on Contrastive Predictive Coding (CPC). Our experiments show that with the existing…

Machine Learning · Computer Science 2024-09-13 Santiago Cuervo, Maciej Grabias, Jan Chorowski, Grzegorz Ciesielski, Adrian Łańcucki, Paweł Rychlikowski, Ricard Marxer

CCPM: A Chinese Classical Poetry Matching Dataset

Poetry is one of the most important art forms of human languages. Recently many studies have focused on incorporating some linguistic features of poetry, such as style and sentiment, into its understanding or generation system. However,…

Computation and Language · Computer Science 2021-06-04 Wenhao Li, Fanchao Qi, Maosong Sun, Xiaoyuan Yi, Jiarui Zhang

A Fast String Matching Algorithm Based on Lowlight Characters in the Pattern

We put forth a new string matching algorithm which matches the pattern from neither the left nor the right end, but instead from a special position. Compared with the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm, the new algorithm is…

Data Structures and Algorithms · Computer Science 2014-01-29 Zhengjun Cao, Lihua Liu

Computing alignment plots efficiently

Dot plots are a standard method for local comparison of biological sequences. In a dot plot, a substring to substring distance is computed for all pairs of fixed-size windows in the input strings. Commonly, the Hamming distance is used…

Data Structures and Algorithms · Computer Science 2009-09-11 Peter Krusche, Alexander Tiskin
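
The dot plot described in the abstract above (a distance computed for every pair of fixed-size windows, commonly the Hamming distance) can be sketched directly. This is a brute-force illustration, not the efficient method the paper develops; the names `hamming` and `dot_plot` and the window-size parameter `w` are assumptions made for the example.

```python
def hamming(u: str, v: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(u, v))


def dot_plot(a: str, b: str, w: int) -> list[list[int]]:
    """Hamming distance between every length-w window of a and of b.

    Entry [i][j] compares a[i:i+w] with b[j:j+w]. Brute force, for
    illustration only: roughly O(len(a) * len(b) * w) time.
    """
    rows = len(a) - w + 1
    cols = len(b) - w + 1
    return [
        [hamming(a[i:i + w], b[j:j + w]) for j in range(cols)]
        for i in range(rows)
    ]
```

Small entries mark window pairs that are locally similar, which is what a plotted dot would indicate.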

Proposal and study of statistical features for string similarity computation and classification

Adaptations of features commonly applied in the field of visual computing, co-occurrence matrix (COM) and run-length matrix (RLM), are proposed for the similarity computation of strings in general (words, phrases, codes and texts). The…

Machine Learning · Computer Science 2026-05-15 E. O. Rodrigues, D. Casanova, M. Teixeira, V. Pegorini, F. Favarim, E. Clua, A. Conci, Panos Liatsis