English
Related papers

Related papers: Learning from String Sequences

200 papers

Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…

Data Structures and Algorithms · Computer Science 2017-12-13 Muhammad Farhan , Juvaria Tariq , Arif Zaman , Mudassir Shabbir , Imdad Ullah Khan

Kolmogorov complexity has inspired several alignment-free distance measures, based on the comparison of lengths of compressions, which have been applied successfully in many areas. One of these measures, the so-called Universal Similarity…

Quantitative Methods · Quantitative Biology 2011-11-10 Jairo Rocha , Francesc Rosselló , Joan Segura

In many applications, it is necessary to determine the string similarity. Edit distance[WF74] approach is a classic method to determine Field Similarity. A well known dynamic programming algorithm [GUS97] is used to calculate edit distance…

Data Structures and Algorithms · Computer Science 2007-05-23 Qi Xiao Yang , Sung Sam Yuan , Lu Chun , Li Zhao , Sun Peng

This paper presents a new similarity measure to be used for general tasks including supervised learning, which is represented by the K-nearest neighbor classifier (KNN). The proposed similarity measure is invariant to large differences in…

Machine Learning · Computer Science 2014-09-04 Ahmad Basheer Hassanat

This study aims to publish a novel similarity metric to increase the speed of comparison operations. Also the new metric is suitable for distance-based operations among strings. Most of the simple calculation methods, such as string length…

Data Structures and Algorithms · Computer Science 2014-01-28 Sadi Evren Seker , Oguz Altun , Uğur Ayan , Cihan Mert

This paper presents a hybrid system for intuitive item similarity search that combines a Large Language Model (LLM) with a custom K-Nearest Neighbors (KNN) algorithm. Unlike black-box dense vector systems, this architecture provides…

Information Retrieval · Computer Science 2025-09-29 Ana Rodrigues , João Mata , Rui Rego

$K$-NN classifier is one of the most famous classification algorithms, whose performance is crucially dependent on the distance metric. When we consider the distance metric as a parameter of $K$-NN, learning an appropriate distance metric…

Machine Learning · Computer Science 2019-11-26 Kun Song

Deep semi-supervised learning has been widely implemented in the real-world due to the rapid development of deep learning. Recently, attention has shifted to the approaches such as Mean-Teacher to penalize the inconsistency between two…

Machine Learning · Statistics 2020-04-30 Sanyou Wu , Xingdong Feng , Fan Zhou

Measuring similarities between strings is central for many established and fast growing research areas including information retrieval, biology, and natural language processing. The traditional approach for string similarity measurements is…

Information Retrieval · Computer Science 2018-08-20 Mehdi Ben Lazreg , Morten Goodwin

One of the ubiquitous representation of long DNA sequence is dividing it into shorter k-mer components. Unfortunately, the straightforward vector encoding of k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet,…

Quantitative Methods · Quantitative Biology 2017-01-24 Patrick Ng

We introduce $k$NN-LMs, which extend a pre-trained neural language model (LM) by linearly interpolating it with a $k$-nearest neighbors ($k$NN) model. The nearest neighbors are computed according to distance in the pre-trained LM embedding…

Computation and Language · Computer Science 2020-02-18 Urvashi Khandelwal , Omer Levy , Dan Jurafsky , Luke Zettlemoyer , Mike Lewis

Machine learning qualifies computers to assimilate with data, without being solely programmed [1, 2]. Machine learning can be classified as supervised and unsupervised learning. In supervised learning, computers learn an objective that…

The problem of approximate string matching is important in many different areas such as computational biology, text processing and pattern recognition. A great effort has been made to design efficient algorithms addressing several variants…

Data Structures and Algorithms · Computer Science 2008-07-29 Dimitris Papamichail , Georgios Papamichail

We show that both an LSTM and a unitary-evolution recurrent neural network (URN) can achieve encouraging accuracy on two types of syntactic patterns: context-free long distance agreement, and mildly context-sensitive cross serial…

Computation and Language · Computer Science 2022-08-12 Jean-Philippe Bernardy , Shalom Lappin

Semantic similarity measures (SSMs) refer to a set of algorithms used to quantify the similarity of two or more terms belonging to the same ontology. Ontology terms may be associated to concepts, for instance in computational biology gene…

Molecular Networks · Quantitative Biology 2013-05-22 Pietro Hiram Guzzi , Simone Truglia , Pierangelo Veltri , Mario Cannataro

We propose the Deep Distance Measurement Method (DDMM) to improve retrieval accuracy in unsupervised multivariate time series similarity retrieval. DDMM enables learning of minute differences within states in the entire time series and…

Machine Learning · Computer Science 2026-03-16 Susumu Naito , Kouta Nakata , Yasunori Taguchi

More and more neural network approaches have achieved considerable improvement upon submodules of speaker diarization system, including speaker change detection and segment-wise speaker embedding extraction. Still, in the clustering stage,…

Audio and Speech Processing · Electrical Eng. & Systems 2019-12-02 Qingjian Lin , Ruiqing Yin , Ming Li , Hervé Bredin , Claude Barras

Adaptations of features commonly applied in the field of visual computing, co-occurrence matrix (COM) and run-length matrix (RLM), are proposed for the similarity computation of strings in general (words, phrases, codes and texts). The…

Machine Learning · Computer Science 2026-05-15 E. O. Rodrigues , D. Casanova , M. Teixeira , V. Pegorini , F. Favarim , E. Clua , A. Conci , Panos Liatsis

The $k$-nearest neighbour ($k$-NN) classifier is one of the oldest and most important supervised learning algorithms for classifying datasets. Traditionally the Euclidean norm is used as the distance for the $k$-NN classifier. In this…

Machine Learning · Statistics 2015-12-02 Stan Hatko

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new ``normalized information distance'', based on the noncomputable notion of…

Computational Complexity · Computer Science 2011-11-09 Ming Li , Xin Chen , Xin Li , Bin Ma , Paul Vitanyi
‹ Prev 1 2 3 10 Next ›