Related papers: Geometric Aspects of Biological Sequence Compariso…
Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in the framework of molecular biology. In the present paper, we introduce simple geometric criteria…
A quasi-metric is a distance function which satisfies the triangle inequality but is not symmetric: it can be thought of as an asymmetric metric. The central result of this thesis, developed in Chapter 3, is that a natural correspondence…
A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of…
Given a set of sequences, the distance between pairs of them helps us to find their similarity and derive structural relationships among them. For genomic sequences such measures make it possible to construct the evolution tree of…
Several measures exist for string similarity, including notable ones like the edit distance and the indel distance. The former measures the count of insertions, deletions, and substitutions required to transform one string into another,…
We propose a family of near-metrics based on local graph diffusion to capture similarity for a wide class of data sets. These quasi-metametrics, as their names suggest, dispense with one or two standard axioms of metric spaces, specifically…
This paper proposes a general framework for matching similar subsequences in both time series and string databases. The matching results are pairs of query subsequences and database subsequences. The framework finds all possible pairs of…
Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly.…
Distance measuring is a very important task in digital geometry and digital image processing. Due to our natural approach to geometry we think of the set of points that are equally far from a given point as a Euclidean circle. Using the…
The distance on a set is a comparative function. The smaller the distance between two elements of that set, the closer, or more similar, those elements are. Fréchet axiomatized the distance into what is today known as a metric. In this…
String matching algorithms play a vital role in computational biology. The functional and structural relationships of biological sequences are determined by the similarities between those sequences. For that, the researcher is supposed to be aware…
Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…
Many learning algorithms such as kernel machines, nearest neighbors, clustering, or anomaly detection, are based on the concept of 'distance' or 'similarity'. Before similarities are used for training an actual machine learning model, we…
The author has recently introduced abstract algebraic frameworks of analogical proportions and similarity within the general setting of universal algebra. The purpose of this paper is to build a bridge from similarity to analogical…
This paper proposes a new method for determining similarity and anomalies between time series, most practically effective in large collections of (likely related) time series, by measuring distances between structural breaks within such a…
We propose a novel semiparametric classifier based on Mahalanobis distances of an observation from the competing classes. Our tool is a generalized additive model with the logistic link function that uses these distances as features to…
Sequence comparison is a basic task to capture similarities and differences between two or more sequences of symbols, with countless applications such as in computational biology. An alignment is a way to compare sequences, where a giving…
In this article, we propose tree edit distance with variables, which is an extension of the tree edit distance to handle trees with variables and has a potential application to measuring the similarity between mathematical formulas,…
We define a distance analogous to the Gromov-Hausdorff distance that enables the comparison of arbitrary quasi-isometric spaces. We also investigate properties preserved under limits with respect to this distance, as well as properties of…
Modelling the substitution of nucleotides along a phylogenetic tree is usually done by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the trees and one might be able to obtain polynomial…
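
Several of the abstracts listed above refer to the edit distance (the count of insertions, deletions, and substitutions needed to transform one string into another) and to the indel distance. As a rough illustration only, the following is a minimal Python sketch of the textbook dynamic-programming computation. It is not taken from any of the listed papers; the function name `edit_distance` and the `allow_substitution` flag are hypothetical, and the indel variant assumes the usual definition in which only insertions and deletions are permitted.

```python
def edit_distance(a: str, b: str, allow_substitution: bool = True) -> int:
    """Minimum number of single-character operations turning a into b.

    With allow_substitution=True this is the classic edit (Levenshtein)
    distance; with False only insertions and deletions are counted,
    which is assumed here to match the indel distance.
    """
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            # Delete ca, or insert cb.
            best = min(prev[j] + 1, curr[j - 1] + 1)
            if ca == cb:
                # Matching characters cost nothing.
                best = min(best, prev[j - 1])
            elif allow_substitution:
                # Replace ca with cb.
                best = min(best, prev[j - 1] + 1)
            curr.append(best)
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_distance("kitten", "sitting", allow_substitution=False))
```

The sketch keeps only two rows of the table, so memory grows with the length of the second string rather than with the product of both lengths.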