Related papers: Alignment-free sequence comparison using absent wo…
Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free…
This paper presents a new approach to statistical similarity assessment based on sequence alignment. The algorithm performs mutual matching of two random sequences by successively searching for common elements and by applying sequence…
An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their…
Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads.…
A minimal absent word of a sequence x, is a sequence yt hat is not a factorof x, but all of its proper factors are factors of x as well. The set of minimal absent words uniquely defines the sequence itself. In recent times minimal absent…
In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…
Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…
A {\em subsequence} of a word $w$ is a word $u$ that can be obtained by deleting some letters from $w$ while maintaining the relative order of the remaining letters, e.g., $\mathtt{lala}$ is a subsequence of $\mathtt{alfalfa}$. A word, over…
Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The…
Alignment-based sequence similarity searches, while accurate for some type of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed…
This paper proposes an algorithm for alignment-free sequence comparison using Logical Match. Here, we compute the score using fuzzy membership values which generate automatically from the number of matches and mismatches. We demonstrate the…
An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string…
Background: Sequence comparison is essential in bioinformatics, serving various purposes such as taxonomy, functional inference, and drug discovery. The traditional method of aligning sequences for comparison is time-consuming, especially…
Given a set of sequences, the distance between pairs of them helps us to find their similarity and derive structural relationship amongst them. For genomic sequences such measures make it possible to construct the evolution tree of…
Sequence comparison is a widely used computational technique in modern molecular biology. In spite of the frequent use of sequence comparisons the important problem of assigning statistical significance to a given degree of similarity is…
Simon's congruence $\sim_k$ is defined as follows: two words are $\sim_k$-equivalent if they have the same set of subsequences of length at most $k$. We propose an algorithm which computes, given two words $s$ and $t$, the largest $k$ for…
This paper describes a new alignment algorithm for sequences that can be used for determination of deletions and substitutions. It provides several solutions out of which the best one can be chosen on the basis of minimization of gaps or…
Enormous volumes of short reads data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to…
Sequence Alignment is the process of aligning biological sequences in order to identify similarities between multiple sequences. In this paper, a Quantum Algorithm for finding the optimal alignment between DNA sequences has been…
We present an algorithm that takes an unannotated corpus as its input, and returns a ranked list of probable morphologically related pairs as its output. The algorithm tries to discover morphologically related pairs by looking for pairs…