Related papers: Alignment-free sequence comparison using absent wo…

Linear-Time Sequence Comparison Using Minimal Absent Words & Applications

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free…

Data Structures and Algorithms · Computer Science 2015-12-23 Maxime Crochemore, Gabriele Fici, Robert Mercaş, Solon P. Pissis

Sequence Alignment Algorithm for Statistical Similarity Assessment

This paper presents a new approach to statistical similarity assessment based on sequence alignment. The algorithm performs mutual matching of two random sequences by successively searching for common elements and by applying sequence…

Signal Processing · Electrical Eng. & Systems 2021-06-09 Jakub Nikonowicz, Łukasz Matuszewski, Paweł Kubczak

Linear-time Computation of Minimal Absent Words Using Suffix Array

An absent word of a word y of length n is a word that does not occur in y. It is a minimal absent word if all its proper factors occur in y. Minimal absent words have been computed in genomes of organisms from all domains of life; their…

Data Structures and Algorithms · Computer Science 2014-07-01 Carl Barton, Alice Heliou, Laurent Mouchard, Solon P. Pissis

Alignment-Free Sequence Analysis and Applications

Genome and metagenome comparisons based on large amounts of next-generation sequencing (NGS) data pose significant challenges for alignment-based approaches due to the huge data size and the relatively short length of the reads.…

Quantitative Methods · Quantitative Biology 2018-03-28 Jie Ren, Xin Bai, Yang Young Lu, Kujin Tang, Ying Wang, Gesine Reinert, Fengzhu Sun

A new distance based on minimal absent words and applications to biological sequences

A minimal absent word of a sequence x is a sequence y that is not a factor of x, but all of its proper factors are factors of x as well. The set of minimal absent words uniquely defines the sequence itself. In recent times minimal absent…

Formal Languages and Automata Theory · Computer Science 2021-06-01 Giuseppa Castiglione, Jia Gao, Sabrina Mantaci, Antonio Restivo

Consensus Sequence Segmentation

In this paper we introduce a method to detect words or phrases in a given sequence of alphabets without knowing the lexicon. Our linear time unsupervised algorithm relies entirely on statistical relationships among alphabets in the input…

Computation and Language · Computer Science 2013-12-31 Tamal Chowdhury, Rabindra Rakshit, Arko Banerjee

Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…

Data Structures and Algorithms · Computer Science 2017-12-13 Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

Tight Bounds for the Number of Absent Subsequences

A subsequence of a word $w$ is a word $u$ that can be obtained by deleting some letters from $w$ while maintaining the relative order of the remaining letters, e.g., $\mathtt{lala}$ is a subsequence of $\mathtt{alfalfa}$. A word, over…

Formal Languages and Automata Theory · Computer Science 2025-09-01 Duncan Adamson, Pamela Fleischmann, Annika Huch, Florin Manea, Paul Sarnighausen-Cahn, Max Wiedenhöft

Efficiently Finding All Minimal and Shortest Absent Subsequences in a String

Given a string $w$, another string $v$ is said to be a subsequence of $w$ if $v$ can be obtained from $w$ by removing some of its letters; on the other hand, $v$ is called an absent subsequence of $w$ if $v$ is not a subsequence of $w$. The…

Data Structures and Algorithms · Computer Science 2025-05-01 Florin Manea, Tina Ringleb, Stefan Siemer, Maximilian Winkler

Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed…

Genomics · Quantitative Biology 2015-01-21 Ivan Borozan, Stuart Watt, Vincent Ferretti

An Algorithm for Alignment-free Sequence Comparison using Logical Match

This paper proposes an algorithm for alignment-free sequence comparison using Logical Match. Here, we compute the score using fuzzy membership values which generate automatically from the number of matches and mismatches. We demonstrate the…

Computational Engineering, Finance, and Science · Computer Science 2014-07-09 Sanil Shanker KP, Elizabeth Sherly, Jim Austin

Absent Subsequences in Words

An absent factor of a string $w$ is a string $u$ which does not occur as a contiguous substring (a.k.a. factor) inside $w$. We extend this well-studied notion and define absent subsequences: a string $u$ is an absent subsequence of a string…

Formal Languages and Automata Theory · Computer Science 2026-04-08 Maria Kosche, Tore Koß, Florin Manea, Stefan Siemer

An Assessment of PC-mer's Performance in Alignment-Free Phylogenetic Tree Construction

Background: Sequence comparison is essential in bioinformatics, serving various purposes such as taxonomy, functional inference, and drug discovery. The traditional method of aligning sequences for comparison is time-consuming, especially…

Quantitative Methods · Quantitative Biology 2023-11-23 Saeedeh Akbari Rokn Abadi, Melika Honarmand, Ali Hajialinaghi, Somayyeh Koohi

Distance Measures for Sequences

Given a set of sequences, the distance between pairs of them helps us to find their similarity and derive structural relationship amongst them. For genomic sequences such measures make it possible to construct the evolution tree of…

Information Theory · Computer Science 2012-08-29 Sandeep Hosangadi

Finite Width Model Sequence Comparison

Sequence comparison is a widely used computational technique in modern molecular biology. In spite of the frequent use of sequence comparisons the important problem of assigning statistical significance to a given degree of similarity is…

Quantitative Methods · Quantitative Biology 2007-05-23 Ralf Bundschuh, Nicholas Chia

Efficiently Testing Simon's Congruence

Simon's congruence $\sim_k$ is defined as follows: two words are $\sim_k$-equivalent if they have the same set of subsequences of length at most $k$. We propose an algorithm which computes, given two words $s$ and $t$, the largest $k$ for…

Formal Languages and Automata Theory · Computer Science 2021-03-16 Pawel Gawrychowski, Maria Kosche, Tore Koss, Florin Manea, Stefan Siemer

An Alignment Algorithm for Sequences

This paper describes a new alignment algorithm for sequences that can be used for determination of deletions and substitutions. It provides several solutions out of which the best one can be chosen on the basis of minimization of gaps or…

Information Theory · Computer Science 2012-11-01 Sandeep Hosangadi, Subhash Kak

Alignment-free comparison of next-generation sequencing data using compression-based distance measures

Enormous volumes of short reads data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to…

Genomics · Quantitative Biology 2020-03-25 Ngoc Hieu Tran, Xin Chen

A Linear Time Quantum Algorithm for Pairwise Sequence Alignment

Sequence Alignment is the process of aligning biological sequences in order to identify similarities between multiple sequences. In this paper, a Quantum Algorithm for finding the optimal alignment between DNA sequences has been…

Data Structures and Algorithms · Computer Science 2025-09-05 Md. Rabiul Islam Khan, Shadman Shahriar, Shaikh Farhan Rafid

Unsupervised discovery of morphologically related words based on orthographic and semantic similarity

We present an algorithm that takes an unannotated corpus as its input, and returns a ranked list of probable morphologically related pairs as its output. The algorithm tries to discover morphologically related pairs by looking for pairs…

Computation and Language · Computer Science 2007-05-23 Marco Baroni, Johannes Matiasek, Harald Trost