Related papers: A new distance between DNA sequences

A new hierarchy of phylogenetic models consistent with heterogeneous substitution rates

When the process underlying DNA substitutions varies across evolutionary history, the standard Markov models underlying standard phylogenetic methods are mathematically inconsistent. The most prominent example is the general time reversible…

Populations and Evolution · Quantitative Biology 2014-12-05 Michael D. Woodhams, Jesús Fernández-Sánchez, Jeremy G. Sumner

Distance to the stochastic part of phylogenetic varieties

Modelling the substitution of nucleotides along a phylogenetic tree is usually done by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the trees and one might be able to obtain polynomial…

Populations and Evolution · Quantitative Biology 2020-10-12 Marta Casanellas, Jesús Fernández-Sánchez, Marina Garrote-López

The similarity metric

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of…

Computational Complexity · Computer Science 2011-11-09 Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi

Fréchet ChemNet Distance: A metric for generative models for molecules in drug discovery

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, assessing the performance of such generative models is notoriously difficult. Metrics that are…

Machine Learning · Computer Science 2018-08-02 Kristina Preuer, Philipp Renz, Thomas Unterthiner, Sepp Hochreiter, Günter Klambauer

Generating Markov evolutionary matrices for a given branch length

Under a Markovian evolutionary process, the expected number of substitutions per site (also called branch length) that have occurred when a sequence has evolved from another according to a transition matrix $P$ can be approximated by…

Populations and Evolution · Quantitative Biology 2011-12-16 Marta Casanellas, Anna Kedzierska

Linear Combination of Distance Measures for Surrogate Models in Genetic Programming

Surrogate models are a well established approach to reduce the number of expensive function evaluations in continuous optimization. In the context of genetic programming, surrogate modeling still poses a challenge, due to the complex…

Neural and Evolutionary Computing · Computer Science 2018-07-04 Martin Zaefferer, Jörg Stork, Oliver Flasch, Thomas Bartz-Beielstein

Identifiability of phylogenetic parameters from k-mer data under the coalescent

Distances between sequences based on their $k$-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of k-mer methods depends on 1) model-based…

Populations and Evolution · Quantitative Biology 2017-05-22 Chris Durden, Seth Sullivant

Phylogenetic distances for neighbour dependent substitution processes

We consider models of nucleotidic substitution processes where the rate of substitution at a given site depends on the state of its neighbours. For a wide class of such nonreversible models, we show how to compute consistent, mathematically…

Probability · Mathematics 2010-03-25 Mikael Falconnet

Evolutionary distances in the twilight zone -- a rational kernel approach

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly.…

Populations and Evolution · Quantitative Biology 2011-01-11 Roland F. Schwarz, William Fletcher, Frank Förster, Benjamin Merget, Matthias Wolf, Jörg Schultz, Florian Markowetz

Metrics and Harmonic Analysis on DNA

A mathematical algorithm to describe DNA or RNA sequences of $N$ nucleotides by a string of $2N$ integer numbers is presented in the framework of the so-called crystal basis model of the genetic code. The description allows one to define a not…

Other Quantitative Biology · Quantitative Biology 2017-03-06 A. Sciarrino

Distinguishing level-1 phylogenetic networks on the basis of data generated by Markov processes

Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer.…

Populations and Evolution · Quantitative Biology 2021-07-08 Elizabeth Gross, Leo van Iersel, Remie Janssen, Mark Jones, Colby Long, Yukihiro Murakami

An Empirical Approach For Probing the Definiteness of Kernels

Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness is proven for common distances and kernels, a proof for a…

Machine Learning · Computer Science 2018-07-11 Martin Zaefferer, Thomas Bartz-Beielstein, Günter Rudolph

Exemplar or Matching: Modeling DCJ Problems with Unequal Content Genome Data

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with Indels. But it becomes NP-Hard in the presence of duplications, a problem largely unsolved especially when Indels are considered. In…

Data Structures and Algorithms · Computer Science 2017-05-29 Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer, David A. Bader

Species tree inference from genomic sequences using the log-det distance

The log-det distance between two aligned DNA sequences was introduced as a tool for statistically consistent inference of a gene tree under simple non-mixture models of sequence evolution. Here we prove that the log-det distance, coupled…

Populations and Evolution · Quantitative Biology 2018-06-14 Elizabeth S. Allman, Colby Long, John A. Rhodes

Distance Measures for Sequences

Given a set of sequences, the distance between pairs of them helps us to find their similarity and derive structural relationship amongst them. For genomic sequences such measures make it possible to construct the evolution tree of…

Information Theory · Computer Science 2012-08-29 Sandeep Hosangadi

The Gene Mover's Distance: Single-cell similarity via Optimal Transport

This paper introduces the Gene Mover's Distance, a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of the proposed distance is to interpret…

Genomics · Quantitative Biology 2021-03-16 Riccardo Bellazzi, Andrea Codegoni, Stefano Gualandi, Giovanna Nicora, Eleonora Vercesi

A Practical Guide to Sample-based Statistical Distances for Evaluating Generative Models in Science

Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the…

Machine Learning · Computer Science 2024-10-11 Sebastian Bischoff, Alana Darcher, Michael Deistler, Richard Gao, Franziska Gerken, Manuel Gloeckler, Lisa Haxel, Jaivardhan Kapoor, Janne K Lappalainen, Jakob H Macke, Guy Moss, Matthijs Pals, Felix Pei, Rachel Rapp, A Erdem Sağtekin, Cornelius Schröder, Auguste Schulz, Zinovia Stefanidi, Shoji Toyota, Linda Ulmer, Julius Vetter

Kernel Trace Distance: Quantum Statistical Metric between Measures through RKHS Density Operators

Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling, among others. We introduce a novel distance between measures that compares them…

Machine Learning · Statistics 2025-07-09 Arturo Castellanos, Anna Korba, Pavlo Mozharovskyi, Hicham Janati

Geometric approach to string analysis: deviation from linearity and its use for biosequence classification

Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in the framework of molecular biology. In the present paper, we introduce simple geometric criteria…

Quantitative Methods · Quantitative Biology 2013-08-14 Boris Brimkov, Valentin E. Brimkov

Statistical Inference for Generative Model Comparison

Generative models have achieved remarkable success across a range of applications, yet their evaluation still lacks principled uncertainty quantification. In this paper, we develop a method for comparing how close different generative…

Machine Learning · Statistics 2025-10-24 Zijun Gao, Yan Sun, Han Su