Related papers: A new distance between DNA sequences

200 papers

When the process underlying DNA substitutions varies across evolutionary history, the standard Markov models underlying standard phylogenetic methods are mathematically inconsistent. The most prominent example is the general time reversible…

Populations and Evolution · Quantitative Biology 2014-12-05 Michael D. Woodhams, Jesús Fernández-Sánchez, Jeremy G. Sumner

Modelling the substitution of nucleotides along a phylogenetic tree is usually done by a hidden Markov process. This allows one to define a distribution of characters at the leaves of the trees and one might be able to obtain polynomial…

Populations and Evolution · Quantitative Biology 2020-10-12 Marta Casanellas, Jesús Fernández-Sánchez, Marina Garrote-López

A new class of distances appropriate for measuring similarity relations between sequences, say one type of similarity per distance, is studied. We propose a new "normalized information distance", based on the noncomputable notion of…

Computational Complexity · Computer Science 2011-11-09 Ming Li, Xin Chen, Xin Li, Bin Ma, Paul Vitanyi

The new wave of successful generative models in machine learning has increased the interest in deep learning driven de novo drug design. However, assessing the performance of such generative models is notoriously difficult. Metrics that are…

Machine Learning · Computer Science 2018-08-02 Kristina Preuer, Philipp Renz, Thomas Unterthiner, Sepp Hochreiter, Günter Klambauer

Under a Markovian evolutionary process, the expected number of substitutions per site (also called branch length) that have occurred when a sequence has evolved from another according to a transition matrix $P$ can be approximated by…

Populations and Evolution · Quantitative Biology 2011-12-16 Marta Casanellas, Anna Kedzierska

Surrogate models are a well established approach to reduce the number of expensive function evaluations in continuous optimization. In the context of genetic programming, surrogate modeling still poses a challenge, due to the complex…

Neural and Evolutionary Computing · Computer Science 2018-07-04 Martin Zaefferer, Jörg Stork, Oliver Flasch, Thomas Bartz-Beielstein

Distances between sequences based on their $k$-mer frequency counts can be used to reconstruct phylogenies without first computing a sequence alignment. Past work has shown that effective use of $k$-mer methods depends on 1) model-based…

Populations and Evolution · Quantitative Biology 2017-05-22 Chris Durden, Seth Sullivant

We consider models of nucleotidic substitution processes where the rate of substitution at a given site depends on the state of its neighbours. For a wide class of such nonreversible models, we show how to compute consistent, mathematically…

Probability · Mathematics 2010-03-25 Mikael Falconnet

Phylogenetic tree reconstruction is traditionally based on multiple sequence alignments (MSAs) and heavily depends on the validity of this information bottleneck. With increasing sequence divergence, the quality of MSAs decays quickly.…

Populations and Evolution · Quantitative Biology 2011-01-11 Roland F. Schwarz, William Fletcher, Frank Förster, Benjamin Merget, Matthias Wolf, Jörg Schultz, Florian Markowetz

A mathematical algorithm to describe DNA or RNA sequences of $N$ nucleotides by a string of $2N$ integers is presented in the framework of the so-called crystal basis model of the genetic code. The description allows one to define a not…

Other Quantitative Biology · Quantitative Biology 2017-03-06 A. Sciarrino

Phylogenetic networks can represent evolutionary events that cannot be described by phylogenetic trees. These networks are able to incorporate reticulate evolutionary events such as hybridization, introgression, and lateral gene transfer.…

Populations and Evolution · Quantitative Biology 2021-07-08 Elizabeth Gross, Leo van Iersel, Remie Janssen, Mark Jones, Colby Long, Yukihiro Murakami

Models like support vector machines or Gaussian process regression often require positive semi-definite kernels. These kernels may be based on distance functions. While definiteness is proven for common distances and kernels, a proof for a…

Machine Learning · Computer Science 2018-07-11 Martin Zaefferer, Thomas Bartz-Beielstein, Günter Rudolph

The edit distance under the DCJ model can be computed in linear time for genomes with equal content or with indels. But it becomes NP-hard in the presence of duplications, a problem largely unsolved, especially when indels are considered. In…

Data Structures and Algorithms · Computer Science 2017-05-29 Zhaoming Yin, Jijun Tang, Stephen W. Schaeffer, David A. Bader

The log-det distance between two aligned DNA sequences was introduced as a tool for statistically consistent inference of a gene tree under simple non-mixture models of sequence evolution. Here we prove that the log-det distance, coupled…

Populations and Evolution · Quantitative Biology 2018-06-14 Elizabeth S. Allman, Colby Long, John A. Rhodes

Given a set of sequences, the distances between pairs of them help us to assess their similarity and derive structural relationships among them. For genomic sequences such measures make it possible to construct the evolution tree of…

Information Theory · Computer Science 2012-08-29 Sandeep Hosangadi

This paper introduces the Gene Mover's Distance, a measure of similarity between a pair of cells based on their gene expression profiles obtained via single-cell RNA sequencing. The underlying idea of the proposed distance is to interpret…

Genomics · Quantitative Biology 2021-03-16 Riccardo Bellazzi, Andrea Codegoni, Stefano Gualandi, Giovanna Nicora, Eleonora Vercesi

Generative models are invaluable in many fields of science because of their ability to capture high-dimensional and complicated distributions, such as photo-realistic images, protein structures, and connectomes. How do we evaluate the…

Distances between probability distributions are a key component of many statistical machine learning tasks, from two-sample testing to generative modeling, among others. We introduce a novel distance between measures that compares them…

Machine Learning · Statistics 2025-07-09 Arturo Castellanos, Anna Korba, Pavlo Mozharovskyi, Hicham Janati

Tools that effectively analyze and compare sequences are of great importance in various areas of applied computational research, especially in the framework of molecular biology. In the present paper, we introduce simple geometric criteria…

Quantitative Methods · Quantitative Biology 2013-08-14 Boris Brimkov, Valentin E. Brimkov

Generative models have achieved remarkable success across a range of applications, yet their evaluation still lacks principled uncertainty quantification. In this paper, we develop a method for comparing how close different generative…

Machine Learning · Statistics 2025-10-24 Zijun Gao, Yan Sun, Han Su