Related papers: A Probabilistic Model For Sequence Analysis

Pairwise alignment of the DNA sequence using hypercomplex number representation

A new set of DNA base-nucleic acid codes and their hypercomplex number representation have been introduced for taking the probability of each nucleotide into full account. A new scoring system has been proposed to suit the hypercomplex…

Other Quantitative Biology · Quantitative Biology 2014-03-12 Jian-Jun Shu, Li Shan Ouw

Statistical distributions of sequencing by synthesis with probabilistic nucleotide incorporation

Sequencing by synthesis is used in many next-generation DNA sequencing technologies. Some of the technologies, especially those exploring the principle of single-molecule sequencing, allow incomplete nucleotide incorporation in each cycle.…

Genomics · Quantitative Biology 2024-05-28 Yong Kong

Identification of repeats in DNA sequences using nucleotide distribution uniformity

Repetitive elements are important in genomic structures, functions and regulations, yet effective methods in precisely identifying repetitive elements in DNA sequences are not fully accessible, and the relationship between repetitive…

Genomics · Quantitative Biology 2016-08-03 Changchuan Yin

DNA coding and Gödel numbering

Evolution consists of distinct stages: cosmological, biological, linguistic. Since biology verges on natural sciences and linguistics, we expect that it shares structures and features from both forms of knowledge. Indeed, in DNA we…

Other Quantitative Biology · Quantitative Biology 2019-10-01 Argyris Nicolaidis, Fotis Psomopoulos

Mutation model for nucleotide sequences based on crystal basis

A nucleotide sequence is identified, in the two (four) letter alphabet, by the labels of a vector state of an irreducible representation of U_q(sl(2)) (U_q(sl(2) + sl(2))), in the limit q -> 0. A master equation for the distribution…

Biomolecules · Quantitative Biology 2007-05-23 C. Minichini, A. Sciarrino

Statistical analysis of Gene and Intergenic DNA Sequences

Much of the on-going statistical analysis of DNA sequences is focused on the estimation of characteristics of coding and non-coding regions that would possibly allow discrimination of these regions. In the current approach, we concentrate…

Genomics · Quantitative Biology 2009-11-10 D. Kugiumtzis, A. Provata

On distribution of runs and patterns in four state trials

From a mathematical and statistical point of view, a segment of a DNA strand can be viewed as a sequence of four-state (A, C, G, T) trials. We consider distributions of runs and patterns related to run lengths of multi-state sequences,…

Probability · Mathematics 2024-06-27 Jungtaek Oh

Statistical linguistic study of DNA sequences

A new family of compound Poisson distribution functions from statistical linguistics is used to study the n-tuples and nucleotide composition features of DNA sequences. The relative frequency distribution of the 6-tuples and 7-tuples…

Statistical Mechanics · Physics 2007-05-23 K. L. Ng, S. P. Li

Length distribution of sequencing by synthesis: fixed flow cycle model

Sequencing by synthesis is the underlying technology for many next-generation DNA sequencing platforms. We developed a new model, the fixed flow cycle model, to derive the distributions of sequence length for a given number of flow cycles…

Genomics · Quantitative Biology 2024-05-28 Yong Kong

A Better Good-Turing Estimator for Sequence Probabilities

We consider the problem of estimating the probability of an observed string drawn i.i.d. from an unknown distribution. The key feature of our study is that the length of the observed string is assumed to be of the same order as the size of…

Information Theory · Computer Science 2007-07-13 Aaron B. Wagner, Pramod Viswanath, Sanjeev R. Kulkarni

Long range correlations in DNA sequences

The so-called long-range correlation properties of DNA sequences are studied using the variance analyses of the density distribution of a single nucleotide or a group of nucleotides in a model-independent way. This new method which was suggested…

Biological Physics · Physics 2007-05-23 A. K. Mohanty, A. V. S. S. Narayana Rao

A unifying framework for the modelling and analysis of STR DNA samples arising in forensic casework

This paper presents a new framework for analysing forensic DNA samples using probabilistic genotyping. Specifically it presents a mathematical framework for specifying and combining the steps in producing forensic casework electropherograms…

Applications · Statistics 2018-02-28 Robert George Cowell

A probabilistic analysis of shotgun sequencing for metagenomics

Genome sequencing is the basis for many modern biological and medicinal studies. With recent technological advances, metagenomics has become a problem of interest. This problem entails the analysis and reconstruction of multiple DNA…

Probability · Mathematics 2022-01-14 Marlee Herring

Sequence Modeling via Segmentations

Segmental structure is a common pattern in many types of sequences such as phrases in human languages. In this paper, we present a probabilistic model for sequences via their segmentations. The probability of a segmented sequence is…

Machine Learning · Statistics 2018-07-20 Chong Wang, Yining Wang, Po-Sen Huang, Abdelrahman Mohamed, Dengyong Zhou, Li Deng

Probabilistic Models of k-mer Frequencies (Extended Abstract)

In this article, we review existing probabilistic models for modeling abundance of fixed-length strings (k-mers) in DNA sequencing data. These models capture dependence of the abundance on various phenomena, such as the size and repeat…

Quantitative Methods · Quantitative Biology 2022-01-03 Askar Gafurov, Tomáš Vinař, Broňa Brejová

Computational aspects of DNA mixture analysis

Statistical analysis of DNA mixtures is known to pose computational challenges due to the enormous state space of possible DNA profiles. We propose a Bayesian network representation for genotypes, allowing computations to be performed…

Methodology · Statistics 2014-02-21 Therese Graversen, Steffen Lauritzen

Stochastics of DNA Quantification

A common approach to quantifying DNA involves repeated cycles of DNA amplification. This approach, employed by the polymerase chain reaction (PCR), produces outputs that are corrupted by amplification noise, making it challenging to…

Quantitative Methods · Quantitative Biology 2023-01-06 Abdoelnaser M Degoot, Wilfred Ndifon

Improving sequence-based genotype calls with linkage disequilibrium and pedigree information

Whole and targeted sequencing of human genomes is a promising, increasingly feasible tool for discovering genetic contributions to risk of complex diseases. A key step is calling an individual's genotype from the multiple aligned short read…

Applications · Statistics 2012-06-29 Baiyu Zhou, Alice S. Whittemore

On Counting Subsequences and Higher-Order Fibonacci Numbers

In array-based DNA synthesis, multiple strands of DNA are synthesized in parallel to reduce the time cost from the sum of their lengths to the length of their shortest common supersequence. To maximize the amount of information that can be…

Information Theory · Computer Science 2024-05-29 Hsin-Po Wang, Chi-Wei Chin

Optimizing the Decoding Probability and Coverage Ratio of Composite DNA

This paper studies two problems that are motivated by the novel recent approach of composite DNA that takes advantage of the DNA synthesis property which generates a huge number of copies for every synthesized strand. Under this paradigm,…

Information Theory · Computer Science 2025-05-15 Tomer Cohen, Eitan Yaakobi