Related papers: Biological sequence analysis
The Stochastic Process Model has many applications in the analysis of longitudinal biodemographic data. Such data contain various physiological variables (sometimes known as covariates). They can also potentially contain genetic information available…
In this paper we survey recent work on the use of statistical model checking techniques for biological applications. We begin with an overview of the basic modelling techniques for biochemical reactions and their corresponding stochastic…
Probabilistic graphical models (PGMs) have become a popular tool for computational analysis of biological data in a variety of domains. But what exactly are they, and how do they work? How can we use PGMs to discover patterns that are…
Identifying and characterizing mutational paths is an important issue in evolutionary biology and in bioengineering. Here we introduce a generic description of mutational paths in terms of the goodness of sequences and of the mutational…
Modeling biological sequences such as DNA, RNA, and proteins is crucial for understanding complex processes like gene regulation and protein synthesis. However, most current models either focus on a single type or treat multiple types of…
Sequence discovery tools play a central role in several fields of computational biology. In the framework of transcription factor binding studies, motif-finding algorithms of increasingly high performance are required to process the big…
The discovery of motifs underlying gene expression is a challenging task. Some of these motifs are known transcription factors, but sequence inspection often provides valuable clues and can even lead to the discovery of novel motifs with uncharacterized…
One hundred years after Smoluchowski introduced his approach to stochastic processes, these processes are now at the basis of mathematical and physical modeling in cellular biology: they are used, for example, to analyse and to extract features from large…
In this paper, we explore the class of the Hidden Semi-Markov Model (HSMM), a flexible extension of the popular Hidden Markov Model (HMM) that allows the underlying stochastic process to be a semi-Markov chain. HSMMs are typically used less…
Large language models (LLMs) have emerged as powerful tools for addressing challenges across diverse domains. Notably, recent studies have demonstrated that large language models significantly enhance the efficiency of biomolecular analysis…
Feature extraction is an unavoidable task, especially in the critical step of preprocessing biological sequences. This step involves, for example, transforming the biological sequences into vectors of motifs where each motif is a…
An important part of the analysis of bio-molecular networks is to detect different functional units. Different functions are reflected in different evolutionary dynamics, and hence in different statistical characteristics of network…
The theoretical analysis of performance has been an important tool in the engineering of algorithms in many application domains. Its goals are to predict the empirical performance of an algorithm and to be a yardstick that drives the design…
Regression methods dominate the practice of biostatistical analysis, but biostatistical training emphasises the details of regression models and methods ahead of the purposes for which such modelling might be useful. More broadly,…
Stochastic reaction networks are mathematical models with a wide range of applications in biochemistry, ecology, and epidemiology, and are often complex to analyze. Except for some special cases, it is generally difficult to predict how the…
Scientists often use observational time series data to study complex natural processes, but regression analyses often assume simplistic dynamics. Recent advances in deep learning have yielded startling improvements to the performance of…
Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular…
The rapid development of high-throughput sequencing technologies has led to an explosive increase in biological sequence data, making sequence clustering a fundamental task in large-scale bioinformatics analyses. Unlike traditional…
Clusters of genes that have evolved by repeated segmental duplication present difficult challenges throughout genomic analysis, from sequence assembly to functional analysis. Improved understanding of these clusters is of utmost importance,…
Auto-regulatory feedback loops are one of the most common network motifs. A wide variety of stochastic models have been constructed to understand how the fluctuations in protein numbers in these loops are influenced by the kinetic…