Related papers: QPS -- quadratic programming sampler, a motif find…
Motif finding is an important step for the detection of rare events occurring in a set of DNA or protein sequences. Extraction of information about these rare events can lead to new biological discoveries. Motifs are some important patterns…
The interaction between proteins and DNA is a key driving force in a significant number of biological processes such as transcriptional regulation, repair, recombination, splicing, and DNA modification. The identification of DNA-binding…
Discovery of transcription factor binding sites is a much-explored and still active area of research in functional genomics. Many computational tools have been developed for finding motifs, and each of them has its own advantages as…
We propose two quantum algorithms for a problem in bioinformatics, position weight matrix (PWM) matching, which aims to find segments (sequence motifs) in a biological sequence such as DNA and protein that have high scores defined by the…
Computational approaches to transcription factor binding site identification have been actively researched for the past decade. Negative examples have long been utilized in de novo motif discovery and have been shown useful in transcription…
A common problem in bioinformatics is related to identifying gene regulatory regions marked by relatively high frequencies of motifs, or deoxyribonucleic acid sequences that often code for transcription and enhancer proteins. Predicting…
We describe a class of growth algorithms for finding low energy states of heteropolymers. These polymers form toy models for proteins, and the hope is that similar methods will ultimately be useful for finding native states of real proteins…
One of the fundamental tasks in understanding genomics is the problem of predicting Transcription Factor Binding Sites (TFBSs). With hundreds of Transcription Factors (TFs) as labels, genomic-sequence based TFBS prediction is a…
Sequence discovery tools play a central role in several fields of computational biology. In the framework of Transcription Factor binding studies, motif finding algorithms of increasingly high performance are required to process the big…
Biological sequences may contain patterns that signal important biomolecular functions; a classical example is regulation of gene expression by transcription factors that bind to specific patterns in genomic promoter regions. In motif…
We propose a novel sampling framework for inference in probabilistic models: an active learning approach that converges more quickly (in wall-clock time) than Markov chain Monte Carlo (MCMC) benchmarks. The central challenge in…
This thesis describes work on two applications of probabilistic programming: the learning of probabilistic program code given specifications, in particular program code of one-dimensional samplers; and the facilitation of sequential Monte…
We introduce a novel method to screen the promoters of a set of genes with shared biological function, against a precompiled library of motifs, and find those motifs which are statistically over-represented in the gene set. The gene sets…
Sampling the parameters of high-dimensional Continuous Time Markov Chains (CTMC) is a challenging problem with important applications in many fields of applied statistics. In this work a recently proposed type of non-reversible…
We present new algorithms for the problem of multiple string matching of gapped patterns, where a gapped pattern is a sequence of strings such that there is a gap of fixed length between each two consecutive strings. The problem has…
BACKGROUND: Transcriptional regulation is a key mechanism in the functioning of the cell, and is mostly effected through transcription factors binding to specific recognition motifs located upstream of the coding region of the regulated…
Understanding sub-cellular protein localisation is essential for analysing context-specific protein function. Recent advances in quantitative mass-spectrometry (MS) have led to high-resolution mapping of thousands of proteins to…
We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF, the ensemble starts with position weight matrices (PWMs) for the motif,…
The problem of detecting a binding site -- a substring of DNA where transcription factors attach -- on a long DNA sequence requires the recognition of a small pattern in a large background. For short binding sites, the matching probability…
Sampling from complicated probability distributions is a hard computational problem arising in many fields, including statistical physics, optimization, and machine learning. Quantum computers have recently been used to sample from…