Related papers: Identifying statistical dependence in genomic sequ…

Multivariate dependence and genetic networks inference

A critical task in systems biology is the identification of genes that interact to control cellular processes by transcriptional activation of a set of target genes. Many methods have been developed to use statistical correlations in…

Quantitative Methods · Quantitative Biology 2010-11-24 Adam A. Margolin, Kai Wang, Andrea Califano, Ilya Nemenman

Sequence-structure relations of biopolymers

Motivation: DNA data is transcribed into single-stranded RNA, which folds into specific molecular structures. In this paper we pose the question to what extent sequence- and structure-information correlate. We view this correlation as…

Combinatorics · Mathematics 2016-08-23 Christopher Barrett, Fenix W. Huang, Christian M. Reidys

Analysing multiple types of molecular profiles simultaneously: connecting the needles in the haystack

It has been shown that a random-effects framework can be used to test the association between a gene's expression level and the number of DNA copies of a set of genes. This gene-set modelling framework was later applied to find associations…

Methodology · Statistics 2015-10-09 Renée Menezes, Leila Mohammadi, Jelle Goeman, Judith Boer

Small Coupling Expansion for Multiple Sequence Alignment

The alignment of biological sequences such as DNA, RNA, and proteins is one of the basic tools that allow the detection of evolutionary patterns, as well as functional/structural characterizations between homologous sequences in different…

Quantitative Methods · Quantitative Biology 2023-05-01 Louise Budzynski, Andrea Pagnani

Vine dependence graphs with latent variables as summaries for gene expression data

The advent of high-throughput sequencing technologies has led to vast comparative genome sequences. The construction of gene-gene interaction networks or dependence graphs on the genome scale is vital for understanding the regulation of…

Methodology · Statistics 2023-03-06 Xinyao Fan, Harry Joe, Yongjin Park

Detecting Dependencies in Sparse, Multivariate Databases Using Probabilistic Programming and Non-parametric Bayes

Datasets with hundreds of variables and many missing values are commonplace. In this setting, it is both statistically and computationally challenging to detect true predictive relationships between variables and also to suppress false…

Machine Learning · Statistics 2018-04-03 Feras Saad, Vikash Mansinghka

Approaches to biological species delimitation based on genetic and spatial dissimilarity

The delimitation of biological species, i.e., deciding which individuals belong to the same species and whether and how many different species are represented in a data set, is key to the conservation of biodiversity. Much existing work…

Populations and Evolution · Quantitative Biology 2025-12-15 Gabriele d'Angella, Christian Hennig

Assessing coupling dynamics from an ensemble of time series

Finding interdependency relations between (possibly multivariate) time series provides valuable knowledge about the processes that generate the signals. Information theory sets a natural framework for non-parametric measures of several…

Information Theory · Computer Science 2016-02-09 German Gomez-Herrero, Wei Wu, Kalle Rutanen, Miguel C. Soriano, Gordon Pipa, Raul Vicente

Subsampling Methods for genomic inference

Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of $p$-values and confidence bounds for statistics…

Applications · Statistics 2011-01-06 Peter J. Bickel, Nathan Boley, James B. Brown, Haiyan Huang, Nancy R. Zhang

A Pipeline for Integrated Theory and Data-Driven Modeling of Genomic and Clinical Data

High throughput genome sequencing technologies such as RNA-Seq and Microarray have the potential to transform clinical decision making and biomedical research by enabling high-throughput measurements of the genome at a granular level.…

Genomics · Quantitative Biology 2020-05-07 Vineet K Raghu, Xiaoyu Ge, Arun Balajee, Daniel J. Shirer, Isha Das, Panayiotis V. Benos, Panos K. Chrysanthis

Meta-Dependence in Conditional Independence Testing

Constraint-based causal discovery algorithms utilize many statistical tests for conditional independence to uncover networks of causal dependencies. These approaches to causal discovery rely on an assumed correspondence between the…

Machine Learning · Computer Science 2025-04-18 Bijan Mazaheri, Jiaqi Zhang, Caroline Uhler

Measuring Dependencies between Biological Signals with Self-supervision, and its Limitations

Measuring the statistical dependence between observed signals is a primary tool for scientific discovery. However, biological systems often exhibit complex non-linear interactions that currently cannot be captured without a priori knowledge…

Signal Processing · Electrical Eng. & Systems 2025-08-11 Evangelos Sariyanidi, John D. Herrington, Lisa Yankowitz, Pratik Chaudhari, Theodore D. Satterthwaite, Casey J. Zampella, Robert T. Schultz, Russell T. Shinohara, Birkan Tunc

Causal learning with sufficient statistics: an information bottleneck approach

The inference of causal relationships using observational data from partially observed multivariate systems with hidden variables is a fundamental question in many scientific domains. Methods extracting causal information from conditional…

Machine Learning · Statistics 2020-10-13 Daniel Chicharro, Michel Besserve, Stefano Panzeri

Modeling Genetic Networks from Clonal Analysis

In this report a systematic approach is used to determine the approximate genetic network and robust dependencies underlying differentiation. The data considered is in the form of a binary matrix and represents the expression of the nine…

Molecular Networks · Quantitative Biology 2007-05-23 Radhakrishnan Nagarajan, Jane E. Aubin, Charlotte A. Peterson

Statistical data mining for symbol associations in genomic databases

A methodology is proposed to automatically detect significant symbol associations in genomic databases. A new statistical test is proposed to assess the significance of a group of symbols when found in several genesets of a given database.…

Genomics · Quantitative Biology 2013-09-11 Bernard Ycart, Frédéric Pont, Jean-Jacques Fournié

Statistical analysis of Gene and Intergenic DNA Sequences

Much of the on-going statistical analysis of DNA sequences is focused on the estimation of characteristics of coding and non-coding regions that would possibly allow discrimination of these regions. In the current approach, we concentrate…

Genomics · Quantitative Biology 2009-11-10 D. Kugiumtzis, A. Provata

A study of dependency features of spike trains through copulas

Simultaneous recordings from many neurons hide important information and the connections characterizing the network remain generally undiscovered despite the progress of statistical and machine learning techniques. Discerning the presence…

Applications · Statistics 2019-03-21 Pietro Verzelli, Laura Sacerdote

Information Theory of Genomes

The relation of genome size to organism complexity is still described rather equivocally. Neither the number of genes (G-value), nor the total amount of DNA (C-value) correlates consistently with phenotype complexity. Using information theory…

Genomics · Quantitative Biology 2007-05-23 Dmitri V. Parkhomchuk

Measuring Statistical Dependencies via Maximum Norm and Characteristic Functions

In this paper, we focus on the problem of statistical dependence estimation using characteristic functions. We propose a statistical dependence measure, based on the maximum-norm of the difference between joint and product-marginal…

Machine Learning · Computer Science 2022-08-18 Povilas Daniušis, Shubham Juneja, Lukas Kuzma, Virginijus Marcinkevičius

Differential analysis in Transcriptomic: The strength of randomly picking 'reference' genes

Transcriptomic analyses are characterized by not being directly quantitative and only providing relative measurements of expression levels up to an unknown individual scaling factor. This difficulty is enhanced for differential expression…

Methodology · Statistics 2021-03-24 Dorota Desaulle, Céline Hoffmann, Bernard Hainque, Yves Rozenholc