Related papers: Gene ranking and biomarker discovery under correla…
Designed gene expression micro-array experiments, consisting of several treatment levels with a number of replicates per level, are analyzed by applying simple tests for group differences at the per gene level. The gene level statistics are…
We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative…
Background: Identification of causal SNPs in most genome-wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account.…
Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features for association with disease. We propose a new approach,…
Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data becomes more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and…
Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and…
A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene…
This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g., proteins and gene pathways) when only lower-level measurements are directly observed (e.g., peptides and individual genes).…
Background: Selecting feature genes to predict phenotypes is one of the typical tasks in analyzing genomics data. Though many general-purpose algorithms were developed for prediction, dealing with highly correlated genes in the prediction…
Cluster analysis of biological samples using gene expression measurements is a common task which aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have…
Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for…
Precision medicine is a paradigm shift in healthcare relying heavily on genomics data. However, the complexity of biological interactions, the large number of genes as well as the lack of comparisons on the analysis of data, remain a…
In microarray experiments, it is often of interest to identify genes which have a pre-specified gene expression profile with respect to time. Methods available in the literature are, however, typically not stringent enough in identifying…
Gene expression analysis aims at identifying the genes able to accurately predict biological parameters like, for example, disease subtyping or progression. While accurate prediction can be achieved by means of many different techniques,…
A common practice in microarray analysis is to transform the microarray raw data (light intensity) by a logarithmic transformation, and the justification for this transformation is to make the distribution more symmetric and Gaussian-like.…
Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes…
We propose a method for detecting differential gene expression that exploits the correlation between genes. Our proposal averages the univariate scores of each feature with the scores in correlation neighborhoods. In a number of real and…
Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which…
Machine learning classification tasks often benefit from predicting a set of possible labels with confidence scores to capture uncertainty. However, existing methods struggle with the high-dimensional nature of the data and the lack of…
Gene-based testing is a commonly employed strategy in many genetic association studies. Gene-trait associations can be complex due to underlying population heterogeneity, gene-environment interactions, and various other reasons. Existing…