Related papers: Gene ranking and biomarker discovery under correla…

The Shrinkage Variance Hotelling $T^2$ Test for Genomic Profiling Studies

Designed gene expression micro-array experiments, consisting of several treatment levels with a number of replicates per level, are analyzed by applying simple tests for group differences at the per gene level. The gene level statistics are…

Methodology · Statistics 2017-12-11 Grant Izmirlian

Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative…

Applications · Statistics 2010-10-11 Miika Ahdesmäki , Korbinian Strimmer

A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies

Background: Identification of causal SNPs in most genome-wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account.…

Applications · Statistics 2012-11-02 Verena Zuber , A. Pedro Duarte Silva , Korbinian Strimmer

Gene set bagging for estimating replicability of gene set analyses

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features for association with disease. We propose a new approach,…

Methodology · Statistics 2017-01-10 Andrew E. Jaffe , John D. Storey , Hongkai Ji , Jeffrey T. Leek

Split scores: a tool to quantify phylogenetic signal in genome-scale data

Detecting variation in the evolutionary process along chromosomes is increasingly important as whole-genome data becomes more widely available. For example, factors such as incomplete lineage sorting, horizontal gene transfer, and…

Populations and Evolution · Quantitative Biology 2017-01-03 Elizabeth S. Allman , Laura S. Kubatko , John A. Rhodes

Correlation-Adjusted Regression Survival Scores for High-Dimensional Variable Selection

Background: The development of classification methods for personalized medicine is highly dependent on the identification of predictive genetic markers. In survival analysis it is often necessary to discriminate between influential and…

Methodology · Statistics 2018-02-27 Thomas Welchowski , Verena Zuber , Matthias Schmid

Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis

A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene…

Applications · Statistics 2009-09-29 Michael A. Newton , Fernando A. Quintana , Johan A. den Boon , Srikumar Sengupta , Paul Ahlquist

Direct estimation and inference of higher-level correlations from lower-level measurements with applications in gene-pathway and proteomics studies

This paper tackles the challenge of estimating correlations between higher-level biological variables (e.g., proteins and gene pathways) when only lower-level measurements are directly observed (e.g., peptides and individual genes).…

Methodology · Statistics 2024-07-11 Yue Wang , Haoran Shi

Handling highly correlated genes in prediction analysis of genomic studies

Background: Selecting feature genes to predict phenotypes is one of the typical tasks in analyzing genomics data. Though many general-purpose algorithms were developed for prediction, dealing with highly correlated genes in the prediction…

Applications · Statistics 2022-04-11 Li Xing , Songwan Joun , Kurt Mackay , Mary Lesperance , Xuekui Zhang

Robust model-based clustering with gene ranking

Cluster analysis of biological samples using gene expression measurements is a common task which aids the discovery of heterogeneous biological sub-populations having distinct mRNA profiles. Several model-based clustering algorithms have…

Methodology · Statistics 2012-01-30 Alberto Cozzini , Ajay Jasra , Giovanni Montana

A Comparative Analysis of Gene Expression Profiling by Statistical and Machine Learning Approaches

Many machine learning models have been proposed to classify phenotypes from gene expression data. In addition to their good performance, these models can potentially provide some understanding of phenotypes by extracting explanations for…

Genomics · Quantitative Biology 2024-02-05 Myriam Bontonou , Anaïs Haget , Maria Boulougouri , Benjamin Audit , Pierre Borgnat , Jean-Michel Arbona

Cancer Gene Profiling through Unsupervised Discovery

Precision medicine is a paradigm shift in healthcare relying heavily on genomics data. However, the complexity of biological interactions, the large number of genes as well as the lack of comparisons on the analysis of data, remain a…

Genomics · Quantitative Biology 2021-02-16 Enzo Battistella , Maria Vakalopoulou , Roger Sun , Théo Estienne , Marvin Lerousseau , Sergey Nikolaev , Emilie Alvarez Andres , Alexandre Carré , Stéphane Niyoteka , Charlotte Robert , Nikos Paragios , Eric Deutsch

Gene profiling for determining pluripotent genes in a time course microarray experiment

In microarray experiments, it is often of interest to identify genes which have a pre-specified gene expression profile with respect to time. Methods available in the literature are, however, typically not stringent enough in identifying…

Applications · Statistics 2009-01-18 J. Tuke , G. F. V. Glonek , P. J. Solomon

A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data

Gene expression analysis aims at identifying the genes able to accurately predict biological parameters like, for example, disease subtyping or progression. While accurate prediction can be achieved by means of many different techniques,…

Methodology · Statistics 2008-09-11 Christine De Mol , Sofia Mosci , Magali Traskine , Alessandro Verri

Does Logarithm Transformation of Microarray Data Affect Ranking Order of Differentially Expressed Genes?

A common practice in microarray analysis is to transform the microarray raw data (light intensity) by a logarithmic transformation, and the justification for this transformation is to make the distribution more symmetric and Gaussian-like.…

Quantitative Methods · Quantitative Biology 2016-11-17 Wentian Li , Young Ju Suh , Jingshan Zhang
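The question in the title above has a concrete answer for at least some data. Here is a toy illustration using made-up intensities; these are not data or results from the paper. It ranks two hypothetical genes by the absolute two-sample t-statistic. The raw scale and the `log2` scale give different orders, because the logarithm turns a multiplicative change into an additive one and compresses variation at high intensities. The gene names and helper functions are illustrative assumptions.

```python
import numpy as np

def t_statistic(group1, group2):
    """Two-sample t-statistic (Welch-type SE; equals the pooled SE for equal group sizes)."""
    g1, g2 = np.asarray(group1, float), np.asarray(group2, float)
    se = np.sqrt(g1.var(ddof=1) / g1.size + g2.var(ddof=1) / g2.size)
    return (g2.mean() - g1.mean()) / se

# Hypothetical toy intensities: three replicates per condition for each gene
genes = {
    "gene_A": ([1.0, 2.0, 3.0], [10.0, 20.0, 30.0]),          # 10-fold (multiplicative) change, low intensity
    "gene_B": ([100.0, 101.0, 102.0], [104.0, 105.0, 106.0]),  # small additive change, high intensity
}

def rank_genes(transform):
    """Return gene names sorted by |t| after applying `transform` to the intensities."""
    stats = {name: abs(t_statistic(transform(np.array(a)), transform(np.array(b))))
             for name, (a, b) in genes.items()}
    return sorted(stats, key=stats.get, reverse=True)

raw_order = rank_genes(lambda x: x)
log_order = rank_genes(np.log2)
print(raw_order)  # ['gene_B', 'gene_A']
print(log_order)  # ['gene_A', 'gene_B']
```

On the raw scale gene_B has |t| of about 4.90 and gene_A about 3.10. After `log2`, gene_A rises to about 5.08 while gene_B stays near 4.90, so the order flips. Whether such flips matter in real experiments is exactly what the study above examines.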

Rank discriminants for predicting phenotypes from RNA expression

Statistical methods for analyzing large-scale biomolecular data are commonplace in computational biology. A notable example is phenotype prediction from gene expression data, for instance, detecting human cancers, differentiating subtypes…

Genomics · Quantitative Biology 2014-11-24 Bahman Afsari , Ulisses M. Braga-Neto , Donald Geman

Correlation-sharing for detection of differential gene expression

We propose a method for detecting differential gene expression that exploits the correlation between genes. Our proposal averages the univariate scores of each feature with the scores in correlation neighborhoods. In a number of real and…

Statistics Theory · Mathematics 2007-06-13 Robert Tibshirani , Larry Wasserman
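The averaging step described in the correlation-sharing abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure. It assumes a fixed, hypothetical correlation threshold `r` to define each gene's neighborhood, and the paper may construct neighborhoods differently. The function name `correlation_shared_scores` is also made up for the example.

```python
import numpy as np

def correlation_shared_scores(X, scores, r=0.5):
    """Average each gene's univariate score over its correlation neighborhood.

    X      : samples x genes expression matrix
    scores : one univariate score per gene (e.g. a t-statistic)
    r      : hypothetical fixed threshold; gene j is in gene i's neighborhood
             when corr(x_i, x_j) >= r
    """
    X = np.asarray(X, dtype=float)
    scores = np.asarray(scores, dtype=float)
    corr = np.corrcoef(X, rowvar=False)      # genes x genes correlation matrix
    neighbours = corr >= r                   # boolean neighborhood membership
    np.fill_diagonal(neighbours, True)       # every gene belongs to its own neighborhood
    weights = neighbours.astype(float)
    return (weights @ scores) / weights.sum(axis=1)

# Toy data: genes 0 and 1 are perfectly correlated; gene 2 is negatively correlated with both
X = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, 1.0],
              [3.0, 6.0, 4.0],
              [4.0, 8.0, 2.0]])
print(correlation_shared_scores(X, [4.0, 2.0, 1.0]))  # [3. 3. 1.]
```

Genes 0 and 1 share their scores (4 and 2 average to 3), while gene 2, with no correlated neighbors above the threshold, keeps its own score. Raising `r` shrinks the neighborhoods toward the original univariate scores.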

Fast Causal Discovery by Approximate Kernel-based Generalized Score Functions with Linear Computational Complexity

Score-based causal discovery methods can effectively identify causal relationships by evaluating candidate graphs and selecting the one with the highest score. One popular class of scores is kernel-based generalized score functions, which…

Machine Learning · Computer Science 2025-06-10 Yixin Ren , Haocheng Zhang , Yewei Xia , Hao Zhang , Jihong Guan , Shuigeng Zhou

Trustworthy Classification through Rank-Based Conformal Prediction Sets

Machine learning classification tasks often benefit from predicting a set of possible labels with confidence scores to capture uncertainty. However, existing methods struggle with the high-dimensional nature of the data and the lack of…

Machine Learning · Computer Science 2024-07-08 Rui Luo , Zhixin Zhou

Integrated Quantile RAnk Test (iQRAT) for gene-level associations

Gene-based testing is a commonly employed strategy in many genetic association studies. Gene-trait associations can be complex due to underlying population heterogeneity, gene-environment interactions, and various other reasons. Existing…

Methodology · Statistics 2020-12-15 Tianying Wang , Iuliana Ionita-Laza , Ying Wei