Related papers: Of mice and men: Sparse statistical modeling in ca…
Various statistical methods important for genetic analysis are considered and developed. Namely, we concentrate on the multifactor dimensionality reduction, logic regression, random forests and stochastic gradient boosting. These methods…
Given genetic variations and various phenotypical traits, such as Magnetic Resonance Imaging (MRI) features, we consider two important and related tasks in biomedical research: i) to select genetic and phenotypical markers for disease…
Sparse latent multi-factor models have been used in many exploratory and predictive problems with high-dimensional multivariate observations. Because of concerns with identifiability, the latent factors are almost always assumed to be…
High-dimensional time series datasets are becoming increasingly common in many areas of biological and social sciences. Some important applications include gene regulatory network reconstruction using time course gene expression data, brain…
Testing for the significance of a subset of regression coefficients in a linear model, a staple of statistical analysis, goes back at least to the work of Fisher who introduced the analysis of variance (ANOVA). We study this problem under…
We consider statistical inference in high-dimensional regression problems under affine constraints on the parameter space. The theoretical study of this is motivated by the study of genetic determinants of diseases, such as diabetes, using…
Simultaneous analysis of gene expression data and genetic variants is of great interest, especially when the numbers of gene expressions and genetic variants are both greater than the sample size. Association of both causal genes and…
Factorial designs are frequently used in different fields of science, e.g. psychological, medical or biometric studies. Standard approaches, such as the ANOVA $F$-test, make different assumptions on the distribution of the error terms, the…
High resolution microarrays and second-generation sequencing platforms are powerful tools to investigate genome-wide alterations in DNA copy number, methylation and gene expression associated with a disease. An integrated genomic profiling…
The widespread availability of high-dimensional biological data has made the simultaneous screening of many biological characteristics a central problem in computational biology and allied sciences. While the dimensionality of such datasets…
Heterogeneity is a hallmark of many complex diseases. There are multiple ways of defining heterogeneity, among which the heterogeneity in genetic regulations, for example GEs (gene expressions) by CNVs (copy number variations) and…
Combined inference for heterogeneous high-dimensional data is critical in modern biology, where clinical and various kinds of molecular data may be available from a single study. Classical genetic association studies regress a single…
While multiple testing procedures have been the focus of much statistical research, an important facet of the problem is how to deal with possible confounding. Procedures have been developed by authors in genetics and statistics. In this…
We combine two important ideas in the analysis of large-scale genomics experiments (e.g. experiments that aim to identify genes that are differentially expressed between two conditions). The first is use of Empirical Bayes (EB) methods to…
Integrative analyses of different high dimensional data types are becoming increasingly popular. Similarly, incorporating prior functional relationships among variables in data analysis has been a topic of increasing interest as it helps…
Genome-wide association studies (GWAS) have proven to be highly useful in revealing the genetic basis of complex diseases. At present, most GWAS are studies of a particular single disease diagnosis against controls. However, in practice, an…
Assessing variability according to distinct factors in data is a fundamental technique of statistics. The method commonly referred to as analysis of variance (ANOVA) is, however, typically confined to the case where all levels of a factor…
Standard approaches to analysing data in genome-wide association studies (GWAS) ignore any potential functional relationships between genetic markers. In contrast, gene pathways analysis uses prior information on functional structure within…
Large-scale statistical analysis of data sets associated with genome sequences plays an important role in modern biology. A key component of such statistical analyses is the computation of $p$-values and confidence bounds for statistics…
In recent years, tens of thousands of gene expression profiles for cells of several organisms have been monitored. Gene expression is a complex transcriptional process where mRNA molecules are translated into proteins, which control most of…