Related papers: Direct covariance matrix estimation with compositi…
High-dimensional compositional data arise naturally in many applications such as metagenomic data analysis. The observed data lie in a high-dimensional simplex, and conventional statistical methods often fail to produce sensible results due…
Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, among others, consistently reveal strong…
Microbial community analysis is drawing growing attention owing to the rapid development of high-throughput sequencing techniques. The observed data have the following typical characteristics: they are high-dimensional, compositional…
Compositional data sets are ubiquitous in science, including geology, ecology, and microbiology. In microbiome research, compositional data primarily arise from high-throughput sequence-based profiling experiments. These data comprise…
One important problem in microbiome analysis is to identify the bacterial taxa that are associated with a response, where the microbiome data are summarized as the composition of the bacterial taxa at different taxonomic levels. This paper…
Relying on recent advances in statistical estimation of covariance distances based on random matrix theory, this article proposes an improved covariance and precision matrix estimation for a wide family of metrics. The method is shown to…
Repeated measurements are common in many fields, where random variables are observed repeatedly across different subjects. Such data have an underlying hierarchical structure, and it is of interest to learn covariance/correlation at…
We consider the problem of joint estimation of structured covariance matrices. Assuming the structure is unknown, estimation is achieved using heterogeneous training sets. Namely, given groups of measurements coming from centered…
The dependency structure of multivariate data can be analyzed using the covariance matrix $\Sigma$. In many fields the precision matrix $\Sigma^{-1}$ is even more informative. As the sample covariance estimator is singular in…
Network estimation and variable selection have been extensively studied in the statistical literature, but only recently have those two challenges been addressed simultaneously. In this paper, we seek to develop a novel method to…
We consider high-dimensional measurement errors with high-frequency data. Our objective is to recover the high-dimensional cross-sectional covariance matrix of the random errors with optimality. In this problem, not all components of the…
We consider the problem of estimating a high-dimensional covariance matrix from a small number of observations when covariates on pairs of variables are available and the variables can have spatial structure. This is motivated by the…
Covariance matrix estimation is a fundamental statistical task in many applications, but the sample covariance matrix is sub-optimal when the sample size is comparable to or less than the number of features. Such high-dimensional settings…
Many scientific datasets are compositional in nature. Important biological examples include species abundances in ecology, cell-type compositions derived from single-cell sequencing data, and amplicon abundance data in microbiome research.…
We consider the problem of predicting several response variables using the same set of explanatory variables. This setting naturally induces a group structure over the coefficient matrix, in which every explanatory variable corresponds to a…
The major sources of abundant data are constantly expanding with the available data collection methodologies in various applications: medical, insurance, scientific, bioinformatics, and business. These data sets may be distributed…
Motivated by regression analysis for microbiome compositional data, this paper considers generalized linear regression analysis with compositional covariates, where a group of linear constraints on regression coefficients are imposed to…
Biological sequencing data consist of read counts, e.g. of specified taxa, and often exhibit sparsity (zero-count inflation) and overdispersion (extra-Poisson variability). As most sequencing techniques provide an arbitrary total count,…
By creating networks of biochemical pathways, communities of micro-organisms are able to modulate the properties of their environment and even the metabolic processes within their hosts. Next-generation high-throughput sequencing has led to…
Estimating covariance matrices with high-dimensional complex data presents significant challenges, particularly concerning positive definiteness, sparsity, and numerical stability. Existing robust sparse estimators often fail to guarantee…
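
Two points recur in the abstracts above: compositional observations lie on a simplex, and the sample covariance matrix is singular when there are fewer samples than features. The sketch below illustrates both on synthetic data. It is not the method of any paper listed here. The centered log-ratio (CLR) transform is used only as one common way to map compositions off the simplex, and the function name `clr`, the sizes `n` and `p`, and the gamma-distributed abundances are all assumptions chosen for illustration.

```python
import numpy as np


def clr(x):
    """Centered log-ratio transform of strictly positive compositions (one per row)."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)


rng = np.random.default_rng(0)
n, p = 10, 30  # illustrative sizes: fewer samples than features

# Synthetic, strictly positive abundances (an assumption, not real microbiome data).
abundances = rng.gamma(shape=2.0, scale=1.0, size=(n, p))

# Closing each row to sum 1 places every observation on the simplex.
comp = abundances / abundances.sum(axis=1, keepdims=True)

# CLR-transformed rows sum to zero, which removes one more degree of freedom.
z = clr(comp)

# Sample covariance of the p features: a p x p matrix with rank at most n - 1.
sample_cov = np.cov(z, rowvar=False)
rank = np.linalg.matrix_rank(sample_cov)

print("covariance shape:", sample_cov.shape)
print("rank:", rank, "out of", p)
```

Because the rank cannot exceed n - 1, the sample covariance has no inverse in this regime, which is why a precision matrix estimate needs some form of regularization or structural assumption.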