Related papers: Multiple multi-sample testing under arbitrary cova…
Large-scale hypothesis testing has become a ubiquitous problem in high-dimensional statistical inference, with broad applications in various scienfitic disciplines. One relevant application is constituted by imaging mass spectrometry (IMS)…
High-dimensional tests are applied to find relevant sets of variables and relevant models. If variables are selected by analyzing the sums of products matrices and a corresponding mean-value test is performed, there is the danger that the…
In this paper, we consider the problem of simultaneous testing of multivariate normal means under arbitrary covariance dependence. Specifically, let $\boldsymbol{X}\sim N_n(\boldsymbol{\theta},\boldsymbol{\Sigma})$, where…
We propose a Bayesian latent variable model to estimate covariate-assisted dependence structures across multiple modalities of multivariate data that may be observed asynchronously. This setting commonly arises in longitudinal biomedical…
Independence screening methods such as the two sample $t$-test and the marginal correlation based ranking are among the most widely used techniques for variable selection in ultrahigh dimensional data sets. In this short note, simple…
This paper proposes a new mutual independence test for a large number of high dimensional random vectors. The test statistic is based on the characteristic function of the empirical spectral distribution of the sample covariance matrix. The…
Current statistical inference problems in areas like astronomy, genomics, and marketing routinely involve the simultaneous testing of thousands -- even millions -- of null hypotheses. For high-dimensional multivariate distributions, these…
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests is still one challenging and important problem in statistics. With recent advances in graphical models, it is feasible to…
Testing independence among a number of (ultra) high-dimensional random samples is a fundamental and challenging problem. By arranging $n$ identically distributed $p$-dimensional random vectors into a $p \times n$ data matrix, we investigate…
We propose a general, modular method for significance testing of groups (or clusters) of variables in a high-dimensional linear model. In presence of high correlations among the covariables, due to serious problems of identifiability, it is…
The paper considers variable selection in linear regression models where the number of covariates is possibly much larger than the number of observations. High dimensionality of the data brings in many complications, such as (possibly…
Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures…
Large-scale multiple testing under static factor models is widely used to detect sparse signals in high-dimensional data. However, static factor models are arguably too stringent because they ignore serial correlation, which seriously…
Feature screening is an important tool in analyzing ultrahigh-dimensional data, particularly in the field of Omics and oncology studies. However, most attention has been focused on identifying features that have a linear or monotonic impact…
Multi-label classification is a common challenge in various machine learning applications, where a single data instance can be associated with multiple classes simultaneously. The current paper proposes a novel tree-based method for…
In many applied sciences a popular analysis strategy for high-dimensional data is to fit many multivariate generalized linear models in parallel. This paper presents a novel approach to address the resulting multiple testing problem by…
With medical tests becoming increasingly available, concerns about over-testing and over-treatment dramatically increase. Hence, it is important to understand the influence of testing on treatment selection in general practice. Most…
Datasets may contain observations with multiple labels. If the labels are not mutually exclusive, and if the labels vary greatly in frequency, obtaining a sample that includes sufficient observations with scarcer labels to make inferences…
Multivariate regression model is a natural generalization of the classical univari- ate regression model for fitting multiple responses. In this paper, we propose a high- dimensional multivariate conditional regression model for…
Modeling the dependence between outputs is a fundamental challenge in multilabel classification. In this work we show that a generic regularized nonlinearity mapping independent predictions to joint predictions is sufficient to achieve…