Related papers: Large-Scale Multiple Testing for Matrix-Valued Dat…
Large-scale multiple testing under static factor models is widely used to detect sparse signals in high-dimensional data. However, static factor models are arguably too stringent because they ignore serial correlation, which seriously…
Matrix-covariate is now frequently encountered in many biomedical researches. It is common to fit conventional statistical models by vectorizing matrix-covariate. This strategy, however, results in a large number of parameters, while the…
The matrix-variate normal distribution is a popular model for high-dimensional transposable data because it decomposes the dependence structure of the random matrix into the Kronecker product of two covariance matrices: one for each of the…
Recent advances in statistical theory, together with advances in the computational power of computers, provide alternative methods to do mass-univariate hypothesis testing in which a large number of univariate tests, can be properly used to…
Measuring the dependence of data plays a central role in statistics and machine learning. In this work, we summarize and generalize the main idea of existing information-theoretic dependence measures into a higher-level perspective by the…
Large-scale multiple testing is a fundamental problem in high dimensional statistical inference. It is increasingly common that various types of auxiliary information, reflecting the structural relationship among the hypotheses, are…
In large scale multiple testing problems, a two-class empirical Bayes approach can be used to control the false discovery rate (Fdr) for the entire array of hypotheses under study. A sample splitting step is incorporated to modify that…
Objective: The use of deep learning for electroencephalography (EEG) classification tasks has been rapidly growing in the last years, yet its application has been limited by the relatively small size of EEG datasets. Data augmentation,…
Statistical dependence between hypotheses poses a significant challenge to the stability of large scale multiple hypotheses testing. Ignoring it often results in an unacceptably large spread in the false positive proportion even though the…
Statistical analysis of multimodal imaging data is a challenging task, since the data involves high-dimensionality, strong spatial correlations and complex data structures. In this paper, we propose rigorous statistical testing procedures…
Deep learning is significantly advancing the analysis of electroencephalography (EEG) data by effectively discovering highly nonlinear patterns within the signals. Data partitioning and cross-validation are crucial for assessing model…
In many large scale multiple testing applications, the hypotheses often have a known graphical structure, such as gene ontology in gene expression data. Exploiting this graphical structure in multiple testing procedures can improve power as…
Electroencephalography (EEG) reflects the brain's functional state, making it a crucial tool for diverse detection applications like seizure detection and sleep stage classification. While deep learning-based approaches have recently shown…
False discovery rate (FDR) has been widely used as an error measure in large scale multiple testing problems, but most research in the area has been focused on procedures for controlling the FDR based on independent test statistics or the…
Identifying dependency in multivariate data is a common inference task that arises in numerous applications. However, existing nonparametric independence tests typically require computation that scales at least quadratically with the sample…
Machine learning (ML) and deep learning (DL) techniques have been widely applied to analyze electroencephalography (EEG) signals for disease diagnosis and brain-computer interfaces (BCI). The integration of multimodal data has been shown to…
This paper studies the high-dimensional mixed linear regression (MLR) where the output variable comes from one of the two linear regression models with an unknown mixing proportion and an unknown covariance structure of the random…
The most popular multiple testing procedures are stepwise procedures based on $P$-values for individual test statistics. Included among these are the false discovery rate (FDR) controlling procedures of Benjamini--Hochberg [J. Roy. Statist.…
Multiple hypothesis testing is a fundamental problem in high dimensional inference, with wide applications in many scientific fields. In genome-wide association studies, tens of thousands of tests are performed simultaneously to find if any…
Electroencephalography (EEG) is shown to be a valuable data source for evaluating subjects' mental states. However, the interpretation of multi-modal EEG signals is challenging, as they suffer from poor signal-to-noise-ratio, are highly…