Related papers: Bi-cross-validation for factor analysis
Determining the number of factors is essential to factor analysis. In this paper, we propose {an efficient cross validation (CV)} method to determine the number of factors in approximate factor models. The method applies CV twice, first…
With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which…
We introduce a novel class of factor analysis methodologies for the joint analysis of multiple studies. The goal is to separately identify and estimate 1) common factors shared across multiple studies, and 2) study-specific factors. We…
When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset \emph{and} the best set of hyperparameters for the chosen model. The usual approach is to…
This paper investigates the issue of determining the dimensions of row and column factor spaces in matrix-valued data. Exploiting the eigen-gap in the spectrum of sample second moment matrices of the data, we propose a family of randomised…
Factor analysis aims to describe high dimensional random vectors by means of a small number of unknown common factors. In mathematical terms, it is required to decompose the covariance matrix $\Sigma$ of the random vector as the sum of a…
We propose and evaluate two methods that validate the computation of Bayes factors: one based on an improved variant of simulation-based calibration checking (SBC) and one based on calibration metrics for binary predictions. We show that in…
Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…
Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…
Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…
Factor analysis is a critical component of high dimensional biological data analysis. However, modern biological data contain two key features that irrevocably corrupt existing methods. First, these data, which include longitudinal,…
We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix…
Factor models are a very efficient way to describe high dimensional vectors of data in terms of a small number of common relevant factors. This problem, which is of fundamental importance in many disciplines, is usually reformulated in…
We study the estimation of a high dimensional approximate factor model in the presence of both cross sectional dependence and heteroskedasticity. The classical method of principal components analysis (PCA) does not efficiently estimate the…
The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering,…
Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments.…
This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…
Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how…
It is quite common in modern research, for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework, does not scale with the number of hypotheses in the sense that naively performing many hypothesis…
Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…