Related papers: Bi-cross-validation for factor analysis

Double Cross Validation for the Number of Factors in Approximate Factor Models

Determining the number of factors is essential to factor analysis. In this paper, we propose an efficient cross validation (CV) method to determine the number of factors in approximate factor models. The method applies CV twice, first…

Methodology · Statistics 2019-07-04 Xianli Zeng, Yingcun Xia, Linjun Zhang

Fast Cross-Validation via Sequential Testing

With the increasing size of today's data sets, finding the right parameter configuration in model selection via cross-validation can be an extremely time-consuming task. In this paper we propose an improved cross-validation procedure which…

Machine Learning · Computer Science 2016-02-05 Tammo Krueger, Danny Panknin, Mikio Braun

Multi-study Factor Analysis

We introduce a novel class of factor analysis methodologies for the joint analysis of multiple studies. The goal is to separately identify and estimate 1) common factors shared across multiple studies, and 2) study-specific factors. We…

Applications · Statistics 2018-06-27 Roberta De Vito, Ruggero Bellio, Lorenzo Trippa, Giovanni Parmigiani

Nested cross-validation when selecting classifiers is overzealous for most practical applications

When selecting a classification algorithm to be applied to a particular problem, one has to simultaneously select the best algorithm for that dataset and the best set of hyperparameters for the chosen model. The usual approach is to…

Machine Learning · Computer Science 2018-09-26 Jacques Wainer, Gavin Cawley

One-way or Two-way Factor Model for Matrix Sequences?

This paper investigates the issue of determining the dimensions of row and column factor spaces in matrix-valued data. Exploiting the eigen-gap in the spectrum of sample second moment matrices of the data, we propose a family of randomised…

Methodology · Statistics 2022-09-29 Yong He, Xin-bing Kong, Lorenzo Trapani, Long Yu

Factor analysis with finite data

Factor analysis aims to describe high dimensional random vectors by means of a small number of unknown common factors. In mathematical terms, it is required to decompose the covariance matrix $\Sigma$ of the random vector as the sum of a…

Optimization and Control · Mathematics 2017-08-02 Valentina Ciccone, Augusto Ferrante, Mattia Zorzi

Simulation-based validation of Bayes factor computation

We propose and evaluate two methods that validate the computation of Bayes factors: one based on an improved variant of simulation-based calibration checking (SBC) and one based on calibration metrics for binary predictions. We show that in…

Methodology · Statistics 2026-03-18 Martin Modrák, Sebastian Stroppel, Paul-Christian Bürkner

Cross-Validation with Confidence

Cross-validation is one of the most popular model selection methods in statistics and machine learning. Despite its wide applicability, traditional cross validation methods tend to select overfitting models, due to the ignorance of the…

Methodology · Statistics 2017-12-25 Jing Lei

Estimating the number of clusters using cross-validation

Many clustering methods, including k-means, require the user to specify the number of clusters as an input parameter. A variety of methods have been devised to choose the number of clusters automatically, but they often rely on strong…

Methodology · Statistics 2017-02-10 Wei Fu, Patrick O. Perry

Comparing Cluster-Based Cross-Validation Strategies for Machine Learning Model Evaluation

Cross-validation plays a fundamental role in Machine Learning, enabling robust evaluation of model performance and preventing overestimation on training and validation data. However, one of its drawbacks is the potential to create data…

Machine Learning · Computer Science 2025-08-28 Afonso Martini Spezia, Thomas Fontanari, Mariana Recamonde-Mendoza

Factor analysis in high dimensional biological data with dependent observations

Factor analysis is a critical component of high dimensional biological data analysis. However, modern biological data contain two key features that irrevocably corrupt existing methods. First, these data, which include longitudinal,…

Methodology · Statistics 2020-09-24 Chris McKennan

Phase transitions and sample complexity in Bayes-optimal matrix factorization

We analyse the matrix factorization problem. Given a noisy measurement of a product of two matrices, the problem is to estimate back the original matrices. It arises in many applications such as dictionary learning, blind matrix…

Numerical Analysis · Computer Science 2016-07-19 Yoshiyuki Kabashima, Florent Krzakala, Marc Mézard, Ayaka Sakata, Lenka Zdeborová

Factor Models with Real Data: a Robust Estimation of the Number of Factors

Factor models are a very efficient way to describe high dimensional vectors of data in terms of a small number of common relevant factors. This problem, which is of fundamental importance in many disciplines, is usually reformulated in…

Optimization and Control · Mathematics 2018-06-13 Valentina Ciccone, Augusto Ferrante, Mattia Zorzi

Efficient Estimation of Approximate Factor Models via Regularized Maximum Likelihood

We study the estimation of a high dimensional approximate factor model in the presence of both cross sectional dependence and heteroskedasticity. The classical method of principal components analysis (PCA) does not efficiently estimate the…

Methodology · Statistics 2012-10-01 Jushan Bai, Yuan Liao

Statistically Guided Divide-and-Conquer for Sparse Factorization of Large Matrix

The sparse factorization of a large matrix is fundamental in modern statistical learning. In particular, the sparse singular value decomposition and its variants have been utilized in multivariate regression, factor analysis, biclustering,…

Machine Learning · Statistics 2020-03-19 Kun Chen, Ruipeng Dong, Wanwan Xu, Zemin Zheng

Sparse group factor analysis for biclustering of multiple data sources

Motivation: Modelling methods that find structure in data are necessary with the current large volumes of genomic data, and there have been various efforts to find subsets of genes exhibiting consistent patterns over subsets of treatments.…

Machine Learning · Computer Science 2016-09-15 Kerstin Bunte, Eemeli Leppäaho, Inka Saarinen, Samuel Kaski

Cross-validation

This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given…

Statistics Theory · Mathematics 2017-03-10 Sylvain Arlot

Permutation methods for factor analysis and PCA

Researchers often have datasets measuring features $x_{ij}$ of samples, such as test scores of students. In factor analysis and PCA, these features are thought to be influenced by unobserved factors, such as skills. Can we determine how…

Statistics Theory · Mathematics 2019-09-16 Edgar Dobriban

A Practitioner's Guide to Multiple Testing Error Rates

It is quite common in modern research for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework does not scale with the number of hypotheses in the sense that naively performing many hypothesis…

Methodology · Statistics 2013-06-26 Jonathan Rosenblatt

Cross-Validation, Risk Estimation, and Model Selection

Cross-validation is a popular non-parametric method for evaluating the accuracy of a predictive rule. The usefulness of cross-validation depends on the task we want to employ it for. In this note, I discuss a simple non-parametric setting,…

Methodology · Statistics 2019-09-27 Stefan Wager