Related papers: Supervised Learning for Multi-Block Incomplete Dat…
Partial Least Squares (PLS) is a widely used method for data integration, designed to extract latent components shared across paired high-dimensional datasets. Despite decades of practical success, a precise theoretical understanding of its…
Sparse Partial Least Squares (sPLS) is a common dimensionality reduction technique for data fusion, which projects data samples from two views by seeking linear combinations with a small number of variables with the maximum variance.…
Partial least squares, as a dimension reduction method, has become increasingly important for its ability to deal with problems with a large number of variables. Since noisy variables may weaken the performance of the model, the sparse…
Partial Least Squares (PLS) methods have been heavily exploited to analyse the association between two blocs of data. These powerful approaches can be applied to data sets where the number of variables is greater than the number of…
Partial Least Squares (PLS) learns shared structure from paired data via the top singular vectors of the empirical cross-covariance (PLS-SVD), but multimodal datasets often have missing entries in both views. We study PLS-SVD under…
High dimensional data reduction techniques are provided by using partial least squares within deep learning. Our framework provides a nonlinear extension of PLS together with a disciplined approach to feature selection and architecture…
Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection,…
The recent development of more sophisticated spectroscopic methods allows acqui- sition of high dimensional datasets from which valuable information may be extracted using multivariate statistical analyses, such as dimensionality reduction…
Relating a set of variables X to a response y is crucial in chemometrics. A quantitative prediction objective can be enriched by qualitative data interpretation, for instance by locating the most influential features. When high-dimensional…
Self-training is a simple yet effective method within semi-supervised learning. The idea is to iteratively enhance training data by adding pseudo-labeled data. Its generalization performance heavily depends on the selection of these…
A wide range of systems exhibit high dimensional incomplete data. Accurate estimation of the missing data is often desired, and is crucial for many downstream analyses. Many state-of-the-art recovery methods involve supervised learning…
Variable selection and dimension reduction are two commonly adopted approaches for high-dimensional data analysis, but have traditionally been treated separately. Here we propose an integrated approach, called sparse gradient learning…
Semi-Supervised Learning (SSL) has become a preferred paradigm in many deep learning tasks, which reduces the need for human labor. Previous studies primarily focus on effectively utilising the labelled and unlabeled data to improve…
We propose a theoretical framework to analyze semi-supervised classification under the low density separation assumption in a high-dimensional regime. In particular, we introduce QLDS, a linear classification model, where the low density…
High-dimensional data common in genomics, proteomics, and chemometrics often contains complicated correlation structures. Recently, partial least squares (PLS) and Sparse PLS methods have gained attention in these areas as dimension…
Semi-Supervised Learning (SSL) is a framework that utilizes both labeled and unlabeled data to enhance model performance. Conventional SSL methods operate under the assumption that labeled and unlabeled data share the same label space.…
The goal of semi-supervised learning is to improve supervised classifiers by using additional unlabeled training examples. In this work we study a simple self-learning approach to semi-supervised learning applied to the least squares…
Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…
Partial least squares (PLS) regression combines dimensionality reduction and prediction using a latent variable model. Since partial least squares regression (PLS-R) does not require matrix inversion or diagonalization, it can be applied to…
This paper introduces a novel prior called Diversified Block Sparse Prior to characterize the widespread block sparsity phenomenon in real-world data. By allowing diversification on intra-block variance and inter-block correlation matrices,…