Related papers: Multiple imputation for continuous variables using…
Principal component analysis (PCA) is often used to analyze multivariate data together with cluster analysis, which depends on the number of principal components used. It is therefore important to determine the number of significant…
Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation…
Multiple imputation provides an effective way to handle missing data. When several possible models are under consideration for the data, the multiple imputation is typically performed under a single-best model selected from the candidate…
Auxiliary information is frequently utilized in survey sampling to improve the efficiency of estimators of the finite population mean. However, the simultaneous use of multiple auxiliary variables often induces multicollinearity, which…
Principal component analysis (PCA) is perhaps the most widely used method for data dimensionality reduction. A key question in PCA is deciding how many factors to retain. This manuscript describes a new approach to automatically selecting…
Multivariate imputation by chained equations (MICE) is one of the most popular approaches to address missing values in a data set. This approach requires specifying a univariate imputation model for every variable under imputation. The…
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal components method dedicated to categorical data: multiple correspondence analysis (MCA). The…
Multiple imputation has become one of the standard methods in drawing inferences in many incomplete data applications. Applications of multiple imputation in relatively more complex settings, such as high-dimensional clustered data, require…
Principal component analysis (PCA) is arguably the most popular tool in multivariate exploratory data analysis. In this paper, we consider the question of how to handle heterogeneous variables that include continuous, binary, and ordinal.…
Principal component analysis (PCA) is often used for analyzing data in the most diverse areas. In this work, we report an integrated approach to several theoretical and practical aspects of PCA. We start by providing, in an intuitive and…
Multiple Imputation (MI) is one of the most popular approaches to addressing missing values in questionnaires and surveys. MI with multivariate imputation by chained equations (MICE) allows flexible imputation of many types of data. In…
We discuss the problem of estimating the number of principal components in Principal Com- ponents Analysis (PCA). Despite of the importance of the problem and the multitude of solutions proposed in the literature, it comes as a surprise…
Principal component analysis (PCA) is a widely used method for data processing, such as for dimension reduction and visualization. Standard PCA is known to be sensitive to outliers, and thus, various robust PCA methods have been proposed.…
Principal component analysis (PCA) is a widespread technique for data analysis that relies on the covariance-correlation matrix of the analyzed data. However to properly work with high-dimensional data, PCA poses severe mathematical…
Data integration, or the strategic analysis of multiple sources of data simultaneously, can often lead to discoveries that may be hidden in individualistic analyses of a single data source. We develop a new unsupervised data integration…
Methods for supervised principal component analysis (SPCA) aim to incorporate label information into principal component analysis (PCA), so that the extracted features are more useful for a prediction task of interest. Prior work on SPCA…
Missing data are often dealt with multiple imputation. A crucial part of the multiple imputation process is selecting sensible models to generate plausible values for incomplete data. A method based on posterior predictive checking is…
An improved mixture of probabilistic principal component analysis (PPCA) has been introduced for nonlinear data-driven process monitoring in this paper. To realize this purpose, the technique of a mixture of probabilistic principal…
We propose a new method to impute missing values in mixed datasets. It is based on a principal components method, the factorial analysis for mixed data, which balances the influence of all the variables that are continuous and categorical…
Principal component analysis (PCA) is arguably the most widely used approach for large-dimensional factor analysis. While it is effective when the factors are sufficiently strong, it can be inconsistent when the factors are weak and/or the…