Related papers: A PCA-based Data Prediction Method

Streaming PCA and Subspace Tracking: The Missing Data Case

For many modern applications in science and engineering, data are collected in a streaming fashion carrying time-varying information, and practitioners need to process them with a limited amount of memory and computational resources in a…

Machine Learning · Statistics 2018-06-13 Laura Balzano, Yuejie Chi, Yue M. Lu

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with the statistical methods in the occurrence of missing data. Missing data occurs when some values are not stored or observed for variables of interest. However, most of the statistical theory assumes that data…

Methodology · Statistics 2021-10-26 Luis Alejandro Masmela-Caita, Thais Paiva Galletti, Marcos Oliveira Prates

Improving the Accuracy of Principal Component Analysis by the Maximum Entropy Method

Classical Principal Component Analysis (PCA) approximates data in terms of projections on a small number of orthogonal vectors. There are simple procedures to efficiently compute various functions of the data from the PCA approximation. The…

Machine Learning · Statistics 2019-07-26 Guihong Wan, Crystal Maung, Haim Schweitzer

Spherical Principal Component Analysis

Principal Component Analysis (PCA) is one of the most important methods to handle high dimensional data. However, most of the studies on PCA aim to minimize the loss after projection, which usually measures the Euclidean distance, though in…

Machine Learning · Computer Science 2019-03-19 Kai Liu, Qiuwei Li, Hua Wang, Gongguo Tang

PCA-Based Missing Information Imputation for Real-Time Crash Likelihood Prediction Under Imbalanced Data

The real-time crash likelihood prediction has been an important research topic. Various classifiers, such as support vector machine (SVM) and tree-based boosting algorithms, have been proposed in traffic safety studies. However, few…

Machine Learning · Computer Science 2018-02-13 Jintao Ke, Shuaichao Zhang, Hai Yang, Xiqun Chen

Rough Sets Computations to Impute Missing Data

Many techniques for handling missing data have been proposed in the literature. Most of these techniques are overly complex. This paper explores an imputation technique based on rough set computations. In this paper, characteristic…

Computer Vision and Pattern Recognition · Computer Science 2007-05-23 Fulufhelo Vincent Nelwamondo, Tshilidzi Marwala

Probabilistic Predictive Principal Component Analysis for Spatially-Misaligned and High-Dimensional Air Pollution Data with Missing Observations

Accurate predictions of pollutant concentrations at new locations are often of interest in air pollution studies on fine particulate matter (PM$_{2.5}$), in which data is usually not measured at all study locations. PM$_{2.5}$ is also a…

Applications · Statistics 2020-05-19 Phuong T. Vu, Timothy V. Larson, Adam A. Szpiro

Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques

The estimation of missing input vector elements in real time processing applications requires a system that possesses the knowledge of certain characteristics such as correlations between variables, which are inherent in the input space.…

Applications · Statistics 2007-05-23 Fulufhelo V. Nelwamondo, Shakir Mohamed, Tshilidzi Marwala

Estimation of Missing Data Using Computational Intelligence and Decision Trees

This paper introduces a novel paradigm to impute missing data that combines a decision tree with an auto-associative neural network (AANN) based model and a principal component analysis-neural network (PCA-NN) based model. For each model,…

Applications · Statistics 2007-09-12 George Ssali, Tshilidzi Marwala

Missing Data Imputation using Neural Cellular Automata

When working with tabular data, missingness is always one of the most painful problems. Throughout many years, researchers have continuously explored better and better ways to impute missing data. Recently, with the rapid development…

Machine Learning · Computer Science 2025-09-09 Tin Luu, Binh Nguyen, Man Ngo

Handling Missing Data in Decision Trees: A Probabilistic Approach

Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine…

Machine Learning · Computer Science 2020-07-01 Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

PROMISSING: Pruning Missing Values in Neural Networks

While data are the primary fuel for machine learning models, they often suffer from missing values, especially when collected in real-world scenarios. However, many off-the-shelf machine learning models, including artificial neural network…

Machine Learning · Computer Science 2022-06-06 Seyed Mostafa Kia, Nastaran Mohammadian Rad, Daniel van Opstal, Bart van Schie, Andre F. Marquand, Josien Pluim, Wiepke Cahn, Hugo G. Schnack

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principal components analysis or…

Artificial Intelligence · Computer Science 2022-05-11 Sandeep Hans, Diptikalyan Saha, Aniya Aggarwal

Supervised Dynamic PCA: Linear Dynamic Forecasting with Many Predictors

This paper proposes a novel dynamic forecasting method using a new supervised Principal Component Analysis (PCA) when a large number of predictors are available. The new supervised PCA provides an effective way to bridge the gap between…

Econometrics · Economics 2024-06-14 Zhaoxing Gao, Ruey S. Tsay

Missing Data in Signal Processing and Machine Learning: Models, Methods and Modern Approaches

This tutorial aims to provide signal processing (SP) and machine learning (ML) practitioners with vital tools, in an accessible way, to answer the question: How to deal with missing data? There are many strategies to handle incomplete…

Signal Processing · Electrical Eng. & Systems 2026-01-06 Alexandre Hippert-Ferrer, Aude Sportisse, Amirhossein Javaheri, Mohammed Nabil El Korso, Daniel P. Palomar

Spatial Matrix Completion for Spatially-Misaligned and High-Dimensional Air Pollution Data

In health-pollution cohort studies, accurate predictions of pollutant concentrations at new locations are needed, since the locations of fixed monitoring sites and study participants are often spatially misaligned. For multi-pollution data,…

Applications · Statistics 2022-01-24 Phuong T. Vu, Adam A. Szpiro, Noah Simon

Unsupervised Metric Learning in Presence of Missing Data

For many machine learning tasks, the input data lie on a low-dimensional manifold embedded in a high dimensional space and, because of this high-dimensional structure, most algorithms are inefficient. The typical solution is to reduce the…

Machine Learning · Computer Science 2019-03-05 Anna C. Gilbert, Rishi Sonthalia

Multiple imputation for continuous variables using a Bayesian principal component analysis

We propose a multiple imputation method based on principal component analysis (PCA) to deal with incomplete continuous data. To reflect the uncertainty of the parameters from one imputation to the next, we use a Bayesian treatment of the…

Methodology · Statistics 2015-08-20 Vincent Audigier, François Husson, Julie Josse

Imputation of missing data using multivariate Gaussian Linear Cluster-Weighted Modeling

Missing data arises when certain values are not recorded or observed for variables of interest. However, most of the statistical theory assumes complete data availability. To address incomplete databases, one approach is to fill the gaps…

Methodology · Statistics 2023-08-15 Luis Alejandro Masmela-Caita, Thais Paiva Galletti, Marcos Oliveira Prates

Principal Component Analysis based frameworks for efficient missing data imputation algorithms

Missing data is a commonly occurring problem in practice. Many imputation methods have been developed to fill in the missing entries. However, not all of them can scale to high-dimensional data, especially the multiple imputation…

Machine Learning · Computer Science 2023-03-21 Thu Nguyen, Hoang Thien Ly, Michael Alexander Riegler, Pål Halvorsen, Hugo L. Hammer