Related papers: Missing Data using Decision Forest and Computation…

Missing Data: A Comparison of Neural Network and Expectation Maximisation Techniques

The estimation of missing input vector elements in real time processing applications requires a system that possesses the knowledge of certain characteristics such as correlations between variables, which are inherent in the input space.…

Applications · Statistics 2007-05-23 Fulufhelo V. Nelwamondo, Shakir Mohamed, Tshilidzi Marwala

Handling Missing Data in Decision Trees: A Probabilistic Approach

Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine…

Machine Learning · Computer Science 2020-07-01 Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Random Forest Missing Data Algorithms

Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…

Machine Learning · Statistics 2017-01-23 Fei Tang, Hemant Ishwaran

Estimation of Missing Data Using Computational Intelligence and Decision Trees

This paper introduces a novel paradigm to impute missing data that combines a decision tree with an auto-associative neural network (AANN) based model and a principal component analysis-neural network (PCA-NN) based model. For each model,…

Applications · Statistics 2007-09-12 George Ssali, Tshilidzi Marwala

Missing Data Prediction and Classification: The Use of Auto-Associative Neural Networks and Optimization Algorithms

This paper presents methods which are aimed at finding approximations to missing data in a dataset by using optimization algorithms to optimize the network parameters after which prediction and classification tasks can be performed. The…

Neural and Evolutionary Computing · Computer Science 2014-03-24 Collins Leke, Bhekisipho Twala, T. Marwala

Handling missing data in a neural network approach for the identification of charged particles in a multilayer detector

Identification of charged particles in a multilayer detector by the energy loss technique may also be achieved by the use of a neural network. The performance of the network becomes worse when a large fraction of information is missing, for…

Methodology · Statistics 2020-04-14 S. Riggi, D. Riggi, F. Riggi

Evaluating the Impact of Missing Data Imputation through the use of the Random Forest Algorithm

This paper presents an impact assessment for the imputation of missing data. The data set used is HIV Seroprevalence data from an antenatal clinic study survey performed in 2001. Data imputation is performed through five methods: Random…

Methodology · Statistics 2020-11-25 Adam Pantanowitz, Tshilidzi Marwala

Missing Data Estimation in High-Dimensional Datasets: A Swarm Intelligence-Deep Neural Network Approach

In this paper, we examine the problem of missing data in high-dimensional datasets by taking into consideration the Missing Completely at Random and Missing at Random mechanisms, as well as the Arbitrary missing pattern. Additionally, this…

Artificial Intelligence · Computer Science 2016-07-04 Collins Leke, Tshilidzi Marwala

Who wins the Miss Contest for Imputation Methods? Our Vote for Miss BooPF

Missing data is an expected issue when large amounts of data are collected, and several imputation techniques have been proposed to tackle this problem. Beneath classical approaches such as MICE, the application of Machine Learning…

Machine Learning · Statistics 2017-12-01 Burim Ramosaj, Markus Pauly

BEST : A decision tree algorithm that handles missing values

The main contribution of this paper is the development of a new decision tree algorithm. The proposed approach allows users to guide the algorithm through the data partitioning process. We believe this feature has many applications but in…

Machine Learning · Statistics 2020-10-27 Cédric Beaulac, Jeffrey S. Rosenthal

Regression with Missing Data, a Comparison Study of Techniques Based on Random Forests

In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and…

Statistics Theory · Mathematics 2021-10-19 Irving Gómez-Méndez, Emilien Joly

Proposition of a Theoretical Model for Missing Data Imputation using Deep Learning and Evolutionary Algorithms

In the last couple of decades, there have been major advancements in the domain of missing data imputation. The techniques in the domain include amongst others: Expectation Maximization, Neural Networks with Evolutionary Algorithms or…

Neural and Evolutionary Computing · Computer Science 2015-12-07 Collins Leke, Tshilidzi Marwala, Satyakama Paul

Missing Data Imputation using Optimal Transport

Missing data is a crucial issue when applying machine learning algorithms to real-world datasets. Starting from the simple assumption that two batches extracted randomly from the same dataset should share the same distribution, we leverage…

Machine Learning · Statistics 2020-07-02 Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Processing of missing data by neural networks

We propose a general, theoretically justified mechanism for processing missing data by neural networks. Our idea is to replace typical neuron's response in the first hidden layer by its expected value. This approach can be applied for…

Machine Learning · Computer Science 2019-04-05 Marek Smieja, Łukasz Struski, Jacek Tabor, Bartosz Zieliński, Przemysław Spurek

Prediction with Missing Data via Bayesian Additive Regression Trees

We present a method for incorporating missing data in non-parametric statistical learning without the need for imputation. We focus on a tree-based method, Bayesian Additive Regression Trees (BART), enhanced with "Missingness Incorporated…

Machine Learning · Statistics 2014-02-14 Adam Kapelner, Justin Bleich

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…

Machine Learning · Statistics 2021-06-10 Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen, Bruce Alan Wade

Real-Time Network Traffic Forecasting with Missing Data: A Generative Model Approach

Real-time network traffic forecasting is crucial for network management and early resource allocation. Existing network traffic forecasting approaches operate under the assumption that the network traffic data is fully observed. However, in…

Networking and Internet Architecture · Computer Science 2025-06-12 Lei Deng, Wenhan Xu, Jingwei Li, Danny H. K. Tsang

Improving Missing Data Imputation with Deep Generative Models

Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…

Machine Learning · Computer Science 2019-02-28 Ramiro D. Camino, Christian A. Hammerschmidt, Radu State

Achieving Reliable Causal Inference with Data-Mined Variables: A Random Forest Approach to the Measurement Error Problem

Combining machine learning with econometric analysis is becoming increasingly prevalent in both research and practice. A common empirical strategy involves the application of predictive modeling techniques to 'mine' variables of interest…

Econometrics · Economics 2020-12-22 Mochen Yang, Edward McFowland, Gordon Burtch, Gediminas Adomavicius

Estimation and imputation of missing data in longitudinal models with Zero-Inflated Poisson response variable

This research deals with the estimation and imputation of missing data in longitudinal models with a Poisson response variable inflated with zeros. A methodology is proposed that is based on the use of maximum likelihood, assuming that data…

Methodology · Statistics 2024-09-18 D. S. Martinez-Lobo, O. O. Melo, N. A. Cruz