Related papers: Dimensionality reduction with missing values imput…

200 papers

Missing values are a common problem in data science and machine learning. Removing instances with missing values can adversely affect the quality of further data analysis. This is exacerbated when there are relatively many more features…

Machine Learning · Computer Science 2023-01-03 Ekaterina Antonenko, Jesse Read

An analysis of high-dimensional data can offer a detailed description of a system but is often challenged by the curse of dimensionality. General dimensionality reduction techniques can alleviate such difficulty by extracting a few…

Methodology · Statistics 2021-09-28 Di Bo, Hoon Hwangbo, Vinit Sharma, Corey Arndt, Stephanie C. TerMaath

Standard approaches for variable selection in linear models are not tailored to deal properly with high-dimensional and incomplete data. Currently, methods dedicated to high-dimensional data handle missing values by ad-hoc strategies, like…

Methodology · Statistics 2021-06-09 Avner Bar-Hen, Vincent Audigier

Modern data acquisition based on high-throughput technology is often facing the problem of missing data. Algorithms commonly used in the analysis of such large-scale data often depend on a complete set. Missing value imputation offers a…

Applications · Statistics 2014-06-03 Daniel J. Stekhoven, Peter Bühlmann

This report concerns the problem of dimensionality reduction through information geometric methods on statistical manifolds. While there has been considerable work recently presented regarding dimensionality reduction for the purposes of…

Machine Learning · Statistics 2008-09-30 Kevin M. Carter, Raviv Raich, Alfred O. Hero

The real-life data have a complex and non-linear structure due to their nature. These non-linearities and the large number of features can usually cause problems such as the empty-space phenomenon and the well-known curse of dimensionality.…

Machine Learning · Computer Science 2025-03-13 Kadir Özçoban, Murat Manguoğlu, Emrullah Fatih Yetkin

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…

Methodology · Statistics 2019-05-15 Domonique W. Hodge, Sandra E. Safo, Qi Long

Dimension reduction plays a pivotal role in analysing high-dimensional data. However, observations with missing values present serious difficulties in directly applying standard dimension reduction techniques. As a large number of dimension…

Machine Learning · Statistics 2021-09-28 Yurong Ling, Zijing Liu, Jing-Hao Xue

In this paper we present the practical benefits of a new random forest algorithm to deal with missing values in the sample. The purpose of this work is to compare the different solutions to deal with missing values with random forests and…

Statistics Theory · Mathematics 2021-10-19 Irving Gómez-Méndez, Emilien Joly

This work proposes a non-iterative strategy for missing value imputations which is guided by similarity between observations, but instead of explicitly determining distances or nearest neighbors, it assigns observations to overlapping…

Machine Learning · Statistics 2019-11-25 David Cortes

Data for which a set of objects is described by multiple distinct feature sets (called views) is known as multi-view data. When missing values occur in multi-view data, all features in a view are likely to be missing simultaneously. This…

Machine Learning · Statistics 2024-06-21 Wouter van Loon, Marjolein Fokkema, Frank de Vos, Marisa Koini, Reinhold Schmidt, Mark de Rooij

Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…

Machine Learning · Computer Science 2019-02-28 Ramiro D. Camino, Christian A. Hammerschmidt, Radu State

Imputing missing attribute values in medical datasets, in order to extract hidden knowledge from them, is an interesting and very challenging research topic. One cannot eliminate missing values in medical records.…

Databases · Computer Science 2016-03-11 Yelipe UshaRani, P. Sammulal

Randomized dimensionality reduction is a widely-used algorithmic technique for speeding up large-scale Euclidean optimization problems. In this paper, we study dimension reduction for a variety of maximization problems, including…

Data Structures and Algorithms · Computer Science 2025-06-03 Jie Gao, Rajesh Jayaram, Benedikt Kolbe, Shay Sapir, Chris Schwiegelshohn, Sandeep Silwal, Erik Waingarten

Random forest (RF) missing data algorithms are an attractive approach for dealing with missing data. They have the desirable properties of being able to handle mixed types of missing data, they are adaptive to interactions and nonlinearity,…

Machine Learning · Statistics 2017-01-23 Fei Tang, Hemant Ishwaran

Missing value imputation is an important practical problem. There is a large body of work on it, but there does not exist any work that formulates the problem in a structured output setting. Also, most applications have constraints on the…

Machine Learning · Computer Science 2013-11-12 Rahul Kidambi, Vinod Nair, Sundararajan Sellamanickam, S. Sathiya Keerthi

Decision trees are a popular family of models due to their attractive properties such as interpretability and ability to handle heterogeneous data. Concurrently, missing data is a prevalent occurrence that hinders performance of machine…

Machine Learning · Computer Science 2020-07-01 Pasha Khosravi, Antonio Vergari, YooJung Choi, Yitao Liang, Guy Van den Broeck

Feature selection is a critical step in high-dimensional classification tasks, particularly under challenging conditions of double imbalance, namely settings characterized by both class imbalance in the response variable and dimensional…

Methodology · Statistics 2025-06-13 Fabio Demaria

Large high-dimensional datasets are becoming more and more popular in an increasing number of research areas. Processing the high dimensional data incurs a high computational cost and is inherently inefficient since many of the values that…

Computer Vision and Pattern Recognition · Computer Science 2013-05-01 Alon Schclar

Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such…

Methodology · Statistics 2021-03-25 Joshua Daniel Loyal, Ruoqing Zhu, Yifan Cui, Xin Zhang