Related papers: Dimensional Data KNN-Based Imputation

Internal Data Imputation in Data Warehouse Dimensions

Missing values commonly occur in multidimensional data warehouses. They can reduce the usefulness of the data, since analysis on a multidimensional data warehouse is performed through different dimensions with hierarchies where…

Databases · Computer Science 2021-10-05 Yuzhao Yang, Fatma Abdelhedi, Jérôme Darmont, Franck Ravat, Olivier Teste

Numerical Data Imputation for Multimodal Data Sets: A Probabilistic Nearest-Neighbor Kernel Density Approach

Numerical data imputation algorithms replace missing values by estimates to leverage incomplete data sets. Current imputation methods seek to minimize the error between the unobserved ground truth and the imputed values. But this strategy…

Machine Learning · Statistics 2023-07-11 Florian Lalande, Kenji Doya

Missing Data Imputation for Classification Problems

Imputation of missing data is a common application in various classification problems where the feature training matrix has missingness. A widely used solution to this imputation problem is based on the lazy learning technique, $k$-nearest…

Machine Learning · Statistics 2020-02-26 Arkopal Choudhury, Michael R. Kosorok

Nearest Neighbor Imputation for Categorical Data by Weighting of Attributes

Missing values are a common phenomenon in all areas of applied research. While various imputation methods are available for metrically scaled variables, methods for categorical data are scarce. An imputation method that has been shown to…

Methodology · Statistics 2017-10-04 Shahla Faisal, Gerhard Tutz

Multiple imputation using dimension reduction techniques for high-dimensional data

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…

Methodology · Statistics 2019-05-15 Domonique W. Hodge, Sandra E. Safo, Qi Long

DW-KNN: A Transparent Local Classifier Integrating Distance Consistency and Neighbor Reliability

K-Nearest Neighbors (KNN) is one of the most widely used ML classifiers. However, on closer inspection, standard distance-weighted KNN and related variants assume all 'k' neighbors are equally reliable. In heterogeneous feature space, this…

Machine Learning · Computer Science 2025-12-11 Kumarjit Pathak, Karthik K, Sachin Madan, Jitin Kapila

Missing Value Imputation Based on Deep Generative Models

Missing values are widespread in real-world datasets and hinder advanced data analytics. Properly filling these missing values is crucial but challenging, especially when the missing rate is high. Many approaches…

Machine Learning · Computer Science 2018-08-07 Hongbao Zhang, Pengtao Xie, Eric Xing

Adaptively-weighted Nearest Neighbors for Matrix Completion

In this technical note, we introduce and analyze AWNN: an adaptively weighted nearest neighbor method for performing matrix completion. Nearest neighbor (NN) methods are widely used in missing data problems across multiple disciplines such…

Machine Learning · Statistics 2025-05-15 Tathagata Sadhukhan, Manit Paul, Raaz Dwivedi

Quality control, data cleaning, imputation

This chapter addresses important steps during the quality assurance and control of RWD, with particular emphasis on the identification and handling of missing values. A gentle introduction is provided on common statistical and machine…

Methodology · Statistics 2021-11-01 Dawei Liu, Hanne I. Oberman, Johanna Muñoz, Jeroen Hoogland, Thomas P. A. Debray

HMVI: Unifying Heterogeneous Attributes with Natural Neighbors for Missing Value Inference

Missing value imputation is a fundamental challenge in machine intelligence, heavily dependent on data completeness. Current imputation methods often handle numerical and categorical attributes independently, overlooking critical…

Machine Learning · Computer Science 2026-01-09 Xiaopeng Luo, Zexi Tan, Zhuowei Wang

Imputation of missing data using multivariate Gaussian Linear Cluster-Weighted Modeling

Missing data arises when certain values are not recorded or observed for variables of interest. However, most statistical theory assumes complete data availability. To address incomplete databases, one approach is to fill the gaps…

Methodology · Statistics 2023-08-15 Luis Alejandro Masmela-Caita, Thais Paiva Galletti, Marcos Oliveira Prates

Learned k-NN Distance Estimation

Big data mining is well known to be an important task for data science, because it can provide useful observations and new knowledge hidden in given large datasets. Proximity-based data analysis is particularly utilized in many real-life…

Databases · Computer Science 2022-11-29 Daichi Amagata, Yusuke Arai, Sumio Fujita, Takahiro Hara

Imputation of Missing Data Using Linear Gaussian Cluster-Weighted Modeling

Missing data theory deals with statistical methods for situations in which data are missing. Missing data occurs when some values are not stored or observed for variables of interest. However, most of the statistical theory assumes that data…

Methodology · Statistics 2021-10-26 Luis Alejandro Masmela-Caita, Thais Paiva Galletti, Marcos Oliveira Prates

Evaluation of Missing Data Imputation for Time Series Without Ground Truth

The challenge of handling missing data in time series is critical for maintaining the accuracy and reliability of machine learning (ML) models in applications like fifth generation mobile communication (5G) network management. Traditional…

Machine Learning · Computer Science 2025-03-11 Rania Farjallah, Bassant Selim, Brigitte Jaumard, Samr Ali, Georges Kaddoum

MISNN: Multiple Imputation via Semi-parametric Neural Networks

Multiple imputation (MI) has been widely applied to missing value problems in biomedical, social and econometric research, in order to avoid improper inference in the downstream data analysis. In the presence of high-dimensional data,…

Methodology · Statistics 2023-05-04 Zhiqi Bu, Zongyu Dai, Yiliang Zhang, Qi Long

Multiple Imputation Method for High-Dimensional Neuroimaging Data

Missingness is a common issue for neuroimaging data, and neglecting it in downstream statistical analysis can introduce bias and lead to misguided inferential conclusions. It is therefore crucial to conduct appropriate statistical methods…

Methodology · Statistics 2025-03-25 Tong Lu, Chixiang Chen, Hsin-Hsiung Huang, Peter Kochunov, Elliot Hong, Shuo Chen

Explainable Data Imputation using Constraints

Data values in a dataset can be missing or anomalous due to mishandling or human error. Analysing data with missing values can create bias and affect the inferences. Several analysis methods, such as principal component analysis or…

Artificial Intelligence · Computer Science 2022-05-11 Sandeep Hans, Diptikalyan Saha, Aniya Aggarwal

On pattern classification with weighted dimensions

Studies on various facets of pattern classification are often imperative when working with multi-dimensional samples from diverse application scenarios. In this regard, the weighted dimension-based distance measure has been one of the…

Machine Learning · Computer Science 2025-10-24 Ayatullah Faruk Mollah

Differentiable Weightless Neural Networks

We introduce the Differentiable Weightless Neural Network (DWN), a model based on interconnected lookup tables. Training of DWNs is enabled by a novel Extended Finite Difference technique for approximate differentiation of binary values. We…

Machine Learning · Computer Science 2025-03-04 Alan T. L. Bacellar, Zachary Susskind, Mauricio Breternitz, Eugene John, Lizy K. John, Priscila M. V. Lima, Felipe M. G. França

A Novel Approach for Imputation of Missing Attribute Values for Efficient Mining of Medical Datasets - Class Based Cluster Approach

Missing attribute values are quite common in the datasets available in the literature. Values may also be missing because not all attribute values are recorded, leaving some unavailable for practical reasons. For all these…

Information Retrieval · Computer Science 2016-05-04 Yelipe UshaRani, P. Sammulal