Related papers: DPERC: Direct Parameter Estimation for Mixed Data

DPER: Efficient Parameter Estimation for Randomly Missing Data

The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…

Machine Learning · Statistics 2021-06-10 Thu Nguyen, Khoi Minh Nguyen-Duy, Duy Ho Minh Nguyen, Binh T. Nguyen, Bruce Alan Wade

Correlation visualization under missing values: a comparison between imputation and direct parameter estimation methods

Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the…

Machine Learning · Computer Science 2023-09-06 Nhat-Hao Pham, Khanh-Linh Vo, Mai Anh Vu, Thu Nguyen, Michael A. Riegler, Pål Halvorsen, Binh T. Nguyen

A Componentwise Estimation Procedure for Multivariate Location and Scatter: Robustness, Efficiency and Scalability

Covariance matrix estimation is an important problem in multivariate data analysis, both from theoretical as well as applied points of view. Many simple and popular covariance matrix estimators are known to be severely affected by model…

Methodology · Statistics 2025-11-21 Soumya Chakraborty, Ayanendranath Basu, Abhik Ghosh

Variational Inference of Dynamic Factor Models with Arbitrary Missing Data

Dynamic factor models are often estimated by point-estimation methods, disregarding parameter uncertainty. We propose a method accounting for parameter uncertainty by means of posterior approximation, using variational inference. Our…

Methodology · Statistics 2022-10-14 Erik Spånberg

A principal components method to impute missing values for mixed data

We propose a new method to impute missing values in mixed datasets. It is based on a principal components method, the factorial analysis for mixed data, which balances the influence of all the variables that are continuous and categorical…

Applications · Statistics 2013-02-20 Vincent Audigier, François Husson, Julie Josse

Direct covariance matrix estimation with compositional data

Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a…

Methodology · Statistics 2024-04-26 Aaron J. Molstad, Karl Oskar Ekvall, Piotr M. Suder

Recognizing Variables from their Data via Deep Embeddings of Distributions

A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…

Machine Learning · Computer Science 2019-09-12 Jonas Mueller, Alex Smola

Nonparametric Statistical Inference and Imputation for Incomplete Categorical Data

Missingness in categorical data is a common problem in various real applications. Traditional approaches either utilize only the complete observations or impute the missing data by some ad hoc methods rather than the true conditional…

Methodology · Statistics 2019-07-12 Chaojie Wang, Linghao Shen, Han Li, Xiaodan Fan

Machine Learning Based Missing Values Imputation in Categorical Datasets

In order to predict and fill in the gaps in categorical datasets, this research looked into the use of machine learning algorithms. The emphasis was on ensemble models constructed using the Error Correction Output Codes framework, including…

Machine Learning · Computer Science 2024-09-13 Muhammad Ishaq, Sana Zahir, Laila Iftikhar, Mohammad Farhad Bulbul, Seungmin Rho, Mi Young Lee

Minimax Rate-optimal Estimation of High-dimensional Covariance Matrices with Incomplete Data

Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the…

Methodology · Statistics 2016-05-17 T. Tony Cai, Anru Zhang

Estimation and model selection in generalized additive partial linear models for correlated data with diverging number of covariates

We propose generalized additive partial linear models for complex data which allow one to capture nonlinear patterns of some covariates, in the presence of linear components. The proposed method improves estimation efficiency and increases…

Statistics Theory · Mathematics 2014-05-26 Li Wang, Lan Xue, Annie Qu, Hua Liang

Multiple Imputation of Missing Categorical and Continuous Values via Bayesian Mixture Models with Local Dependence

We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures…

Applications · Statistics 2015-10-14 Jared S. Murray, Jerome P. Reiter

EPEM: Efficient Parameter Estimation for Multiple Class Monotone Missing Data

The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics or statistics. Commonly used imputation techniques require multiple iterations…

Machine Learning · Computer Science 2020-09-25 Thu Nguyen, Duy H. M. Nguyen, Huy Nguyen, Binh T. Nguyen, Bruce A. Wade

Nonparametric Copula Models for Multivariate, Mixed, and Missing Data

Modern datasets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with…

Methodology · Statistics 2023-04-10 Joseph Feldman, Daniel R. Kowal

HI-PMK: A Data-Dependent Kernel for Incomplete Heterogeneous Data Representation

Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical).…

Machine Learning · Computer Science 2025-07-30 Youran Zhou, Mohamed Reda Bouadjenek, Jonathan Wells, Sunil Aryal

Covariance Matrix Estimation for High-Throughput Biomedical Data with Interconnected Communities

Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, among others, consistently reveal strong…

Methodology · Statistics 2024-12-05 Yifan Yang, Chixiang Chen, Shuo Chen

Imputations for High Missing Rate Data in Covariates via Semi-supervised Learning Approach

Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…

Methodology · Statistics 2022-05-17 Wei Lan, Xuerong Chen, Tao Zou, Chih-Ling Tsai

Improving Missing Data Imputation with Deep Generative Models

Datasets with missing values are very common in industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…

Machine Learning · Computer Science 2019-02-28 Ramiro D. Camino, Christian A. Hammerschmidt, Radu State

Set-based differential covariance testing for high-throughput data

The problem of detecting changes in covariance for a single pair of features has been studied in some detail, but may be limited in importance or general applicability. In contrast, testing equality of covariance matrices of a set of…

Methodology · Statistics 2017-12-12 Yi-Hui Zhou

Permutation-Based Rank Test in the Presence of Discretization and Application in Causal Discovery with Mixed Data

Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical…

Machine Learning · Computer Science 2025-06-13 Xinshuai Dong, Ignavier Ng, Boyang Sun, Haoyue Dai, Guang-Yuan Hao, Shunxing Fan, Peter Spirtes, Yumou Qiu, Kun Zhang