Related papers: DPERC: Direct Parameter Estimation for Mixed Data
The missing data problem has been broadly studied in the last few decades and has various applications in different areas such as statistics or bioinformatics. Even though many methods have been developed to tackle this challenge, most of…
Correlation matrix visualization is essential for understanding the relationships between variables in a dataset, but missing data can pose a significant challenge in estimating correlation coefficients. In this paper, we compare the…
Covariance matrix estimation is an important problem in multivariate data analysis, both from theoretical as well as applied points of view. Many simple and popular covariance matrix estimators are known to be severely affected by model…
Dynamic factor models are often estimated by point-estimation methods, disregarding parameter uncertainty. We propose a method accounting for parameter uncertainty by means of posterior approximation, using variational inference. Our…
We propose a new method to impute missing values in mixed datasets. It is based on a principal components method, the factorial analysis for mixed data, which balances the influence of all the variables that are continuous and categorical…
Compositional data arise in many areas of research in the natural and biomedical sciences. One prominent example is in the study of the human gut microbiome, where one can measure the relative abundance of many distinct microorganisms in a…
A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…
Missingness in categorical data is a common problem in various real applications. Traditional approaches either utilize only the complete observations or impute the missing data by some ad hoc methods rather than the true conditional…
In order to predict and fill in the gaps in categorical datasets, this research looked into the use of machine learning algorithms. The emphasis was on ensemble models constructed using the Error Correction Output Codes framework, including…
Missing data occur frequently in a wide range of applications. In this paper, we consider estimation of high-dimensional covariance matrices in the presence of missing observations under a general missing completely at random model in the…
We propose generalized additive partial linear models for complex data which allow one to capture nonlinear patterns of some covariates, in the presence of linear components. The proposed method improves estimation efficiency and increases…
We present a nonparametric Bayesian joint model for multivariate continuous and categorical variables, with the intention of developing a flexible engine for multiple imputation of missing values. The model fuses Dirichlet process mixtures…
The problem of monotone missing data has been broadly studied during the last two decades and has many applications in different fields such as bioinformatics or statistics. Commonly used imputation techniques require multiple iterations…
Modern datasets commonly feature both substantial missingness and many variables of mixed data types, which present significant challenges for estimation and inference. Complete case analysis, which proceeds using only the observations with…
Handling incomplete and heterogeneous data remains a central challenge in real-world machine learning, where missing values may follow complex mechanisms (MCAR, MAR, MNAR) and features can be of mixed types (numerical and categorical).…
Estimating a covariance matrix is central to high-dimensional data analysis. Empirical analyses of high-dimensional biomedical data, including genomics, proteomics, microbiome, and neuroimaging, among others, consistently reveal strong…
Advancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as…
Datasets with missing values are very common on industry applications, and they can have a negative impact on machine learning models. Recent studies introduced solutions to the problem of imputing missing values based on deep generative…
The problem of detecting changes in covariance for a single pair of features has been studied in some detail, but may be limited in importance or general applicability. In contrast, testing equality of covariance matrices of a {\it set} of…
Recent advances have shown that statistical tests for the rank of cross-covariance matrices play an important role in causal discovery. These rank tests include partial correlation tests as special cases and provide further graphical…