Related papers: Distributed nonparametric regression imputation fo…

Nonparametric modal regression with missing response observations

Modal regression has emerged as a flexible alternative to classical regression models when the conditional mean or median are unable to adequately capture the underlying relation between a response and a predictor variable. This approach is…

Methodology · Statistics 2025-04-08 Ana Pérez-González, Tomás R. Cotos-Yáñez, Rosa M. Crujeiras

Regression-based imputation of explanatory discrete missing data

Imputation of missing values is a strategy for handling non-responses in surveys or data loss in measurement processes, which may be more effective than ignoring them. When the variable represents a count, the literature dealing with this…

Applications · Statistics 2020-07-31 Gilma Hernández-Herrera, Albert Navarro, David Moriña

Nonparametric augmented probability weighting with sparsity

Nonresponse frequently arises in practice, and simply ignoring it may lead to erroneous inference. Besides, the number of collected covariates may increase as the sample size in modern statistics, so parametric imputation or propensity…

Methodology · Statistics 2022-09-29 Xin He, Xiaojun Mao, Zhonglei Wang

Semiparametric fractional imputation using Gaussian mixture models for handling multivariate missing data

Item nonresponse is frequently encountered in practice. Ignoring missing data can lose efficiency and lead to misleading inference. Fractional imputation is a frequentist approach of imputation for handling missing data. However, the…

Methodology · Statistics 2018-09-18 Hejian Sang, Jae Kwang Kim

Distributed inference for quantile regression processes

The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big…

Statistics Theory · Mathematics 2018-04-12 Stanislav Volgushev, Shih-Kang Chao, Guang Cheng

Online Asynchronous Distributed Regression

Distributed computing offers a high degree of flexibility to accommodate modern learning constraints and the ever increasing size of datasets involved in massive data issues. Drawing inspiration from the theory of distributed computation…

Statistics Theory · Mathematics 2014-07-17 Gérard Biau, Ryad Zenine

Statistical Inference after Kernel Ridge Regression Imputation under item nonresponse

Imputation is a popular technique for handling missing data. We consider a nonparametric approach to imputation using the kernel ridge regression technique and propose consistent variance estimation. The proposed variance estimator is based…

Methodology · Statistics 2021-02-02 Hengfang Wang, Jae-Kwang Kim

Nonparametric Distribution Regression Re-calibration

A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration,…

Machine Learning · Statistics 2026-02-17 Ádám Jung, Domokos M. Kelen, András A. Benczúr

Distribution Regression

Linear regression is a fundamental and popular statistical method. There are various kinds of linear regression, such as mean regression and quantile regression. In this paper, we propose a new one called distribution regression, which…

Methodology · Statistics 2017-12-27 Xin Chen, Xuejun Ma, Wang Zhou

Distributed Nonparametric Estimation under Communication Constraints

In the era of big data, it is necessary to split extremely large data sets across multiple computing nodes and construct estimators using the distributed data. When designing distributed estimators, it is desirable to minimize the amount of…

Statistics Theory · Mathematics 2022-04-25 Azeem Zaman, Botond Szabó

Imputation procedures in surveys using nonparametric and machine learning methods: an empirical comparison

Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse,…

Methodology · Statistics 2022-08-23 Mehdi Dagdoug, Camelia Goga, David Haziza

Influence of parallel computing strategies of iterative imputation of missing data: a case study on missForest

Machine learning iterative imputation methods have been well accepted by researchers for imputing missing data, but they can be time-consuming when handling large datasets. To overcome this drawback, parallel computing strategies have been…

Applications · Statistics 2020-04-24 Shangzhi Hong, Yuqi Sun, Hanying Li, Henry S. Lynn

Distributed linear regression by averaging

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck.…

Statistics Theory · Mathematics 2022-10-25 Edgar Dobriban, Yue Sheng

Semiparametric Imputation Using Conditional Gaussian Mixture Models under Item Nonresponse

Imputation is a popular technique for handling item nonresponse in survey sampling. Parametric imputation is based on a parametric model for imputation and is less robust against the failure of the imputation model. Nonparametric imputation…

Methodology · Statistics 2019-09-20 Danhyang Lee, Jae Kwang Kim

Distributed adaptive estimation for stochastic large regression models

This paper studies the distributed adaptive estimation problems for stochastic large regression models with an infinite number of parameters. By constructing a recursive local cost function, we propose a novel distributed recursive least…

Systems and Control · Electrical Eng. & Systems 2026-04-29 Die Gan, Siyu Xie, Zhixin Liu, Xuebo Zhang

Distributed estimation of principal support vector machines for sufficient dimension reduction

The principal support vector machines method (Li et al., 2011) is a powerful tool for sufficient dimension reduction that replaces original predictors with their low-dimensional linear combinations without loss of information. However, the…

Machine Learning · Statistics 2019-12-02 Jun Jin, Chao Ying, Zhou Yu

Estimation in semiparametric spatial regression

Nonparametric methods have been very popular in the last couple of decades in time series and regression, but no such development has taken place for spatial models. A rather obvious reason for this is the curse of dimensionality. For…

Statistics Theory · Mathematics 2007-06-13 Jiti Gao, Zudi Lu, Dag Tjøstheim

Multiple imputation using dimension reduction techniques for high-dimensional data

Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…

Methodology · Statistics 2019-05-15 Domonique W. Hodge, Sandra E. Safo, Qi Long

An Investigation of Methods for Handling Missing Data with Penalized Regression

We investigate methods for penalized regression in the presence of missing observations. This paper introduces a method for estimating the parameters which compensates for the missing observations. We first derive an unbiased estimator of…

Applications · Statistics 2013-10-09 Yunjin Choi, Robert Tibshirani

On regression and classification with possibly missing response variables in the data

This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both predictors and…

Statistics Theory · Mathematics 2022-12-07 Majid Mojirsheibani, William Pouliot, Andre Shakhbandaryan