Related papers: Distributed nonparametric regression imputation fo…
Modal regression has emerged as a flexible alternative to classical regression models when the conditional mean or median are unable to adequately capture the underlying relation between a response and a predictor variable. This approach is…
Imputation of missing values is a strategy for handling non-responses in surveys or data loss in measurement processes, which may be more effective than ignoring them. When the variable represents a count, the literature dealing with this…
Nonresponse frequently arises in practice, and simply ignoring it may lead to erroneous inference. Besides, the number of collected covariates may increase as the sample size in modern statistics, so parametric imputation or propensity…
Item nonresponse is frequently encountered in practice. Ignoring missing data can lose efficiency and lead to misleading inference. Fractional imputation is a frequentist approach of imputation for handling missing data. However, the…
The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big…
Distributed computing offers a high degree of flexibility to accommodate modern learning constraints and the ever increasing size of datasets involved in massive data issues. Drawing inspiration from the theory of distributed computation…
Imputation is a popular technique for handling missing data. We consider a nonparametric approach to imputation using the kernel ridge regression technique and propose consistent variance estimation. The proposed variance estimator is based…
A key challenge in probabilistic regression is ensuring that predictive distributions accurately reflect true empirical uncertainty. Minimizing overall prediction error often encourages models to prioritize informativeness over calibration,…
Linear regression is a fundamental and popular statistical method. There are various kinds of linear regression, such as mean regression and quantile regression. In this paper, we propose a new one called distribution regression, which…
In the era of big data, it is necessary to split extremely large data sets across multiple computing nodes and construct estimators using the distributed data. When designing distributed estimators, it is desirable to minimize the amount of…
Nonparametric and machine learning methods are flexible methods for obtaining accurate predictions. Nowadays, data sets with a large number of predictors and complex structures are fairly common. In the presence of item nonresponse,…
Machine learning iterative imputation methods have been well accepted by researchers for imputing missing data, but they can be time-consuming when handling large datasets. To overcome this drawback, parallel computing strategies have been…
Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck.…
Imputation is a popular technique for handling item nonresponse in survey sampling. Parametric imputation is based on a parametric model for imputation and is less robust against the failure of the imputation model. Nonparametric imputation…
This paper studies the distributed adaptiveestimation problems for stochastic large regression modelswith an infinite number of parameters. By constructing a re-cursive local cost function, we propose a novel distributedrecursive least…
The principal support vector machines method (Li et al., 2011) is a powerful tool for sufficient dimension reduction that replaces original predictors with their low-dimensional linear combinations without loss of information. However, the…
Nonparametric methods have been very popular in the last couple of decades in time series and regression, but no such development has taken place for spatial models. A rather obvious reason for this is the curse of dimensionality. For…
Missing data present challenges in data analysis. Naive analyses such as complete-case and available-case analysis may introduce bias and loss of efficiency, and produce unreliable results. Multiple imputation (MI) is one of the most widely…
We investigate methods for penalized regression in the presence of missing observations. This paper introduces a method for estimating the parameters which compensates for the missing observations. We first, derive an unbiased estimator of…
This paper considers the problem of kernel regression and classification with possibly unobservable response variables in the data, where the mechanism that causes the absence of information is unknown and can depend on both predictors and…