Related papers: Maximum sampled conditional likelihood for informa…
To tackle massive data, subsampling is a practical approach to select the more informative data points. However, when responses are expensive to measure, developing efficient subsampling schemes is challenging, and an optimal sampling…
Subsampling techniques can reduce the computational costs of processing big data. Practical subsampling plans typically involve initial uniform sampling and refined sampling. With a subsample, big data inferences are generally built on the…
With the growing availability of large-scale biomedical data, it is often time-consuming or infeasible to directly perform traditional statistical analysis with relatively limited computing resources at hand. We propose a fast subsampling…
Downsampling or under-sampling is a technique that is utilized in the context of large and highly imbalanced classification models. We study optimal downsampling for imbalanced classification using generalized linear models (GLMs). We…
The traditional maximum likelihood estimator (MLE) is often of limited use in complex high-dimensional data due to the intractability of the underlying likelihood function. Maximum composite likelihood estimation (McLE) avoids full…
A significant hurdle for analyzing large sample data is the lack of effective statistical computing and inference methods. An emerging powerful approach for analyzing large sample data is subsampling, by which one takes a random subsample…
We investigate optimal subsampling for quantile regression. We derive the asymptotic distribution of a general subsampling estimator and then derive two versions of optimal subsampling probabilities. One version minimizes the trace of the…
Subsampling is an efficient method to deal with massive data. In this paper, we investigate the optimal subsampling for linear quantile regression when the covariates are functions. The asymptotic distribution of the subsampling estimator…
We investigate the issue of parameter estimation with nonuniform negative sampling for imbalanced data. We first prove that, with imbalanced data, the available information about unknown parameters is only tied to the relatively small…
In making inference on the relation between failure and exposure histories in the Cox semiparametric model, the maximum partial likelihood estimator (MPLE) of the finite dimensional odds parameter, and the Breslow estimator of the baseline…
The maximum likelihood estimation is computationally demanding for large datasets, particularly when the likelihood function includes integrals. Subsampling can reduce the computational burden, but it often results in efficiency loss.This…
Knowing the link between observed predictive variables and outcomes is crucial for making inference in any regression model. When this link is missing, partially or completely, classical estimation methods fail in recovering the true…
Subsampling is a computationally efficient and scalable method to draw inference in large data settings based on a subset of the data rather than needing to consider the whole dataset. When employing subsampling techniques, a crucial…
Computational capability often falls short when confronted with massive data, posing a common challenge in establishing a statistical model or statistical inference method dealing with big data. While subsampling techniques have been…
We propose a general approach to construct weighted likelihood estimating equations with the aim of obtaining robust parameter estimates. We modify the standard likelihood equations by incorporating a weight that reflects the statistical…
The asymptotic variance of the maximum likelihood estimate is proved to decrease when the maximization is restricted to a subspace that contains the true parameter value. Maximum likelihood estimation allows a systematic fitting of…
Maximum pseudo-likelihood (MPL) is a semiparametric estimation method often used to obtain the dependence parameters in copula models from data. It has been shown that despite being consistent, and in some cases efficient, MPL estimation…
Nonuniform subsampling methods are effective to reduce computational burden and maintain estimation efficiency for massive data. Existing methods mostly focus on subsampling with replacement due to its high computational efficiency. If the…
Distributional regression aims to find the best candidate in a given parametric family of conditional distributions to model a given dataset. As each candidate in the distribution family can be identified by the corresponding distribution…
Given p independent normal populations, we consider the problem of estimating the mean of those populations, that based on the observed data, give the strongest signals. We explicitly condition on the ranking of the sample means, and…