Related papers: Generalized Linear Models for Aggregated Data
The assumption that response and predictor belong to the same statistical unit may be violated in practice. Unbiased estimation and recovery of true label ordering based on unlabeled data are challenging tasks and have attracted increasing…
In medical research, individual-level patient data provide invaluable information, but the patients' right to confidentiality remains of utmost priority. This poses a huge challenge when estimating statistical models such as linear mixed…
In many contexts, we have access to aggregate data, but individual-level data is unavailable. For example, medical studies sometimes report only aggregate statistics about disease prevalence because of privacy concerns. Even so, many a time…
Data privacy has increasingly become a daunting challenge because it limits data availability, which is essential in estimating statistical models such as generalized linear mixed models. Access to personal data often involves considerable…
Inference for the parameters indexing generalised linear models is routinely based on the assumption that the model is correct and a priori specified. This is unsatisfactory because the chosen model is usually the result of a data-adaptive…
Finite mixture distributions arise in sampling a heterogeneous population. Data drawn from such a population will exhibit extra variability relative to any single subpopulation. Statistical models based on finite mixtures can assist in the…
A connection between the General Linear Model (GLM) in combination with classical statistical inference and the machine learning (MLE)-based inference is described in this paper. Firstly, the estimation of the GLM parameters is expressed as…
This paper considers generalized linear models using rule-based features, also referred to as rule ensembles, for regression and probabilistic classification. Rules facilitate model interpretation while also capturing nonlinear dependences…
Much traditional statistical modelling assumes that the outcome variables of interest are independent of each other when conditioned on the explanatory variables. This assumption is strongly violated in the case of infectious diseases,…
We consider high-dimensional generalized linear models when the covariates are contaminated by measurement error. Estimates from errors-in-variables regression models are well-known to be biased in traditional low-dimensional settings if…
Biased sampling designs can be highly efficient when studying rare (binary) or low variability (continuous) endpoints. We consider longitudinal data settings in which the probability of being sampled depends on a repeatedly measured…
One of the major open problems in machine learning is to characterize generalization in the overparameterized regime, where most traditional generalization bounds become inconsistent even for overparameterized linear regression. In many…
Regression is typically treated as a curve-fitting process where the goal is to fit a prediction function to data. With the help of conditional generative adversarial networks, we propose to solve this age-old problem in a different way; we…
Two popular approaches for relating correlated measurements of a non-Gaussian response variable to a set of predictors are to fit a marginal model using generalized estimating equations and to fit a generalized linear mixed model by…
Generalized linear models (GLMs) have been used quite effectively in the modeling of a mean response under nonstandard conditions, where discrete as well as continuous data distributions can be accommodated. The choice of design for a GLM…
Recent research has shown growing interest in modeling hypergraphs, which capture polyadic interactions among entities beyond traditional dyadic relations. However, most existing methodologies for hypergraphs face significant limitations,…
Many algorithms and applications involve repeatedly solving variations of the same inference problem; for example we may want to introduce new evidence to the model or perform updates to conditional dependencies. The goal of adaptive…
In this manuscript we consider the problem of generalized linear estimation on Gaussian mixture data with labels given by a single-index model. Our first result is a sharp asymptotic expression for the test and training errors in the…
In many applied sciences a popular analysis strategy for high-dimensional data is to fit many multivariate generalized linear models in parallel. This paper presents a novel approach to address the resulting multiple testing problem by…
We consider an additive partially linear framework for modelling massive heterogeneous data. The major goal is to extract multiple common features simultaneously across all sub-populations while exploring heterogeneity of each…