Related papers: Selective Inference with Distributed Data

Distributed Simultaneous Inference in Generalized Linear Models via Confidence Distribution

We propose a distributed method for simultaneous inference for datasets with sample size much larger than the number of covariates, i.e., N >> p, in the generalized linear models framework. When such datasets are too big to be analyzed…

Methodology · Statistics 2020-07-23 Lu Tang, Ling Zhou, Peter X.-K. Song

Distributed Sparse Linear Regression under Communication Constraints

In multiple domains, statistical tasks are performed in distributed settings, with data split among several end machines that are connected to a fusion center. In various applications, the end machines have limited bandwidth and power, and…

Machine Learning · Computer Science 2026-01-05 Rodney Fonseca, Boaz Nadler

Splitting strategies for post-selection inference

We consider the problem of providing valid inference for a selected parameter in a sparse regression setting. It is well known that classical regression tools can be unreliable in this context due to the bias generated in the selection…

Methodology · Statistics 2022-12-07 Daniel G. Rasines, G. Alastair Young

Selective Inference for Group-Sparse Linear Models

We develop tools for selective inference in the setting of group sparsity, including the construction of confidence intervals and p-values for testing selected groups of variables. Our main technical result gives the precise distribution of…

Methodology · Statistics 2016-07-28 Fan Yang, Rina Foygel Barber, Prateek Jain, John Lafferty

Efficient Distributed Learning with Sparsity

We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…

Machine Learning · Statistics 2016-05-26 Jialei Wang, Mladen Kolar, Nathan Srebro, Tong Zhang

Distributed Variational Inference in Sparse Gaussian Process Regression and Latent Variable Models

Gaussian processes (GPs) are a powerful tool for probabilistic inference over functions. They have been applied to both regression and non-linear dimensionality reduction, and offer desirable properties such as uncertainty estimates,…

Machine Learning · Statistics 2014-10-01 Yarin Gal, Mark van der Wilk, Carl E. Rasmussen

Median Selection Subset Aggregation for Parallel Inference

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems…

Machine Learning · Statistics 2014-10-27 Xiangyu Wang, Peichao Peng, David Dunson
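The abstract above stops before it describes the procedure. The sketch below is a rough numpy illustration, built on our own assumptions, of median-selection subset aggregation as the idea is commonly summarized. It works in three steps. First, run a lasso on each row subset. Second, keep the features whose median 0/1 inclusion indicator across subsets is at least 0.5. Third, refit least squares on the kept features in each subset and average the coefficients. The helper names `lasso_cd` and `message`, the coordinate-descent solver, the penalty `lam`, and the simulated data are all hypothetical. They are not taken from the paper.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * beta[j]  # drop coordinate j from the current fit
            rho = X[:, j] @ resid / n
            beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]  # soft-threshold
            resid -= X[:, j] * beta[j]  # put the updated coordinate back
    return beta


def message(X, y, m, lam):
    """Hypothetical median-selection-then-average sketch over m row subsets."""
    X_parts, y_parts = np.array_split(X, m), np.array_split(y, m)
    # Step 1: 0/1 inclusion indicators from a lasso fit on each subset.
    indicators = np.array(
        [lasso_cd(Xs, ys, lam) != 0 for Xs, ys in zip(X_parts, y_parts)], dtype=float
    )
    # Step 2: keep features whose median indicator is at least 0.5.
    selected = np.flatnonzero(np.median(indicators, axis=0) >= 0.5)
    beta = np.zeros(X.shape[1])
    if selected.size == 0:
        return beta, selected
    # Step 3: least-squares refit on the selected columns per subset, then average.
    coefs = [
        np.linalg.lstsq(Xs[:, selected], ys, rcond=None)[0]
        for Xs, ys in zip(X_parts, y_parts)
    ]
    beta[selected] = np.mean(coefs, axis=0)
    return beta, selected


# Simulated example (assumed data): 3 active features out of 10, 5 subsets.
rng = np.random.default_rng(1)
n, p, m = 2500, 10, 5
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
beta_hat, selected = message(X, y, m=m, lam=0.15)
```

With an odd m, the median of the 0/1 indicators is a simple majority vote. With an even m, the median can equal 0.5. In that case the `>= 0.5` threshold counts a tie as a selection, which is only one of several possible conventions.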

Post-selection inference in generalized linear models via parametric programming

We propose a unified framework to draw inferences for regression coefficients in a generalized linear model (GLM) following Lasso-based variable selection. We adapt to non-Gaussian GLMs a recently developed parametric programming strategy…

Methodology · Statistics 2026-03-27 Qinyan Shen, Karl Gregory, Xianzheng Huang

Communication-efficient Distributed Sparse Linear Discriminant Analysis

We propose a communication-efficient distributed estimation method for sparse linear discriminant analysis (LDA) in the high dimensional regime. Our method distributes the data of size N into m machines, and estimates a local sparse LDA…

Machine Learning · Statistics 2016-10-18 Lu Tian, Quanquan Gu

Distributed Sparse Feature Selection in Communication-Restricted Networks

This paper aims to propose and theoretically analyze a new distributed scheme for sparse linear regression and feature selection. The primary goal is to learn the few causal features of a high-dimensional dataset based on noisy observations…

Machine Learning · Statistics 2021-11-05 Hanie Barghi, Amir Najafi, Seyed Abolfazl Motahari

High-Dimensional Distributed Sparse Classification with Scalable Communication-Efficient Global Updates

As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…

Machine Learning · Computer Science 2024-07-10 Fred Lu, Ryan R. Curtin, Edward Raff, Francis Ferraro, James Holt

Distributed linear regression by averaging

Distributed statistical learning problems arise commonly when dealing with large datasets. In this setup, datasets are partitioned over machines, which compute locally, and communicate short messages. Communication is often the bottleneck.…

Statistics Theory · Mathematics 2022-10-25 Edgar Dobriban, Yue Sheng
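The setting in the abstract above has data partitioned over machines, local computation, and short messages. A minimal numpy sketch of the simplest version is one-shot parameter averaging. Each machine fits ordinary least squares on its own rows and sends only its p coefficients, and the center takes an unweighted mean. The helper names `local_ols` and `one_shot_average`, the equal shard sizes, and the Gaussian simulated data are illustrative assumptions. The sketch is not the paper's analysis, which the truncated abstract does not describe.

```python
import numpy as np


def local_ols(X, y):
    """Least-squares fit computed on a single machine's shard (hypothetical helper)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def one_shot_average(shards):
    """Each machine sends only its p-dimensional estimate; the center averages them."""
    estimates = [local_ols(X, y) for X, y in shards]
    return np.mean(estimates, axis=0)


# Simulated example (assumed data).
rng = np.random.default_rng(0)
n, p, k = 6000, 5, 6
beta_true = np.array([1.0, -0.5, 2.0, 0.0, 0.3])
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)

# Split the rows evenly over k hypothetical machines.
shards = list(zip(np.array_split(X, k), np.array_split(y, k)))
beta_avg = one_shot_average(shards)
beta_full = local_ols(X, y)  # centralized fit, kept only for comparison
```

Each machine transmits p numbers regardless of its shard size. Comparing `beta_avg` with `beta_full` shows how close the averaged estimate comes to the centralized one in this simulation.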

Methods of Selective Inference for Linear Mixed Models: a Review and Empirical Comparison

Selective inference aims at providing valid inference after a data-driven selection of models or hypotheses. It is essential to avoid overconfident results and replicability issues. While significant advances have been made in this area for…

Methodology · Statistics 2025-03-14 Matteo D'Alessandro, Magne Thoresen

Distributed inference for quantile regression processes

The increased availability of massive data sets provides a unique opportunity to discover subtle patterns in their distributions, but also imposes overwhelming computational challenges. To fully utilize the information contained in big…

Statistics Theory · Mathematics 2018-04-12 Stanislav Volgushev, Shih-Kang Chao, Guang Cheng

Sparse Data-Driven Random Projection in Regression for High-Dimensional Data

We examine the linear regression problem in a challenging high-dimensional setting with correlated predictors where the vector of coefficients can vary from sparse to dense. In this setting, we propose a combination of probabilistic…

Methodology · Statistics 2025-05-13 Roman Parzer, Peter Filzmoser, Laura Vana-Gür

SIGLE: a valid procedure for Selective Inference with the Generalized Linear Lasso

This article investigates uncertainty quantification of the generalized linear lasso (GLL), a popular variable selection method in high-dimensional regression settings. In many fields of study, researchers use data-driven methods to select…

Statistics Theory · Mathematics 2023-07-11 Quentin Duchemin, Yohann de Castro

Robust and Sparse Regression in GLM by Stochastic Optimization

The generalized linear model (GLM) plays a key role in regression analyses. In high-dimensional data, the sparse GLM has been used but it is not robust against outliers. Recently, robust methods have been proposed for the specific…

Machine Learning · Statistics 2026-05-15 Takayuki Kawashima, Hironori Fujisawa

Repro Samples Method for Model-Free Inference in High-Dimensional Binary Classification

This paper presents a novel method for statistical inference in high-dimensional binary models with unspecified structure, where we leverage a (potentially misspecified) sparsity-constrained working generalized linear model (GLM) to…

Methodology · Statistics 2025-10-03 Xiaotian Hou, Peng Wang, Minge Xie, Linjun Zhang

Communication-efficient sparse regression: a one-shot approach

We devise a one-shot approach to distributed sparse regression in the high-dimensional setting. The key idea is to average "debiased" or "desparsified" lasso estimators. We show the approach converges at the same rate as the lasso as long…

Machine Learning · Statistics 2015-08-12 Jason D. Lee, Yuekai Sun, Qiang Liu, Jonathan E. Taylor

Debiased distributed learning for sparse partial linear models in high dimensions

Although various distributed machine learning schemes have been proposed recently for pure linear models and fully nonparametric models, little attention has been paid to distributed optimization for semi-parametric models with…

Machine Learning · Statistics 2019-11-05 Shaogao Lv, Heng Lian