Related papers: Robust and Parallel Bayesian Model Selection

Distributed Computation for Marginal Likelihood based Model Choice

We propose a general method for distributed Bayesian model choice, using the marginal likelihood, where a data set is split in non-overlapping subsets. These subsets are only accessed locally by individual workers and no data is shared…

Computation · Statistics 2022-10-18 Alexander Buchholz, Daniel Ahfock, Sylvia Richardson

A Divide and Conquer Strategy for High Dimensional Bayesian Factor Models

We propose a distributed computing framework, based on a divide and conquer strategy and hierarchical modeling, to accelerate posterior inference for high-dimensional Bayesian factor models. Our approach distributes the task of…

Methodology · Statistics 2016-12-30 Gautam Sabnis, Debdeep Pati, Barbara Engelhardt, Natesh Pillai

Robust Multi-Model Subset Selection

Outlying observations can be challenging to handle and adversely affect subsequent analyses, especially in data with increasing dimensional complexity. Although outliers are not always undesired anomalies in the data and may possess…

Methodology · Statistics 2025-09-18 Anthony-Alexander Christidis, Gabriela Cohen-Freue

Distributionally Robust Feature Selection

We study the problem of selecting limited features to observe such that models trained on them can perform well simultaneously across multiple subpopulations. This problem has applications in settings where collecting each feature is…

Machine Learning · Computer Science 2025-10-27 Maitreyi Swaroop, Tamar Krishnamurti, Bryan Wilder

Sample-Efficient "Clustering and Conquer" Procedures for Parallel Large-Scale Ranking and Selection

This work aims to improve the sample efficiency of parallel large-scale ranking and selection (R&S) problems by leveraging correlation information. We modify the commonly used "divide and conquer" framework in parallel computing by adding a…

Methodology · Statistics 2026-02-16 Zishi Zhang, Yijie Peng

Divide-and-Conquer Bayesian Inference in Hidden Markov Models

Divide-and-conquer Bayesian methods consist of three steps: dividing the data into smaller computationally manageable subsets, running a sampling algorithm in parallel on all the subsets, and combining parameter draws from all the subsets.…

Methodology · Statistics 2021-06-01 Chunlei Wang, Sanvesh Srivastava

Layered Sampling for Robust Optimization Problems

In the real world, datasets often contain outliers, and these outliers can seriously affect the final machine learning result. Most existing algorithms for handling outliers have high time complexity (e.g. quadratic or cubic…

Computational Geometry · Computer Science 2020-02-28 Hu Ding, Zixiu Wang

Hierarchical Bayesian data selection

There are many issues that can cause problems when attempting to infer model parameters from data. Data and models are both imperfect, and as such there are multiple scenarios in which standard methods of inference will lead to misleading…

Computation · Statistics 2024-05-01 Simon L. Cotter

Robust and Scalable Bayes via a Median of Subset Posterior Measures

We propose a novel approach to Bayesian analysis that is provably robust to outliers in the data and often has computational advantages over standard methods. Our technique is based on splitting the data into non-overlapping subgroups,…

Statistics Theory · Mathematics 2016-06-03 Stanislav Minsker, Sanvesh Srivastava, Lizhen Lin, David B. Dunson

Median Selection Subset Aggregation for Parallel Inference

For massive data sets, efficient computation commonly relies on distributed algorithms that store and process subsets of the data on different machines, minimizing communication costs. Our focus is on regression and classification problems…

Machine Learning · Statistics 2014-10-27 Xiangyu Wang, Peichao Peng, David Dunson

Distributed sequential method for analyzing massive data

To analyse a very large data set containing lengthy variables, we adopt a sequential estimation idea and propose a parallel divide-and-conquer method. We conduct several conventional sequential estimation procedures separately, and properly…

Methodology · Statistics 2018-12-27 Zhanfeng Wang, Yuan-chin Ivan Chang

Robust subset selection

The best subset selection (or "best subsets") estimator is a classic tool for sparse regression, and developments in mathematical optimization over the past decade have made it more computationally tractable than ever. Notwithstanding its…

Methodology · Statistics 2022-01-11 Ryan Thompson

Robust and Explainable Divide-and-Conquer Learning for Intrusion Detection

Machine learning-based intrusion detection requires complex models to capture patterns in high-dimensional, noisy, and class-imbalanced raw network traffic, yet deploying such models remains impractical on resource-constrained devices with…

Machine Learning · Computer Science 2026-05-05 Yan Zhou, Kevin Hamlen, Michael De Lucia, Murat Kantarcioglu, Latifur Khan, Sharad Mehrotra, Ananthram Swami, Bhavani Thuraisingham

Scalable Bayesian inference for time series via divide-and-conquer

Bayesian computational algorithms tend to scale poorly as data size increases. This has motivated divide-and-conquer-based approaches for scalable inference. These divide the data into subsets, perform inference for each subset in parallel,…

Methodology · Statistics 2025-10-22 Rihui Ou, Lachlan Astfalck, Deborshee Sen, David Dunson

A Robust Regression Approach for Robot Model Learning

Machine learning and data analysis have been used in many robotics fields, especially for modelling. Data are usually the result of sensor measurements and, as such, they might be subject to noise and outliers. The presence of outliers…

Robotics · Computer Science 2019-08-26 Francesco Cursi, Guang-Zhong Yang

Bayesian inference in hierarchical models by combining independent posteriors

Hierarchical models are versatile tools for joint modeling of data sets arising from different, but related, sources. Fully Bayesian inference may, however, become computationally prohibitive if the source-specific data models are complex,…

Computation · Statistics 2016-05-06 Ritabrata Dutta, Paul Blomstedt, Samuel Kaski

Robust and Fully-Dynamic Coreset for Continuous-and-Bounded Learning (With Outliers) Problems

In many machine learning tasks, a common approach for dealing with large-scale data is to build a small summary, e.g., coreset, that can efficiently represent the original input. However, real-world datasets usually contain outliers…

Machine Learning · Computer Science 2022-01-24 Zixiu Wang, Yiwen Guo, Hu Ding

Divide-and-conquer methods for big data analysis

In the context of big data analysis, the divide-and-conquer methodology refers to a multiple-step process: first splitting a data set into several smaller ones; then analyzing each set separately; finally combining results from each…

Machine Learning · Statistics 2021-02-23 Xueying Chen, Jerry Q. Cheng, Min-ge Xie

Optimal Bayesian design for model discrimination via classification

Performing optimal Bayesian design for discriminating between competing models is computationally intensive as it involves estimating posterior model probabilities for thousands of simulated datasets. This issue is compounded further when…

Methodology · Statistics 2022-04-07 Markus Hainy, David J. Price, Olivier Restif, Christopher Drovandi

$\beta$-Cores: Robust Large-Scale Bayesian Data Summarization in the Presence of Outliers

Modern machine learning applications should be able to address the intrinsic challenges arising over inference on massive real-world datasets, including scalability and robustness to outliers. Despite the multiple benefits of Bayesian…

Machine Learning · Computer Science 2020-11-10 Dionysis Manousakas, Cecilia Mascolo