Related papers: A theoretical comparison of the data augmentation,…

The data augmentation algorithm

The data augmentation (DA) algorithms are popular Markov chain Monte Carlo (MCMC) algorithms often used for sampling from intractable probability distributions. This review article comprehensively surveys DA MCMC algorithms, highlighting…

Computation · Statistics 2024-06-18 Vivekananda Roy, Kshitij Khare, James P. Hobert

Convergence Analysis of the Data Augmentation Algorithm for Bayesian Linear Regression with Non-Gaussian Errors

Gaussian errors are sometimes inappropriate in a multivariate linear regression setting because, for example, the data contain outliers. In such situations, it is often assumed that the error density is a scale mixture of multivariate…

Statistics Theory · Mathematics 2016-01-28 James P. Hobert, Yeun Ji Jung, Kshitij Khare, Qian Qin

A spectral analytic comparison of trace-class data augmentation algorithms and their sandwich variants

The data augmentation (DA) algorithm is a widely used Markov chain Monte Carlo algorithm that is easy to implement but often suffers from slow convergence. The sandwich algorithm is an alternative that can converge much faster while…

Statistics Theory · Mathematics 2012-02-24 Kshitij Khare, James P. Hobert

Asynchronous and Distributed Data Augmentation for Massive Data Settings

Data augmentation (DA) algorithms are widely used for Bayesian inference due to their simplicity. In massive data settings, however, DA algorithms are prohibitively slow because they pass through the full data in any iteration, imposing…

Computation · Statistics 2021-09-21 Jiayuan Zhou, Kshitij Khare, Sanvesh Srivastava

Scaling up Data Augmentation MCMC via Calibration

There has been considerable interest in making Bayesian inference more scalable. In big data settings, most literature focuses on reducing the computing time per iteration, with less focus on reducing the number of iterations needed in…

Methodology · Statistics 2017-09-28 Leo L. Duan, James E. Johndrow, David B. Dunson

Trace-class Monte Carlo Markov Chains for Bayesian Multivariate Linear Regression with Non-Gaussian Errors

Let $\pi$ denote the intractable posterior density that results when the likelihood from a multivariate linear regression model with errors from a scale mixture of normals is combined with the standard non-informative prior. There is a…

Statistics Theory · Mathematics 2016-06-02 Qian Qin, James P. Hobert

A Survey of Mix-based Data Augmentation: Taxonomy, Methods, Applications, and Explainability

Data augmentation (DA) is indispensable in modern machine learning and deep neural networks. The basic idea of DA is to construct new training data to improve the model's generalization by adding slightly disturbed versions of existing data…

Machine Learning · Computer Science 2024-06-05 Chengtai Cao, Fan Zhou, Yurou Dai, Jianping Wang, Kunpeng Zhang

Improving the Convergence Properties of the Data Augmentation Algorithm with an Application to Bayesian Mixture Modeling

The reversible Markov chains that drive the data augmentation (DA) and sandwich algorithms define self-adjoint operators whose spectra encode the convergence properties of the algorithms. When the target distribution has uncountable…

Methodology · Statistics 2012-02-06 James P. Hobert, Vivekananda Roy, Christian P. Robert

Convergence analysis of the block Gibbs sampler for Bayesian probit linear mixed models with improper priors

In this article, we consider Markov chain Monte Carlo (MCMC) algorithms for exploring the intractable posterior density associated with Bayesian probit linear mixed models under improper priors on the regression coefficients and variance…

Statistics Theory · Mathematics 2018-11-26 Xin Wang, Vivekananda Roy

Monotone data augmentation algorithm for longitudinal continuous, binary and ordinal outcomes: a unifying approach

The monotone data augmentation (MDA) algorithm has been widely used to impute missing data for longitudinal continuous outcomes. Compared to a full data augmentation approach, the MDA scheme accelerates the mixing of the Markov chain,…

Methodology · Statistics 2025-12-23 Yongqiang Tang

On the Data Augmentation Algorithm for Bayesian Multivariate Linear Regression with Non-Gaussian Errors

Let $\pi$ denote the intractable posterior density that results when the likelihood from a multivariate linear regression model with errors from a scale mixture of normals is combined with the standard non-informative prior. There is a…

Statistics Theory · Mathematics 2015-12-08 Qian Qin, James P. Hobert

Optimizing Data Augmentation through Bayesian Model Selection

Data Augmentation (DA) has become an essential tool to improve robustness and generalization of modern machine learning. However, when deciding on DA strategies it is critical to choose parameters carefully, and this can be a daunting task…

Machine Learning · Computer Science 2026-03-04 Madi Matymov, Ba-Hien Tran, Michael Kampffmeyer, Markus Heinonen, Maurizio Filippone

A Bit of Information Theory, and the Data Augmentation Algorithm Converges

The data augmentation (DA) algorithm is a simple and powerful tool in statistical computing. In this note basic information theory is used to prove a nontrivial convergence theorem for the DA algorithm.

Information Theory · Computer Science 2009-09-12 Yaming Yu

Learning Better with Less: Effective Augmentation for Sample-Efficient Visual Reinforcement Learning

Data augmentation (DA) is a crucial technique for enhancing the sample efficiency of visual reinforcement learning (RL) algorithms. Notably, employing simple observation transformations alone can yield outstanding performance without extra…

Machine Learning · Computer Science 2023-10-30 Guozheng Ma, Linrui Zhang, Haoyu Wang, Lu Li, Zilin Wang, Zhen Wang, Li Shen, Xueqian Wang, Dacheng Tao

Does Data Augmentation Lead to Positive Margin?

Data augmentation (DA) is commonly used during model training, as it significantly improves test error and model robustness. DA artificially expands the training set by applying random noise, rotations, crops, or even adversarial…

Machine Learning · Computer Science 2019-05-09 Shashank Rajput, Zhili Feng, Zachary Charles, Po-Ling Loh, Dimitris Papailiopoulos

Data transforming augmentation for heteroscedastic models

Data augmentation (DA) turns seemingly intractable computational problems into simple ones by augmenting latent missing data. In addition to computational simplicity, it is now well-established that DA equipped with a deterministic…

Methodology · Statistics 2020-05-26 Hyungsuk Tak, Kisung You, Sujit K. Ghosh, Bingyue Su, Joseph Kelly

Sample Efficiency of Data Augmentation Consistency Regularization

Data augmentation is popular in the training of large neural networks; currently, however, there is no clear theoretical comparison between different algorithmic choices on how to use augmented data. In this paper, we take a step in this…

Machine Learning · Computer Science 2022-06-17 Shuo Yang, Yijun Dong, Rachel Ward, Inderjit S. Dhillon, Sujay Sanghavi, Qi Lei

Asymptotically exact data augmentation: models, properties and algorithms

Data augmentation, by the introduction of auxiliary variables, has become a ubiquitous technique to improve convergence properties, simplify the implementation or reduce the computational time of inference methods such as Markov chain…

Methodology · Statistics 2020-09-30 Maxime Vono, Nicolas Dobigeon, Pierre Chainais

Anchor Data Augmentation

We propose a novel algorithm for data augmentation in nonlinear over-parametrized regression. Our data augmentation algorithm borrows from the literature on causality and extends the recently proposed Anchor regression (AR) method for data…

Machine Learning · Computer Science 2023-11-29 Nora Schneider, Shirin Goshtasbpour, Fernando Perez-Cruz

A monotone data augmentation algorithm for multivariate nonnormal data: with applications to controlled imputations for longitudinal trials

An efficient monotone data augmentation (MDA) algorithm is proposed for missing data imputation for incomplete multivariate nonnormal data that may contain variables of different types, and are modeled by a sequence of regression models…

Methodology · Statistics 2018-11-21 Yongqiang Tang