Related papers: BClean: A Bayesian Data Cleaning System

PClean: Bayesian Data Cleaning at Scale with Domain-Specific Probabilistic Programming

Data cleaning is naturally framed as probabilistic inference in a generative model of ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of inference make Bayesian approaches difficult to…

Machine Learning · Computer Science 2022-11-22 Alexander K. Lew, Monica Agrawal, David Sontag, Vikash K. Mansinghka

Finding All Bayesian Network Structures within a Factor of Optimal

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2018-11-14 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

Bayesian Data Cleaning for Web Data

Data Cleaning is a long standing problem, which is growing in importance with the mass of uncurated web data. State of the art approaches for handling inconsistent data are systems that learn and use conditional functional dependencies…

Databases · Computer Science 2012-04-18 Yuheng Hu, Sushovan De, Yi Chen, Subbarao Kambhampati

Learning All Credible Bayesian Network Structures for Model Averaging

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2020-09-01 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

A Tutorial on Learning With Bayesian Networks

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because…

Machine Learning · Computer Science 2022-01-11 David Heckerman

Generation and analysis of synthetic data via Bayesian networks: a robust approach for uncertainty quantification via Bayesian paradigm

Safe and reliable disclosure of information from confidential data is a challenging statistical problem. A common approach considers the generation of synthetic data, to be disclosed instead of the original data. Efficient approaches ought…

Methodology · Statistics 2024-03-04 Larissa N. A. Martins, Flávio B. Gonçalves, Thais P. Galletti

Bayesian Workflow

The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit…

Methodology · Statistics 2020-11-04 Andrew Gelman, Aki Vehtari, Daniel Simpson, Charles C. Margossian, Bob Carpenter, Yuling Yao, Lauren Kennedy, Jonah Gabry, Paul-Christian Bürkner, Martin Modrák

Fair Bayesian Data Selection via Generalized Discrepancy Measures

Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs,…

Machine Learning · Computer Science 2025-11-11 Yixuan Zhang, Jiabin Luo, Zhenggang Wang, Feng Zhou, Quyu Kong

Blang: Bayesian declarative modelling of general data structures and inference via algorithms based on distribution continua

Consider a Bayesian inference problem where a variable of interest does not take values in a Euclidean space. These "non-standard" data structures are in reality fairly common. They are frequently used in problems involving latent discrete…

Computation · Statistics 2021-06-25 Alexandre Bouchard-Côté, Kevin Chern, Davor Cubranic, Sahand Hosseini, Justin Hume, Matteo Lepur, Zihui Ouyang, Giorgio Sgarbi

An effective likelihood-free approximate computing method with statistical inferential guarantees

Approximate Bayesian computing is a powerful likelihood-free method that has grown increasingly popular since early applications in population genetics. However, complications arise in the theoretical justification for Bayesian inference…

Computation · Statistics 2018-12-03 Suzanne Thornton, Wentao Li, Min-ge Xie

Data Cleaning for Accurate, Fair, and Robust Models: A Big Data - AI Integration Approach

The wide use of machine learning is fundamentally changing the software development paradigm (a.k.a. Software 2.0) where data becomes a first-class citizen, on par with code. As machine learning is used in sensitive applications, it becomes…

Databases · Computer Science 2019-04-25 Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang

Bayesian Batch Active Learning as Sparse Subset Approximation

Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the…

Machine Learning · Statistics 2021-02-09 Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato

Deeper Connections between Neural Networks and Gaussian Processes Speed-up Active Learning

Active learning methods for neural networks are usually based on greedy criteria which ultimately give a single new design point for the evaluation. Such an approach requires either some heuristics to sample a batch of design points at one…

Machine Learning · Computer Science 2020-01-28 Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

Bayesian Flow Networks

This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input…

Machine Learning · Computer Science 2025-03-12 Alex Graves, Rupesh Kumar Srivastava, Timothy Atkinson, Faustino Gomez

Nuclear data evaluation with Bayesian networks

Bayesian networks are graphical models to represent the probabilistic relationships between variables in the Bayesian framework. The knowledge of all variables can be updated using new information about some of the variables. We show that…

Data Analysis, Statistics and Probability · Physics 2021-10-22 Georg Schnabel, Roberto Capote, Arjan Koning, David Brown

Bayesian Level Set Clustering

Classically, Bayesian clustering interprets each component of a mixture model as a cluster. The inferred clustering posterior is highly sensitive to any inaccuracies in the kernel within each component. As this kernel is made more flexible,…

Methodology · Statistics 2025-12-12 David Buch, Miheer Dewaskar, David B. Dunson

Estimating Continuous Distributions in Bayesian Classifiers

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated…

Machine Learning · Computer Science 2013-02-21 George H. John, Pat Langley

A Bayesian Method for Causal Modeling and Discovery Under Selection

This paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. Examples of data obtained by non-random sampling include convenience samples and…

Artificial Intelligence · Computer Science 2013-01-18 Gregory F. Cooper

Learning Bayesian Nets that Perform Well

A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for…

Artificial Intelligence · Computer Science 2013-02-08 Russell Greiner, Adam J. Grove, Dale Schuurmans

HoloClean: Holistic Data Repairs with Probabilistic Inference

We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. HoloClean unifies existing qualitative data repairing approaches, which rely on integrity constraints or external data sources, with…

Databases · Computer Science 2017-02-06 Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré