Related papers: BClean: A Bayesian Data Cleaning System

200 papers

Data cleaning is naturally framed as probabilistic inference in a generative model of ground-truth data and likely errors, but the diversity of real-world error patterns and the hardness of inference make Bayesian approaches difficult to…

Machine Learning · Computer Science 2022-11-22 Alexander K. Lew, Monica Agrawal, David Sontag, Vikash K. Mansinghka

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2018-11-14 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

Data cleaning is a long-standing problem, which is growing in importance with the mass of uncurated web data. State-of-the-art approaches for handling inconsistent data are systems that learn and use conditional functional dependencies…

Databases · Computer Science 2012-04-18 Yuheng Hu, Sushovan De, Yi Chen, Subbarao Kambhampati

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2020-09-01 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because…

Machine Learning · Computer Science 2022-01-11 David Heckerman

Safe and reliable disclosure of information from confidential data is a challenging statistical problem. A common approach considers the generation of synthetic data, to be disclosed instead of the original data. Efficient approaches ought…

Methodology · Statistics 2024-03-04 Larissa N. A. Martins, Flávio B. Gonçalves, Thais P. Galletti

The Bayesian approach to data analysis provides a powerful way to handle uncertainty in all observations, model parameters, and model structure using probability theory. Probabilistic programming languages make it easier to specify and fit…

Fairness concerns are increasingly critical as machine learning models are deployed in high-stakes applications. While existing fairness-aware methods typically intervene at the model level, they often suffer from high computational costs,…

Machine Learning · Computer Science 2025-11-11 Yixuan Zhang, Jiabin Luo, Zhenggang Wang, Feng Zhou, Quyu Kong

Consider a Bayesian inference problem where a variable of interest does not take values in a Euclidean space. These "non-standard" data structures are in reality fairly common. They are frequently used in problems involving latent discrete…

Approximate Bayesian computing is a powerful likelihood-free method that has grown increasingly popular since early applications in population genetics. However, complications arise in the theoretical justification for Bayesian inference…

Computation · Statistics 2018-12-03 Suzanne Thornton, Wentao Li, Min-ge Xie

The wide use of machine learning is fundamentally changing the software development paradigm (a.k.a. Software 2.0) where data becomes a first-class citizen, on par with code. As machine learning is used in sensitive applications, it becomes…

Databases · Computer Science 2019-04-25 Ki Hyun Tae, Yuji Roh, Young Hun Oh, Hyunsu Kim, Steven Euijong Whang

Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the…

Machine Learning · Statistics 2021-02-09 Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato

Active learning methods for neural networks are usually based on greedy criteria which ultimately give a single new design point for the evaluation. Such an approach requires either some heuristics to sample a batch of design points at one…

Machine Learning · Computer Science 2020-01-28 Evgenii Tsymbalov, Sergei Makarychev, Alexander Shapeev, Maxim Panov

This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input…

Machine Learning · Computer Science 2025-03-12 Alex Graves, Rupesh Kumar Srivastava, Timothy Atkinson, Faustino Gomez

Bayesian networks are graphical models to represent the probabilistic relationships between variables in the Bayesian framework. The knowledge of all variables can be updated using new information about some of the variables. We show that…

Data Analysis, Statistics and Probability · Physics 2021-10-22 Georg Schnabel, Roberto Capote, Arjan Koning, David Brown

Classically, Bayesian clustering interprets each component of a mixture model as a cluster. The inferred clustering posterior is highly sensitive to any inaccuracies in the kernel within each component. As this kernel is made more flexible,…

Methodology · Statistics 2025-12-12 David Buch, Miheer Dewaskar, David B. Dunson

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated…

Machine Learning · Computer Science 2013-02-21 George H. John, Pat Langley

This paper describes a Bayesian method for learning causal networks using samples that were selected in a non-random manner from a population of interest. Examples of data obtained by non-random sampling include convenience samples and…

Artificial Intelligence · Computer Science 2013-01-18 Gregory F. Cooper

A Bayesian net (BN) is more than a succinct way to encode a probabilistic distribution; it also corresponds to a function used to answer queries. A BN can therefore be evaluated by the accuracy of the answers it returns. Many algorithms for…

Artificial Intelligence · Computer Science 2013-02-08 Russell Greiner, Adam J. Grove, Dale Schuurmans

We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. HoloClean unifies existing qualitative data repairing approaches, which rely on integrity constraints or external data sources, with…

Databases · Computer Science 2017-02-06 Theodoros Rekatsinas, Xu Chu, Ihab F. Ilyas, Christopher Ré