Related papers: Towards Accelerated Model Training via Bayesian Da…

Towards Bayesian Data Selection

A wide range of machine learning algorithms iteratively add data to the training sample. Examples include semi-supervised learning, active learning, multi-armed bandits, and Bayesian optimization. We embed this kind of data addition into…

Machine Learning · Statistics 2024-06-25 Julian Rodemann

Bayesian analysis of the prevalence bias: learning and predicting from imbalanced data

Datasets are rarely a realistic approximation of the target population. Say, prevalence is misrepresented, image quality is above clinical standards, etc. This mismatch is known as sampling bias. Sampling biases are a major hindrance for…

Machine Learning · Computer Science 2021-08-03 Loic Le Folgoc, Vasileios Baltatzis, Amir Alansary, Sujal Desai, Anand Devaraj, Sam Ellis, Octavio E. Martinez Manzanera, Fahdi Kanavati, Arjun Nair, Julia Schnabel, Ben Glocker

Bayesian Optimization for Selecting Efficient Machine Learning Models

The performance of many machine learning models depends on their hyper-parameter settings. Bayesian Optimization has become a successful tool for hyper-parameter optimization of machine learning algorithms, which aims to identify optimal…

Machine Learning · Computer Science 2020-08-04 Lidan Wang, Franck Dernoncourt, Trung Bui

Progressive Sampling-Based Bayesian Optimization for Efficient and Automatic Machine Learning Model Selection

Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting…

Machine Learning · Computer Science 2018-12-10 Xueqiang Zeng, Gang Luo

Sampling Bias Correction for Supervised Machine Learning: A Bayesian Inference Approach with Practical Applications

Given a supervised machine learning problem where the training set has been subject to a known sampling bias, how can a model be trained to fit the original dataset? We achieve this through the Bayesian inference framework by altering the…

Machine Learning · Statistics 2022-03-16 Max Sklar

Model Debiasing by Learnable Data Augmentation

Deep Neural Networks are well known for efficiently fitting training data, yet experiencing poor generalization capabilities whenever some kind of bias dominates over the actual task labels, resulting in models learning "shortcuts". In…

Machine Learning · Computer Science 2024-08-12 Pietro Morerio, Ruggero Ragonesi, Vittorio Murino

Navigating Towards Fairness with Data Selection

Machine learning algorithms often struggle to eliminate inherent data biases, particularly those arising from unreliable labels, which poses a significant challenge in ensuring fairness. Existing fairness techniques that address label bias…

Machine Learning · Computer Science 2024-12-17 Yixuan Zhang, Zhidong Li, Yang Wang, Fang Chen, Xuhui Fan, Feng Zhou

Bayesian Model Selection on Random Networks

A general Bayesian framework for model selection on random network models regarding their features is considered. The goal is to develop a principled Bayesian model selection approach to compare different fittable, not necessarily nested,…

Methodology · Statistics 2020-04-30 Papamichalis Marios

Toward Optimal Probabilistic Active Learning Using a Bayesian Approach

Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling…

Machine Learning · Computer Science 2020-06-03 Daniel Kottke, Marek Herde, Christoph Sandrock, Denis Huseljic, Georg Krempl, Bernhard Sick

Predictive Data Selection: The Data That Predicts Is the Data That Teaches

Language model pretraining involves training on extensive corpora, where data quality plays a pivotal role. In this work, we aim to directly estimate the contribution of data during pretraining and select pretraining data in an efficient…

Computation and Language · Computer Science 2025-08-05 Kashun Shum, Yuzhen Huang, Hongjian Zou, Qi Ding, Yixuan Liao, Xiaoxin Chen, Qian Liu, Junxian He

Making Better Use of Unlabelled Data in Bayesian Active Learning

Fully supervised models are predominant in Bayesian active learning. We argue that their neglect of the information present in unlabelled data harms not just predictive performance but also decisions about what data to acquire. Our proposed…

Machine Learning · Computer Science 2024-04-29 Freddie Bickford Smith, Adam Foster, Tom Rainforth

Target-Focused Feature Selection Using a Bayesian Approach

In many real-world scenarios where data is high dimensional, test time acquisition of features is a non-trivial task due to costs associated with feature acquisition and evaluating feature value. The need for highly confident models with an…

Machine Learning · Computer Science 2019-09-17 Orpaz Goldstein, Mohammad Kachuee, Kimmo Karkkainen, Majid Sarrafzadeh

Bayesian Batch Active Learning as Sparse Subset Approximation

Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the…

Machine Learning · Statistics 2021-02-09 Robert Pinsler, Jonathan Gordon, Eric Nalisnick, José Miguel Hernández-Lobato

A Bayesian Perspective on Training Speed and Model Selection

We take a Bayesian perspective to illustrate a connection between training speed and the marginal likelihood in linear models. This provides two major insights: first, that a measure of a model's training speed can be used to estimate its…

Machine Learning · Computer Science 2020-10-28 Clare Lyle, Lisa Schut, Binxin Ru, Yarin Gal, Mark van der Wilk

Comparison of Bayesian predictive methods for model selection

The goal of this paper is to compare several widely used Bayesian model selection methods in practical model selection problems, highlight their differences and give recommendations about the preferred approaches. We focus on the variable…

Methodology · Statistics 2017-12-18 Juho Piironen, Aki Vehtari

Learning All Credible Bayesian Network Structures for Model Averaging

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known…

Artificial Intelligence · Computer Science 2020-09-01 Zhenyu A. Liao, Charupriya Sharma, James Cussens, Peter van Beek

Provably Improving Generalization of Few-Shot Models with Synthetic Data

Few-shot image classification remains challenging due to the scarcity of labeled training examples. Augmenting them with synthetic data has emerged as a promising way to alleviate this issue, but models trained on synthetic samples often…

Machine Learning · Computer Science 2025-06-26 Lan-Cuong Nguyen, Quan Nguyen-Tri, Bang Tran Khanh, Dung D. Le, Long Tran-Thanh, Khoat Than

Behavior Retrieval: Few-Shot Imitation Learning by Querying Unlabeled Datasets

Enabling robots to learn novel visuomotor skills in a data-efficient manner remains an unsolved problem with myriad challenges. A popular paradigm for tackling this problem is through leveraging large unlabeled datasets that have many…

Robotics · Computer Science 2023-05-16 Maximilian Du, Suraj Nair, Dorsa Sadigh, Chelsea Finn

Zero-shot meta-learning for small-scale data from human subjects

While developments in machine learning led to impressive performance gains on big data, many human subjects data are, in actuality, small and sparsely labeled. Existing methods applied to such data often do not easily generalize to…

Machine Learning · Computer Science 2023-04-04 Julie Jiang, Kristina Lerman, Emilio Ferrara

Coupled Training with Privileged Information and Unlabeled Data

In many prediction problems, we have extra information during training (for example, measurements that are expensive or slow to collect) that will not be available when the model is deployed. A common strategy is to first train a model that…

Machine Learning · Statistics 2026-05-25 Jiahao Shi, Omar Hagrass, Jason M. Klusowski