Related papers: Batch Bayesian Active Learning with Partial Batch …
Information-theoretic approaches to active learning have traditionally focused on maximising the information gathered about the model parameters, most commonly by optimising the BALD score. We highlight that this can be suboptimal from the…
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the…
Expanding on MacKay (1992), we argue that conventional model-based methods for active learning - like BALD - have a fundamental shortfall: they fail to directly account for the test-time distribution of the input variables. This can lead to…
Active learning is able to reduce the amount of labelling effort by using a machine learning model to query the user for specific inputs. While there are many papers on new active learning techniques, these techniques rarely satisfy the…
We observe that BatchBALD, a popular acquisition function for batch Bayesian active learning for classification, can conflate epistemic and aleatoric uncertainty, leading to suboptimal performance. Motivated by this observation, we propose…
We develop BatchBALD, a tractable approximation to the mutual information between a batch of points and model parameters, which we use as an acquisition function to select multiple informative points jointly for the task of deep Bayesian…
In deep active learning, it is especially important to choose multiple examples to markup at each step to work efficiently, especially on large datasets. At the same time, existing solutions to this problem in the Bayesian setup, such as…
Active learning has been studied extensively as a method for efficient data collection. Among the many approaches in literature, Expected Error Reduction (EER) (Roy and McCallum) has been shown to be an effective method for active learning:…
We develop BatchEvaluationBALD, a new acquisition function for deep Bayesian active learning, as an expansion of BatchBALD that takes into account an evaluation set of unlabeled data, for example, the pool set. We also develop a variant for…
Labeled data can be expensive to acquire in several application domains, including medical imaging, robotics, and computer vision. To efficiently train machine learning models under such high labeling costs, active learning (AL) judiciously…
Bayesian Active Learning (BAL) is an efficient framework for learning the parameters of a model, in which input stimuli are selected to maximize the mutual information between the observations and the unknown parameters. However, the…
Bayesian active learning is based on information theoretical approaches that focus on maximising the information that new observations provide to the model parameters. This is commonly done by maximising the Bayesian Active Learning by…
Active learning is usually applied to acquire labels of informative data points in supervised learning, to maximize accuracy in a sample-efficient way. However, maximizing the accuracy is not the end goal when the results are used for…
The effectiveness of active learning largely depends on the sampling efficiency of the acquisition function. Expected Loss Reduction (ELR) focuses on a Bayesian estimate of the reduction in classification error, and more general costs fit…
Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling…
The ability to train complex and highly effective models often requires an abundance of training data, which can easily become a bottleneck in cost, time, and computational resources. Batch active learning, which adaptively issues batched…
Obtaining labeled data for machine learning tasks can be prohibitively expensive. Active learning mitigates this issue by exploring the unlabeled data space and prioritizing the selection of data that can best improve the model performance.…
Leveraging human preferences for steering the behavior of Large Language Models (LLMs) has demonstrated notable success in recent years. Nonetheless, data selection and labeling are still a bottleneck for these systems, particularly at…
The central goal of active learning is to gather data that maximises downstream predictive performance, but popular approaches have limited flexibility in customising this data acquisition to different downstream problems and losses. We…
Active learning is a powerful method for training machine learning models with limited labeled data. One commonly used technique for active learning is BatchBALD, which uses Bayesian neural networks to find the most informative points to…