Related papers: A Bayesian Approach for Accurate Classification-Ba…
Selection bias arises when the probability that an observation enters a dataset depends on variables related to the quantities of interest, leading to systematic distortions in estimation and uncertainty quantification. For example, in…
The vast majority of statistical theory on binary classification characterizes performance in terms of accuracy. However, accuracy is known in many cases to poorly reflect the practical consequences of classification error, most famously in…
Gathering labeled data to train well-performing machine learning models is one of the critical challenges in many applications. Active learning aims at reducing the labeling costs by an efficient and effective allocation of costly labeling…
A set of probabilistic predictions is well calibrated if the events that are predicted to occur with probability p do in fact occur about p fraction of the time. Well calibrated predictions are particularly important when machine learning…
Sparse functional data frequently arise in real-world applications, posing significant challenges for accurate classification. To address this, we propose a novel classification method that integrates functional principal component analysis…
Given a supervised machine learning problem where the training set has been subject to a known sampling bias, how can a model be trained to fit the original dataset? We achieve this through the Bayesian inference framework by altering the…
We develop a fully Bayesian, logistic tracking algorithm with the purpose of providing classification results that are unbiased when applied uniformly to individuals with differing sensitive variable values. Here, we consider bias in the…
A general challenge in statistics is prediction in the presence of multiple candidate models or learning algorithms. Model aggregation tries to combine all predictive distributions from individual models, which is more stable and flexible…
Model selection aims to identify a sufficiently well performing model that is possibly simpler than the most complex model among a pool of candidates. However, the decision-making process itself can inadvertently introduce non-negligible…
While the performance of machine learning systems has experienced significant improvement in recent years, relatively little attention has been paid to the fundamental question: to what extent can we improve our models? This paper provides…
The model averaging problem is to average multiple models to achieve a prediction accuracy not much worse than that of the best single model in terms of mean squared error. It is known that if the models are misspecified, model averaging is…
There is a fundamental limitation in the prediction performance that a machine learning model can achieve due to the inevitable uncertainty of the prediction target. In classification problems, this can be characterized by the Bayes error,…
Leveraging the wealth of unlabeled data produced in recent years provides great potential for improving supervised models. When the cost of acquiring labels is high, probabilistic active learning methods can be used to greedily select the…
Bayesian linear mixed-effects models and Bayesian ANOVA are increasingly being used in the cognitive sciences to perform null hypothesis tests, where a null hypothesis that an effect is zero is compared with an alternative hypothesis that…
We propose a novel approach to approximate Bayesian computation (ABC) that seeks to cater for possible misspecification of the assumed model. This new approach can be equally applied to rejection-based ABC and to popular regression…
To collect large scale annotated data, it is inevitable to introduce label noise, i.e., incorrect class labels. To be robust against label noise, many successful methods rely on the noisy classifiers (i.e., models trained on the noisy…
In computational inverse problems, it is common that a detailed and accurate forward model is approximated by a computationally less challenging substitute. The model reduction may be necessary to meet constraints in computing time when…
Data clustering, including problems such as finding network communities, can be put into a systematic framework by means of a Bayesian approach. The application of Bayesian approaches to real problems can be, however, quite challenging. In…
Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting…
In order to improve forecasts, a decisionmaker often combines probabilities given by various sources, such as human experts and machine learning classifiers. When few training data are available, aggregation can be improved by incorporating…