Related papers: Sampling Techniques in Bayesian Target Encoding
Applied Data Scientists throughout various industries are commonly faced with the challenging task of encoding high-cardinality categorical features into digestible inputs for machine learning algorithms. This paper describes a Bayesian…
Since most machine learning (ML) algorithms are designed for numerical inputs, efficiently encoding categorical variables is a crucial aspect in data analysis. A common problem are high cardinality features, i.e. unordered categorical…
Gaussian Process (GP) regression is a popular and sample-efficient approach for many engineering applications, where observations are expensive to acquire, and is also a central ingredient of Bayesian optimization (BO), a highly prevailing…
Target encoding is an effective technique to deliver better performance for conventional machine learning methods, and recently, for deep neural networks as well. However, the existing target encoding approaches require significant increase…
The success of neural summarization models stems from the meticulous encodings of source articles. To overcome the impediments of limited and sometimes noisy training data, one promising direction is to make better use of the available…
In real life, media information has time attributes either implicitly or explicitly known as temporal data. This paper investigates the usefulness of applying Bayesian classification to an interval encoded temporal database with prioritized…
Bayesian Optimization (BO) methods are useful for optimizing functions that are expen- sive to evaluate, lack an analytical expression and whose evaluations can be contaminated by noise. These methods rely on a probabilistic model of the…
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because…
Purpose: Machine learning is broadly used for clinical data analysis. Before training a model, a machine learning algorithm must be selected. Also, the values of one or more model parameters termed hyper-parameters must be set. Selecting…
This paper presents novel Bayesian optimisation algorithms for minimum error rate training of statistical machine translation systems. We explore two classes of algorithms for efficiently exploring the translation space, with the first…
Bayesian models are a powerful tool for studying complex data, allowing the analyst to encode rich hierarchical dependencies and leverage prior information. Most importantly, they facilitate a complete characterization of uncertainty…
This paper presents a method called sampling-computation-optimization (SCO) to design batch Bayesian optimization. SCO does not construct new high-dimensional acquisition functions but samples from the existing one-site acquisition function…
This paper explores Bayesian estimation for categorical data, focusing on simple yet effective models that provide a foundation for applying more advanced methods accurately and reliably in real-world applications. We begin by revisiting…
Bi-clustering is a useful approach in analyzing biological data when observations come from heterogeneous groups and have a large number of features. We outline a general Bayesian approach in tackling bi-clustering problems in moderate to…
The process of dynamic state estimation (filtering) based on point process observations is in general intractable. Numerical sampling techniques are often practically useful, but lead to limited conceptual insight about optimal…
Bayesian statistics has gained great momentum since the computational developments of the 1990s. Gradually, advances in Bayesian methodology and software have made Bayesian techniques much more accessible to applied statisticians and, in…
Sparse coding aims to model data vectors as sparse linear combinations of basis elements, but a majority of related studies are restricted to continuous data without spatial or temporal structure. A new model-based sparse coding (MSC)…
We develop a new computational approach for "focused" optimal Bayesian experimental design with nonlinear models, with the goal of maximizing expected information gain in targeted subsets of model parameters. Our approach considers…
Bayesian sampling is an important task in statistics and machine learning. Over the past decade, many ensemble-type sampling methods have been proposed. In contrast to the classical Markov chain Monte Carlo methods, these new methods deploy…
Bayesian Optimization is ubiquitous in experimental design and black-box optimization for improving search efficiency. However, most existing approaches rely on regression models which are limited to fixed search spaces and structured,…