Bayesian Optimization of Text Representations
Abstract
When applying machine learning to problems in NLP, there are many choices to make about how to represent input texts. These choices can have a big effect on performance, but they are often uninteresting to researchers or practitioners who simply need a module that performs well. We propose an approach to optimizing over this space of choices, formulating the problem as global optimization. We apply a sequential model-based optimization technique and show that our method makes standard linear models competitive with more sophisticated, expensive state-of-the-art methods based on latent variable models or neural networks on various topic classification and sentiment analysis problems. Our approach is a first step towards black-box NLP systems that work with raw text and do not require manual tuning.
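The abstract does not spell out the optimization loop, so the following is only a minimal sketch of what a sequential model-based search over representation choices can look like. Everything in it is an assumption made for illustration: the search space `SPACE`, the made-up `toy_objective` (a stand-in for training a linear model and measuring validation accuracy), and the deliberately crude surrogate, which scores a candidate by the average observed result of each of its settings. It is not the authors' surrogate model or acquisition function.

```python
import itertools
import random

# Hypothetical search space of text-representation choices (illustrative only).
SPACE = {
    "max_ngram": [1, 2, 3],
    "weighting": ["binary", "tf", "tfidf"],
    "remove_stopwords": [False, True],
}


def toy_objective(config):
    """Stand-in for 'train a linear model, return validation accuracy'.

    The numbers are invented so the sketch runs without any data.
    """
    score = 0.70
    score += {1: 0.00, 2: 0.05, 3: 0.03}[config["max_ngram"]]
    score += {"binary": 0.02, "tf": 0.00, "tfidf": 0.04}[config["weighting"]]
    score += 0.01 if config["remove_stopwords"] else 0.0
    return score


def surrogate_score(config, history, optimistic=1.0):
    """Crude surrogate: mean observed score of each setting in `config`.

    Settings that have never been tried get an optimistic value, which
    nudges the search toward unexplored parts of the space.
    """
    total = 0.0
    for key, value in config.items():
        seen = [score for past, score in history if past[key] == value]
        total += sum(seen) / len(seen) if seen else optimistic
    return total / len(config)


def smbo(objective, space, n_trials=10, n_random=3, seed=0):
    """Sequential model-based search: propose, evaluate, update, repeat."""
    rng = random.Random(seed)
    candidates = [
        dict(zip(space, values))
        for values in itertools.product(*space.values())
    ]
    history = []  # list of (config, observed score) pairs
    for trial in range(n_trials):
        tried = [config for config, _ in history]
        untried = [c for c in candidates if c not in tried]
        if not untried:
            break
        if trial < n_random:
            # Warm-up: a few random evaluations to seed the surrogate.
            config = rng.choice(untried)
        else:
            # Pick the untried candidate the surrogate rates highest.
            config = max(untried, key=lambda c: surrogate_score(c, history))
        history.append((config, objective(config)))
    return max(history, key=lambda pair: pair[1])


if __name__ == "__main__":
    best_config, best_score = smbo(toy_objective, SPACE, n_trials=10)
    print(best_config, round(best_score, 3))
```

In a real setting, `toy_objective` would be replaced by a function that builds the features for a given configuration, trains the classifier, and returns a held-out score, and the surrogate would be a proper probabilistic model rather than per-setting averages.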
Cite
@article{arxiv.1503.00693,
title = {Bayesian Optimization of Text Representations},
author = {Dani Yogatama and Noah A. Smith},
journal = {arXiv preprint arXiv:1503.00693},
year = {2015}
}