Learning the hypotheses space from data through a U-curve algorithm

Machine Learning 2021-10-12 v2 Machine Learning

Abstract

This paper proposes a data-driven, systematic, consistent, and non-exhaustive approach to Model Selection that extends the classical agnostic PAC learning model. In this approach, a learning problem is modeled not only by a hypothesis space $\mathcal{H}$ but also by a Learning Space $\mathbb{L}(\mathcal{H})$: a poset of subspaces of $\mathcal{H}$ that covers $\mathcal{H}$ and satisfies a property regarding the VC dimension of related subspaces, making it a suitable algebraic search space for Model Selection algorithms. Our main contributions are a data-driven general learning algorithm that performs implicitly regularized Model Selection on $\mathbb{L}(\mathcal{H})$, and a framework under which one can, theoretically, better estimate a target hypothesis with a given sample size by properly modeling $\mathbb{L}(\mathcal{H})$ and employing high computational power. A remarkable consequence of this approach is a set of conditions under which a non-exhaustive search of $\mathbb{L}(\mathcal{H})$ can return an optimal solution. The results of this paper lead to a practical property of Machine Learning: a lack of experimental data may be mitigated by high computational capacity. As computational power becomes ever more widely available, this property may help explain why Machine Learning has become so important, even where data is expensive and hard to obtain.
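
To make the idea of a non-exhaustive search over a Learning Space concrete, the following is a minimal illustrative sketch and not the algorithm of the paper. It assumes the simplest possible poset, a chain of nested subspaces $\mathcal{H}_0 \subset \mathcal{H}_1 \subset \dots$, where $\mathcal{H}_k$ contains the binary classifiers on $[0, 1)$ that are constant on each of $2^k$ equal bins. It also assumes that the estimated risk is a held-out validation error and that this error is U-shaped along the chain. The names `fit_dyadic`, `error`, `u_curve_search` and `demo` are hypothetical.

```python
import random


def fit_dyadic(train, k):
    """Empirical risk minimizer in H_k: majority label in each of 2**k bins of [0, 1)."""
    bins = 2 ** k
    votes = [0] * bins  # (number of 1-labels) - (number of 0-labels) per bin
    for x, y in train:
        votes[min(int(x * bins), bins - 1)] += 1 if y == 1 else -1
    return [1 if v > 0 else 0 for v in votes]  # ties and empty bins default to 0


def error(labels, data):
    """Misclassification rate of a piecewise-constant classifier on `data`."""
    bins = len(labels)
    wrong = sum(labels[min(int(x * bins), bins - 1)] != y for x, y in data)
    return wrong / len(data)


def u_curve_search(val_error, max_k):
    """Walk up the chain H_0, H_1, ... and stop at the first increase in error.

    Assumption: val_error(k) is U-shaped in k. If it is not, the search can
    stop early and miss a better subspace further up the chain.
    Returns (best_k, best_error, number_of_subspaces_visited).
    """
    best_k, best_err = 0, val_error(0)
    prev_err = best_err
    visited = 1
    for k in range(1, max_k + 1):
        err = val_error(k)
        visited += 1
        if err > prev_err:  # the error started to rise: stop searching
            break
        if err < best_err:
            best_k, best_err = k, err
        prev_err = err
    return best_k, best_err, visited


def demo(seed=0, n=200, max_k=6, noise=0.1):
    """Synthetic example: the target alternates over 4 bins, labels flipped with prob. `noise`."""
    rng = random.Random(seed)

    def sample(size):
        data = []
        for _ in range(size):
            x = rng.random()
            y = int(x * 4) % 2
            if rng.random() < noise:
                y = 1 - y
            data.append((x, y))
        return data

    train, val = sample(n), sample(n)
    return u_curve_search(lambda k: error(fit_dyadic(train, k), val), max_k)
```

Calling `u_curve_search` with the error sequence `[0.5, 0.3, 0.2, 0.25, 0.1]` returns index 2 after visiting four of the five subspaces. This shows both the saving from stopping early and the cost when the U-shape assumption fails, since the lower error at index 4 is never examined.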

Cite

@article{arxiv.2109.03866,
  title  = {Learning the hypotheses space from data through a U-curve algorithm},
  author = {Diego Marcondes and Adilson Simonis and Junior Barrera},
  journal= {arXiv preprint arXiv:2109.03866},
  year   = {2021}
}

Comments

This work is a merger of arXiv:2001.09532 and arXiv:2001.11578.