Evolutionary Dataset Optimisation: learning algorithm quality through evolution
Abstract
In this paper we propose a novel method for learning how algorithms perform. Classically, algorithms are compared on a finite number of existing (or newly simulated) benchmark datasets using some fixed metric. The algorithm(s) with the smallest value of this metric are chosen as the 'best performing'. We offer a new approach that flips this paradigm. Instead, we aim to gain a richer picture of an algorithm's performance by generating artificial data through genetic evolution, the purpose of which is to create populations of datasets on which a particular algorithm performs well according to a given metric. These datasets can then be studied to learn which attributes lead to good performance of a given algorithm. Following a detailed description of the algorithm and a brief description of an open-source implementation, a case study in clustering is presented. This case study demonstrates the performance and nuances of the method, which we call Evolutionary Dataset Optimisation. In this study, a number of known properties of preferable datasets for the clustering algorithms k-means and DBSCAN emerge in the generated datasets.
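The evolutionary loop described above can be illustrated with a small sketch. This is a hypothetical example in plain Python, not the paper's algorithm and not the API of its open-source implementation. The dataset representation (lists of 2-D points in [0, 1]), the row-mixing crossover, the mutation scheme, all function and parameter names and defaults, and the use of a hand-rolled k-means inertia as the fitness metric are assumptions made for this illustration.

```python
import random
from typing import Callable

# Assumption: a dataset is a list of 2-D points with coordinates in [0, 1].
Point = tuple[float, float]
Dataset = list[Point]


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def kmeans_inertia(data: Dataset, k: int = 2, iters: int = 10) -> float:
    """Hypothetical fitness: final within-cluster sum of squares of Lloyd's k-means."""
    rng = random.Random(0)  # fixed seed, so the fitness of a given dataset is deterministic
    centroids = rng.sample(data, k)
    for _ in range(iters):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in data:
            j = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[j].append(p)
        # Recompute each centroid as its cluster mean; keep the old one if the cluster is empty.
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl)) if cl else centroids[j]
            for j, cl in enumerate(clusters)
        ]
    return sum(min(sq_dist(p, c) for c in centroids) for p in data)


def random_dataset(rng: random.Random, min_rows: int, max_rows: int) -> Dataset:
    n = rng.randint(min_rows, max_rows)
    return [(rng.random(), rng.random()) for _ in range(n)]


def crossover(rng: random.Random, a: Dataset, b: Dataset) -> Dataset:
    # Assumed row-mixing crossover: take one parent's size, draw rows from both parents.
    n = rng.choice([len(a), len(b)])
    return rng.sample(a + b, n)


def mutate(rng: random.Random, data: Dataset, prob: float) -> Dataset:
    # Each coordinate is redrawn uniformly from [0, 1] with probability `prob`.
    return [tuple(rng.random() if rng.random() < prob else v for v in p) for p in data]


def evolve(fitness: Callable[[Dataset], float], size: int = 20, generations: int = 15,
           best_prop: float = 0.25, mutation_prob: float = 0.05,
           min_rows: int = 5, max_rows: int = 30, seed: int = 1) -> tuple[Dataset, list[float]]:
    rng = random.Random(seed)
    population = [random_dataset(rng, min_rows, max_rows) for _ in range(size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness)  # lower metric value = fitter dataset
        history.append(fitness(population[0]))
        n_best = max(2, int(best_prop * size))
        parents = population[:n_best]  # elitism: the fittest datasets survive unchanged
        children = [
            mutate(rng, crossover(rng, *rng.sample(parents, 2)), mutation_prob)
            for _ in range(size - n_best)
        ]
        population = parents + children
    population.sort(key=fitness)
    history.append(fitness(population[0]))
    return population[0], history


best, history = evolve(kmeans_inertia)
print(len(best), round(history[0], 3), round(history[-1], 3))
```

Because the fittest datasets are carried over unchanged and the fitness is deterministic, the best fitness in `history` can never increase from one generation to the next. Note that raw inertia grows with the number of rows, so this toy fitness tends to favour small, tightly packed datasets; that is a property of the sketch, not a result reported in the paper.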
Cite
@article{arxiv.1907.13508,
title = {Evolutionary Dataset Optimisation: learning algorithm quality through evolution},
author = {Henry Wilde and Vincent Knight and Jonathan Gillard},
journal = {arXiv preprint arXiv:1907.13508},
year = {2019}
}
Comments
33 pages, 15 figures