Split Optimization for Protein/Ligand Binding Models

Brian Davis; Kevin Mcloughlin; Jonathan Allen; Sally Ellingson

Biomolecules 2020-01-13 v1

Abstract

In this paper, we investigate potential biases in datasets used to make drug binding predictions with machine learning. We examine a recently published metric, the Asymmetric Validation Embedding (AVE) bias, which is used to quantify this bias and detect overfitting. We compare it to a slightly revised version and introduce a new weighted metric. We find that the new metrics make it possible to quantify overfitting without overly limiting the training data, and that they produce models with greater predictive value.
