Improving Small Molecule Generation using Mutual Information Machine

Machine Learning 2023-03-31 v2 Artificial Intelligence Biomolecules Quantitative Methods

Abstract

We address the task of controlled generation of small molecules, which entails finding novel molecules with desired properties under certain constraints (e.g., similarity to a reference molecule). Here we introduce MolMIM, a probabilistic auto-encoder for small molecule drug discovery that learns an informative and clustered latent space. MolMIM is trained with Mutual Information Machine (MIM) learning and provides a fixed-length representation of variable-length SMILES strings. Since encoder-decoder models can learn representations with "holes" of invalid samples, we propose a novel extension to the training procedure that promotes a dense latent space and allows the model to sample valid molecules from random perturbations of latent codes. We provide a thorough comparison of MolMIM to several variable-size and fixed-size encoder-decoder models, demonstrating MolMIM's superior generation as measured by validity, uniqueness, and novelty. We then use CMA-ES, a naive black-box, gradient-free search algorithm, over MolMIM's latent space for the task of property-guided molecule optimization. We achieve state-of-the-art results in several constrained single-property optimization tasks as well as in the challenging task of multi-objective optimization, improving on the previous state-of-the-art success rate by more than 5%. We attribute the strong results to MolMIM's latent representation, which clusters similar molecules in the latent space, rather than to the search procedure, since CMA-ES is often used as a baseline optimization method. We also demonstrate that MolMIM is favourable in a compute-limited regime, making it an attractive model for such cases.
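
The three generation metrics named in the abstract can be illustrated with a minimal sketch. It assumes the commonly used definitions (validity as the fraction of samples that are valid, uniqueness as the fraction of valid samples that are distinct, novelty as the fraction of distinct valid samples absent from the training set); the paper may define them differently. The names `generation_metrics` and `balanced_parens` are hypothetical, and `balanced_parens` is only a toy stand-in for a real SMILES parser from a cheminformatics toolkit.

```python
from typing import Callable, Dict, Iterable


def balanced_parens(smiles: str) -> bool:
    """Toy validity check (an assumption for illustration only):
    a string counts as valid if it is non-empty and its parentheses balance.
    A real pipeline would parse the SMILES with a cheminformatics toolkit."""
    depth = 0
    for ch in smiles:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return bool(smiles) and depth == 0


def generation_metrics(
    samples: Iterable[str],
    training_set: Iterable[str],
    is_valid: Callable[[str], bool],
) -> Dict[str, float]:
    """Compute validity, uniqueness, and novelty under the assumed definitions."""
    samples = list(samples)
    valid = [s for s in samples if is_valid(s)]
    unique = set(valid)
    novel = unique - set(training_set)
    return {
        # Fraction of all samples that pass the validity check.
        "validity": len(valid) / len(samples) if samples else 0.0,
        # Fraction of valid samples that are distinct.
        "uniqueness": len(unique) / len(valid) if valid else 0.0,
        # Fraction of distinct valid samples not seen in training.
        "novelty": len(novel) / len(unique) if unique else 0.0,
    }


if __name__ == "__main__":
    generated = ["CCO", "CCO", "C(C", "CCN", ""]
    print(generation_metrics(generated, {"CCO"}, balanced_parens))
```

Passing the validity check as a function keeps the metric code independent of any particular chemistry library, so the toy check can be swapped for a real parser without changing the rest.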

Cite

@article{arxiv.2208.09016,
  title  = {Improving Small Molecule Generation using Mutual Information Machine},
  author = {Danny Reidenbach and Micha Livne and Rajesh K. Ilango and Michelle Gill and Johnny Israeli},
  journal= {arXiv preprint arXiv:2208.09016},
  year   = {2023}
}

Comments

Published at the MLDD workshop, ICLR 2023. Version 2. 8 pages, 4 figures, 4 tables.