Deep learning-guided evolutionary optimization for protein design

2026-03-04 (v1) · Machine Learning; Quantitative Methods

Abstract

Designing novel proteins with desired characteristics remains a significant challenge because of the vast sequence space and the complexity of sequence-function relationships. Efficient exploration of this space to identify sequences that meet specific design criteria is crucial for advancing therapeutics and biotechnology. Here, we present BoGA (Bayesian Optimization Genetic Algorithm), a framework that combines evolutionary search with Bayesian optimization to navigate sequence space efficiently. By using a genetic algorithm as a stochastic proposal generator within a surrogate modeling loop, BoGA prioritizes candidates for evaluation based on prior evaluations and surrogate model predictions, enabling data-efficient optimization. We demonstrate the utility of BoGA by benchmarking it on sequence and structure design tasks and then applying it to design peptide binders against pneumolysin, a key virulence factor of Streptococcus pneumoniae. BoGA accelerates the discovery of high-confidence binders, demonstrating its potential for efficient protein design across diverse objectives. The algorithm is implemented within the BoPep suite and is available under an MIT license at https://github.com/ErikHartman/bopep.
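The loop described in the abstract (a genetic algorithm proposing candidates, a surrogate ranking them, and only the top-ranked ones being evaluated) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BoPep implementation or API. The names `toy_objective`, `surrogate_score`, `propose` and `boga_like_loop` are hypothetical. The k-nearest-neighbour surrogate with a Hamming-distance exploration bonus stands in for whatever learned surrogate BoGA actually uses, and the hydrophobic-fraction objective stands in for an expensive evaluation such as a structure-prediction confidence score.

```python
import random
from typing import Callable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
Record = Tuple[str, float]  # (sequence, observed objective value)


def toy_objective(seq: str) -> float:
    """Hypothetical stand-in for an expensive oracle: fraction of hydrophobic residues."""
    return sum(c in "AILMFVW" for c in seq) / len(seq)


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def surrogate_score(seq: str, evaluated: List[Record], k: int = 3, beta: float = 0.5) -> float:
    """Assumed surrogate: k-NN mean plus a distance-based exploration bonus (UCB-like)."""
    neighbours = sorted(evaluated, key=lambda rec: hamming(seq, rec[0]))[:k]
    mean = sum(score for _, score in neighbours) / len(neighbours)
    # Normalised distance to the nearest evaluated sequences acts as an uncertainty proxy.
    spread = sum(hamming(seq, s) for s, _ in neighbours) / (len(neighbours) * len(seq))
    return mean + beta * spread


def propose(evaluated: List[Record], n: int, rng: random.Random,
            rate: float = 0.1, n_parents: int = 5) -> List[str]:
    """Genetic-algorithm proposal step: crossover + mutation of the best sequences so far."""
    ranked = sorted(evaluated, key=lambda rec: rec[1], reverse=True)
    parents = [s for s, _ in ranked[:n_parents]]
    seen = {s for s, _ in evaluated}
    children: List[str] = []
    while len(children) < n:
        a, b = rng.sample(parents, 2)
        cut = rng.randrange(1, len(a))  # one-point crossover position
        child = "".join(
            rng.choice(AMINO_ACIDS) if rng.random() < rate else c  # per-residue mutation
            for c in a[:cut] + b[cut:]
        )
        if child not in seen:  # skip sequences already evaluated or already proposed
            seen.add(child)
            children.append(child)
    return children


def boga_like_loop(objective: Callable[[str], float], length: int = 10, n_init: int = 8,
                   n_iter: int = 10, n_proposals: int = 50, batch: int = 4,
                   seed: int = 0) -> Tuple[str, float, int]:
    rng = random.Random(seed)
    evaluated: List[Record] = []
    for _ in range(n_init):  # random initial design
        s = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        evaluated.append((s, objective(s)))
    for _ in range(n_iter):
        candidates = propose(evaluated, n_proposals, rng)  # cheap, stochastic proposals
        candidates.sort(key=lambda s: surrogate_score(s, evaluated), reverse=True)
        for s in candidates[:batch]:  # only the top-ranked ones reach the expensive objective
            evaluated.append((s, objective(s)))
    best_seq, best_score = max(evaluated, key=lambda rec: rec[1])
    return best_seq, best_score, len(evaluated)


best_seq, best_score, n_evals = boga_like_loop(toy_objective)
print(best_seq, round(best_score, 2), n_evals)
```

In this sketch the genetic algorithm generates 50 candidates per round, but only `batch` of them are passed to the objective, so the expensive evaluator is called n_init + n_iter × batch times (48 with the defaults). This is the data-efficiency argument in miniature: proposals are cheap, evaluations are rationed by the surrogate.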

Cite

@article{arxiv.2603.02753,
  title  = {Deep learning-guided evolutionary optimization for protein design},
  author = {Erik Hartman and Di Tang and Johan Malmström},
  journal= {arXiv preprint arXiv:2603.02753},
  year   = {2026}
}

Comments

Code available at GitHub