Extrapolative Controlled Sequence Generation via Iterative Refinement

Machine Learning, Computation and Language, Quantitative Methods | 2023-06-08 (v3)

Abstract

We study the problem of extrapolative controlled generation, i.e., generating sequences with attribute values beyond the range seen in training. This task is of significant importance in automated design, especially drug discovery, where the goal is to design novel proteins that are better (e.g., more stable) than existing sequences. By definition, the target sequences and their attribute values lie outside the training distribution, which poses challenges to existing methods that aim to generate the target sequence directly. Instead, in this work, we propose Iterative Controlled Extrapolation (ICE), which iteratively makes local edits to a sequence to enable extrapolation. We train the model on synthetically generated sequence pairs that demonstrate small improvements in the attribute value. Results on one natural language task (sentiment analysis) and two protein engineering tasks (ACE2 stability and AAV fitness) show that ICE considerably outperforms state-of-the-art approaches despite its simplicity. Our code and models are available at: https://github.com/vishakhpk/iter-extrapolation.
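
The iterative idea in the abstract can be illustrated with a small toy sketch. This is not the authors' implementation: the scoring function `toy_score`, the edit proposer `propose_local_edits`, and the loop `iterative_refine` are hypothetical names introduced here, and exhaustive enumeration of single-position edits stands in for the trained editing model that the abstract describes. The sketch only shows how repeated small, improving edits can move a sequence step by step toward a higher attribute value than the starting point.

```python
from typing import Callable, List


def toy_score(seq: str) -> int:
    """Hypothetical attribute to maximise: the number of 'A' characters."""
    return seq.count("A")


def propose_local_edits(seq: str, alphabet: str = "AB") -> List[str]:
    """Return every sequence that differs from `seq` at exactly one position.

    Assumption: this enumeration replaces a learned local editor.
    """
    candidates = []
    for i, current in enumerate(seq):
        for symbol in alphabet:
            if symbol != current:
                candidates.append(seq[:i] + symbol + seq[i + 1:])
    return candidates


def iterative_refine(
    seq: str,
    score: Callable[[str], int],
    propose: Callable[[str], List[str]],
    num_steps: int,
) -> List[str]:
    """Apply up to `num_steps` local edits, keeping only strict improvements."""
    trajectory = [seq]
    for _ in range(num_steps):
        candidates = propose(seq)
        if not candidates:
            break
        best = max(candidates, key=score)
        if score(best) <= score(seq):
            break  # no single local edit improves the attribute any further
        seq = best
        trajectory.append(seq)
    return trajectory


trajectory = iterative_refine("BBAB", toy_score, propose_local_edits, num_steps=5)
print(trajectory)
```

Each step changes a single position, so the attribute rises by a small amount per iteration; the loop stops early once no local edit yields an improvement.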

Cite

@article{arxiv.2303.04562,
  title  = {Extrapolative Controlled Sequence Generation via Iterative Refinement},
  author = {Vishakh Padmakumar and Richard Yuanzhe Pang and He He and Ankur P. Parikh},
  journal= {arXiv preprint arXiv:2303.04562},
  year   = {2023}
}

Comments

ICML 2023 - Camera Ready Version
