Related papers: Improving Small Molecule Generation using Mutual I…

MIM: Mutual Information Machine

We introduce the Mutual Information Machine (MIM), a probabilistic auto-encoder for learning joint distributions over observations and latent variables. MIM reflects three design principles: 1) low divergence, to encourage the encoder and…

Machine Learning · Computer Science 2020-02-24 Micha Livne, Kevin Swersky, David J. Fleet

MolMem: Memory-Augmented Agentic Reinforcement Learning for Sample-Efficient Molecular Optimization

In drug discovery, molecular optimization aims to iteratively refine a lead compound to improve molecular properties while preserving structural similarity to the original molecule. However, each oracle evaluation is expensive, making…

Machine Learning · Computer Science 2026-04-15 Ziqing Wang, Yibo Wen, Abhishek Pandy, Han Liu, Kaize Ding

Instruction Multi-Constraint Molecular Generation Using a Teacher-Student Large Language Model

While various models and computational tools have been proposed for structure and property analysis of molecules, generating molecules that conform to all desired structures and properties remains a challenge. Here, we introduce a…

Computation and Language · Computer Science 2024-10-11 Peng Zhou, Jianmin Wang, Chunyan Li, Zixu Wang, Yiping Liu, Siqi Sun, Jianxin Lin, Leyi Wei, Xibao Cai, Houtim Lai, Wei Liu, Longyue Wang, Yuansheng Liu, Xiangxiang Zeng

Contrastive MIM: A Contrastive Mutual Information Framework for Unified Generative and Discriminative Representation Learning

Learning representations that generalize well to unknown downstream tasks is a central challenge in representation learning. Existing approaches such as contrastive learning, self-supervised masking, and denoising auto-encoders address this…

Machine Learning · Computer Science 2025-09-10 Micha Livne

GP-MoLFormer-Sim: Test Time Molecular Optimization through Contextual Similarity Guidance

The ability to design molecules while preserving similarity to a target molecule and/or property is crucial for various applications in drug discovery, chemical design, and biology. We introduce in this paper an efficient training-free…

Machine Learning · Computer Science 2025-11-18 Jiri Navratil, Jarret Ross, Payel Das, Youssef Mroueh, Samuel C Hoffman, Vijil Chenthamarakshan, Brian Belgodere

High Mutual Information in Representation Learning with Symmetric Variational Inference

We introduce the Mutual Information Machine (MIM), a novel formulation of representation learning, using a joint distribution over the observations and latent state in an encoder/decoder framework. Our key principles are symmetry and mutual…

Machine Learning · Statistics 2019-10-10 Micha Livne, Kevin Swersky, David J. Fleet

Contrastive Mutual Information Learning: Toward Robust Representations without Positive-Pair Augmentations

Learning representations that transfer well to diverse downstream tasks remains a central challenge in representation learning. Existing paradigms -- contrastive learning, self-supervised masking, and denoising auto-encoders -- balance this…

Machine Learning · Computer Science 2025-09-29 Micha Livne

MolFusion: Multimodal Fusion Learning for Molecular Representations via Multi-granularity Views

Artificial Intelligence predicts drug properties by encoding drug molecules, aiding in the rapid screening of candidates. Different molecular representations, such as SMILES and molecule graphs, contain complementary information for…

Machine Learning · Computer Science 2024-06-27 Muzhen Cai, Sendong Zhao, Haochun Wang, Yanrui Du, Zewen Qiang, Bing Qin, Ting Liu

LIMO: Latent Inceptionism for Targeted Molecule Generation

Generation of drug-like molecules with high binding affinity to target proteins remains a difficult and resource-intensive task in drug discovery. Existing approaches primarily employ reinforcement learning, Markov sampling, or deep…

Machine Learning · Computer Science 2022-06-22 Peter Eckmann, Kunyang Sun, Bo Zhao, Mudong Feng, Michael K. Gilson, Rose Yu

MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular Design

Molecular generation plays an important role in drug discovery and materials science, especially in data-scarce scenarios where traditional generative models often struggle to achieve satisfactory conditional generalization. To address this…

Machine Learning · Computer Science 2025-05-13 Zimo Yan, Jie Zhang, Zheng Xie, Chang Liu, Yizhen Liu, Yiping Song

Large-Scale Chemical Language Representations Capture Molecular Structure and Properties

Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance,…

Machine Learning · Computer Science 2022-12-15 Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, Payel Das

InstructMol: Multi-Modal Integration for Building a Versatile and Reliable Molecular Assistant in Drug Discovery

The rapid evolution of artificial intelligence in drug discovery encounters challenges with generalization and extensive training, yet Large Language Models (LLMs) offer promise in reshaping interactions with complex molecular data. Our…

Biomolecules · Quantitative Biology 2024-12-20 He Cao, Zijing Liu, Xingyu Lu, Yuan Yao, Yu Li

MIMOSA: Multi-constraint Molecule Sampling for Molecule Optimization

Molecule optimization is a fundamental task for accelerating drug discovery, with the goal of generating new valid molecules that maximize multiple drug properties while maintaining similarity to the input molecule. Existing generative…

Machine Learning · Computer Science 2024-07-02 Tianfan Fu, Cao Xiao, Xinhao Li, Lucas M. Glass, Jimeng Sun

Optimizing Molecules using Efficient Queries from Property Evaluations

Machine learning based methods have shown potential for optimizing existing molecules with more desirable properties, a critical step towards accelerating new chemical discovery. Here we propose QMO, a generic query-based molecule…

Machine Learning · Computer Science 2022-04-21 Samuel Hoffman, Vijil Chenthamarakshan, Kahini Wadhawan, Pin-Yu Chen, Payel Das

Machine learning-assisted search for novel coagulants: when machine learning can be efficient even if data availability is low

Design of new drugs is a challenging process: a candidate molecule should satisfy multiple conditions to act properly and make the least side-effect -- perfect candidates selectively attach to and influence only targets, leaving off-targets…

Biomolecules · Quantitative Biology 2024-05-07 Andrij Rovenchak, Maksym Druchok

SMILES Transformer: Pre-trained Molecular Fingerprint for Low Data Drug Discovery

In drug-discovery-related tasks such as virtual screening, machine learning is emerging as a promising way to predict molecular properties. Conventionally, molecular fingerprints (numerical representations of molecules) are calculated…

Machine Learning · Computer Science 2019-11-13 Shion Honda, Shoi Shi, Hiroki R. Ueda

SMolLM: Small Language Models Learn Small Molecular Grammar

Language models for molecular design have scaled to hundreds of millions of parameters, yet how they learn chemical grammar is poorly understood. We train SMolLM, a 53K-parameter weight-shared transformer, to generate novel SMILES with 95%…

Machine Learning · Computer Science 2026-05-29 Akhil Jindal, Harang Ju

Cross-Modality Controlled Molecule Generation with Diffusion Language Model

Current SMILES-based diffusion models for molecule generation typically support only unimodal constraint. They inject conditioning signals at the start of the training process and require retraining a new model from scratch whenever the…

Machine Learning · Computer Science 2025-08-21 Yunzhe Zhang, Yifei Wang, Khanh Vinh Nguyen, Pengyu Hong

Molecule optimization via multi-objective evolutionary in implicit chemical space

Machine learning methods have been used to accelerate the molecule optimization process. However, efficient search for optimized molecules satisfying several properties with scarce labeled data remains a challenge for machine learning…

Biomolecules · Quantitative Biology 2022-12-20 Xin Xia, Yansen Su, Chunhou Zheng, Xiangxiang Zeng

ChatMol: A Versatile Molecule Designer Based on the Numerically Enhanced Large Language Model

Goal-oriented de novo molecule design, namely generating molecules with specific property or substructure constraints, is a crucial yet challenging task in drug discovery. Existing methods, such as Bayesian optimization and reinforcement…

Computational Engineering, Finance, and Science · Computer Science 2025-02-28 Chuanliu Fan, Ziqiang Cao, Zicheng Ma, Nan Yu, Yimin Peng, Jun Zhang, Yiqin Gao, Guohong Fu