Related papers: Diffusion Language Models Are Versatile Protein Le…

200 papers

Proteins are fundamental to biology, executing diverse functions through complex physicochemical interactions, and they hold transformative potential across medicine, materials science, and environmental applications. Protein Language…

Biomolecules · Quantitative Biology 2025-06-11 Logan Hallee, Nikolaos Rafailidis, David B. Bichara, Jason P. Gleghorn

Proteins are essential macromolecules defined by their amino acid sequences, which determine their three-dimensional structures and, consequently, their functions in all living organisms. Therefore, generative protein modeling necessitates…

Machine Learning · Computer Science 2024-10-18 Xinyou Wang, Zaixiang Zheng, Fei Ye, Dongyu Xue, Shujian Huang, Quanquan Gu

Proteins adopt multiple structural conformations to perform their diverse biological functions, and understanding these conformations is crucial for advancing drug discovery. Traditional physics-based simulation methods often struggle with…

Biomolecules · Quantitative Biology 2025-03-14 Jiarui Lu, Xiaoyin Chen, Stephen Zhewen Lu, Chence Shi, Hongyu Guo, Yoshua Bengio, Jian Tang

Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as…

Machine Learning · Computer Science 2026-05-11 Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez, Nina Cheng, Melih Yilmaz

Protein language models have shown remarkable success in learning biological information from protein sequences. However, most existing models are limited by either autoencoding or autoregressive pre-training objectives, which makes them…

Quantitative Methods · Quantitative Biology 2024-12-10 Bo Chen, Xingyi Cheng, Pan Li, Yangli-ao Geng, Jing Gong, Shen Li, Zhilei Bei, Xu Tan, Boyan Wang, Xin Zeng, Chiming Liu, Aohan Zeng, Yuxiao Dong, Jie Tang, Le Song

We propose ProtLLM, a versatile cross-modal large language model (LLM) for both protein-centric and protein-language tasks. ProtLLM features a unique dynamic protein mounting mechanism, enabling it to handle complex inputs where the natural…

Biomolecules · Quantitative Biology 2024-03-14 Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, Wentao Zhang

Recently, diffusion models have excelled in image generation tasks and have also been applied to natural language processing (NLP) for controllable text generation. However, the application of diffusion models in a cross-lingual setting is…

Computation and Language · Computer Science 2023-08-01 Linyao Chen, Aosong Feng, Boming Yang, Zihui Li

The conditional generation of proteins with desired functions is a key goal for generative models. Existing methods based on prompting of protein language models (PLMs) can generate proteins conditioned on a target functionality, such as a…

Biomolecules · Quantitative Biology 2025-06-13 Jason Yang, Aadyot Bhatnagar, Jeffrey A. Ruffolo, Ali Madani

Proteins are shaped by gradual evolution under biophysical and functional constraints. Protein language models learn rich evolutionary constraints from large-scale sequences, and discrete diffusion-based protein language models (e.g., DPLMs)…

Machine Learning · Computer Science 2026-05-14 Xinyou Wang, Liang Hong, Jiasheng Ye, Zaixiang Zheng, Yu Li, Shujian Huang, Quanquan Gu

Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch…

Machine Learning · Computer Science 2024-12-19 Shivasankaran Vanaja Pandi, Bharath Ramsundar

This paper demonstrates that language models are strong structure-based protein designers. We present LM-Design, a generic approach to reprogramming sequence-based protein language models (pLMs), that have learned massive sequential…

Machine Learning · Computer Science 2023-02-10 Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei YE, Quanquan Gu

Proteins inherently possess a consistent sequence-structure duality. The abundance of protein sequence data, which can be readily represented as discrete tokens, has driven fruitful developments in protein language models (pLMs). A key…

Computational Engineering, Finance, and Science · Computer Science 2026-05-29 Yi Zhou, Haohao Qu, Yunqing Liu, Shanru Lin, Le Song, Wenqi Fan

Diffusion Language Models (DLMs) have emerged as a promising new paradigm for text generative modeling, potentially addressing limitations of autoregressive (AR) models. However, current DLMs have been studied at a smaller scale compared to…

Computation and Language · Computer Science 2025-06-03 Shansan Gong, Shivam Agarwal, Yizhe Zhang, Jiacheng Ye, Lin Zheng, Mukai Li, Chenxin An, Peilin Zhao, Wei Bi, Jiawei Han, Hao Peng, Lingpeng Kong

Current SMILES-based diffusion models for molecule generation typically support only unimodal constraints. They inject conditioning signals at the start of the training process and require retraining a new model from scratch whenever the…

Machine Learning · Computer Science 2025-08-21 Yunzhe Zhang, Yifei Wang, Khanh Vinh Nguyen, Pengyu Hong

Recent advances in Protein Language Models (PLMs) have transformed protein engineering, yet unlike their counterparts in Natural Language Processing (NLP), current PLMs exhibit a fundamental limitation: they excel in either Protein Language…

Computational Engineering, Finance, and Science · Computer Science 2025-09-16 Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, Yonghong Tian

Protein sequence design has seen significant advances through discrete diffusion and autoregressive approaches, yet the potential of continuous diffusion remains underexplored. Here, we present DiMA, a latent diffusion framework that…

Diffusion Language models (DLMs) are a promising avenue for text generation due to their practical properties on tractable controllable generation. They also have the advantage of not having to predict text autoregressively. However,…

Machine Learning · Computer Science 2024-02-13 Sofia Maria Lo Cicero Vaina, Nikita Balagansky, Daniil Gavrilov

Protein is linked to almost every life process. Therefore, analyzing the biological structure and property of protein sequences is critical to the exploration of life, as well as disease detection and drug discovery. Traditional protein…

Machine Learning · Computer Science 2021-12-08 Yijia Xiao, Jiezhong Qiu, Ziang Li, Chang-Yu Hsieh, Jie Tang

Protein language models (pLMs) pre-trained on vast protein sequence databases excel at various downstream tasks but often lack the structural knowledge essential for some biological applications. To address this, we introduce a method to…

Latent diffusion models offer an attractive alternative to discrete diffusion for non-autoregressive text generation by operating on continuous text representations and denoising entire sequences in parallel. The major challenge in latent…

Computation and Language · Computer Science 2026-05-11 Viacheslav Meshchaninov, Alexander Shabalin, Egor Chimbulatov, Nikita Gushchin, Ilya Koziev, Alexander Korotin, Dmitry Vetrov