Biomolecules
D-peptide binders targeting L-proteins have promising therapeutic potential. Despite rapid advances in machine learning-based target-conditioned peptide design, generating D-peptide binders remains largely unexplored. In this work, we show…
Proteins perform their biological functions through three-dimensional structures encoded by amino acid sequences, and ligand-binding protein co-design requires models that generate sequence-structure compatible proteins under explicit…
Proteins play a vital role in biological processes and are indispensable for living organisms. Accurate representation of proteins is crucial, especially in drug development. Recently, there has been a notable increase in interest in…
Recent advances in generative modeling show that pretrained representations can improve generation as conditioning features or alignment targets. Motivated by this, we study protein representations for predicting structures beyond…
The design of RNA molecules that interact with specific proteins is a critical challenge in experimental and computational biology. Despite recent progress in natural language modeling and deep learning-based protein design, there remains…
Proteins encode diverse functions within complex three-dimensional structures, yet most deep learning representations remain highly entangled, obscuring the biophysical signals that underlie function. Here we introduce ProtDiS, a…
Identifying enzymes that catalyze target biochemical reactions is a key step in computational enzyme discovery and biocatalyst design. Recent representation-learning methods formulate this problem as enzyme-reaction matching, where paired…
Small-molecule foundation models are typically pretrained on standalone molecular data, unlike vision and language models that often benefit from cross-modal or relational supervision. Protein-ligand co-folding provides a molecular analogue…
Experiments and computer simulations that study biological phenomena are usually performed under dilute in vitro conditions, yet these phenomena take place inside the cell, an environment densely packed with diverse macromolecules. Here, we revise…
Proteins are constructed from a limited alphabet of ~20 amino acids, yet the origins and selection of this specific alphabet are unresolved. One largely overlooked aspect is whether elemental composition constrains the range of viable…
The vast chemical space of possible small molecules, estimated at 10^60 compounds for molecules composed of just C, N, O, and S, is only sparsely occupied by biology. We propose that where life selects molecules within this space…
RNA function is tied to secondary structure, operating through dynamic and heterogeneous structural ensembles. While current analysis tools typically output single static structures or averaged contact maps, chemical probing methods like…
Predicting the secondary structure of RNA is a core challenge in computational biology, essential for understanding molecular function and designing novel therapeutics. The field has evolved from foundational but accuracy-limited…
Molecules are graphs, but large language models (LLMs) are usually asked to reason about them through linear strings. The most popular molecular representation, SMILES, compresses atoms, bonds, branches and rings into a compact sequence in…
Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine…
The boundaries of cooperative helix-coil transitions directly affect protein allostery and conformational dynamics, yet the physical origin of the persistent one-to-two-residue assignment ambiguity at these structural interfaces remains…
Spatial transcriptomics provides an unprecedented perspective for deciphering tissue spatial heterogeneity. However, high-resolution spatial transcriptomic technology remains constrained by limited gene coverage, technical complexity, and…
NMR relaxation experiments have shown that there are small but measurable changes in the native state dynamics of the Fyn SH3 domain associated with the substitution by other amino acids of a phenylalanine residue (F20) in the hydrophobic…
Deep learning, particularly with the advancement of large language models, has transformed biomolecular modeling, with protein language models such as ESM inspiring emerging RNA language models such as RiNALMo. Recent work has begun…
Protein function is driven by cohesive substructures, such as catalytic triads, binding pockets, and structural motifs, that occupy only a small fraction of a protein's residues. Yet existing pipelines built on protein encoders do not model…