Related papers: Adaptive Protein Tokenization

200 papers

Protein structure tokenization converts 3D structures into discrete or vectorized representations, enabling the integration of structural and sequence data. Despite many recent works on structure tokenization, the properties of the…

Machine Learning · Computer Science 2025-11-14 Zijing Liu, Bin Feng, He Cao, Yu Li

Recent years have witnessed a surge in the development of protein structural tokenization methods, which chunk protein 3D structures into discrete or continuous representations. Structure tokenization enables the direct application of…

Quantitative Methods · Quantitative Biology 2025-06-26 Xinyu Yuan, Zichen Wang, Marcus Collins, Huzefa Rangwala

We propose task-adaptive tokenization as a way to adapt the generation pipeline to the specifics of a downstream task and enhance long-form generation in mental health. Inspired by insights from cognitive science, our task-adaptive…

Computation and Language · Computer Science 2023-11-14 Siyang Liu, Naihao Deng, Sahand Sabour, Yilin Jia, Minlie Huang, Rada Mihalcea

Adaptive cognition requires structured internal models of objects and their relations. Predictive neural networks are often proposed to learn such world models, but how these are instantiated and how they support prediction remain unclear.…

Machine Learning · Computer Science 2026-05-11 Linda Ariel Ventura, Victoria Bosch, Tim C Kietzmann, Sushrut Thorat

Modeling genomic sequences faces two unsolved challenges: the information density varies widely across different regions, while there is no clearly defined minimum vocabulary unit. Relying on either four primitive bases or independently…

Genomics · Quantitative Biology 2025-11-20 Siyuan Li, Kai Yu, Anna Wang, Zicheng Liu, Chang Yu, Jingbo Zhou, Qirong Yang, Yucheng Guo, Xiaoming Zhang, Stan Z. Li

The diverse nature of protein prediction tasks has traditionally necessitated specialized models, hindering the development of broadly applicable and computationally efficient Protein Language Models (PLMs). In this work, we introduce…

Generative modeling has become a central paradigm in protein research, extending machine learning beyond structure prediction toward sequence design, backbone generation, inverse folding, and biomolecular interaction modeling. However, the…

Machine Learning · Computer Science 2026-03-30 Senura Hansaja Wanasekara, Minh-Duong Nguyen, Xiaochen Liu, Nguyen H. Tran, Ken-Tye Yong

Representation learning and de novo generation of proteins are pivotal computational biology tasks. Whilst natural language processing (NLP) techniques have proven highly effective for protein sequence modelling, structure modelling…

Quantitative Methods · Quantitative Biology 2025-01-08 Benoit Gaujac, Jérémie Donà, Liviu Copoiu, Timothy Atkinson, Thomas Pierrot, Thomas D. Barrett

Multimodal models that jointly reason over protein sequences, structures, and function annotations within a unified representation hold immense potential for integrating multimodal data and generating new proteins with designed functional…

Biomolecules · Quantitative Biology 2026-05-12 Nabin Giri, Steven Farrell, Kristofer E. Bouchard

In image retrieval, deep local features learned in a data-driven manner have been demonstrated effective to improve retrieval performance. To realize efficient retrieval on large image database, some approaches quantize deep local features…

Image and Video Processing · Electrical Eng. & Systems 2021-12-14 Hui Wu, Min Wang, Wengang Zhou, Yang Hu, Houqiang Li

Protein structure tokenizers enable the creation of multimodal models of protein structure, sequence, and function. Current approaches to protein structure tokenization rely on bespoke components that are invariant to spatial symmetries,…

Machine Learning · Computer Science 2025-10-02 Rohit Dilip, Evan Zhang, Ayush Varshney, David Van Valen

Proteins are macromolecules that perform essential functions in all living organisms. Designing novel proteins with specific structures and desired functions has been a long-standing challenge in the field of bioengineering. Existing…

Biomolecules · Quantitative Biology 2023-03-03 Chence Shi, Chuanrui Wang, Jiarui Lu, Bozitao Zhong, Jian Tang

The increasing number of protein sequences decoded from genomes is opening up new avenues of research on linking protein sequence to function with transformer neural networks. Recent research has shown that the number of known protein…

Machine Learning · Computer Science 2022-06-23 Anowarul Kabir, Amarda Shehu

Effective and efficient tokenization plays an important role in image representation and generation. Conventional methods, constrained by uniform 2D/1D grid tokenization, are inflexible to represent regions with varying shapes and textures…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Zhengqiang Zhang, Rongyuan Wu, Lingchen Sun, Lei Zhang

Generative artificial intelligence models learn probability distributions from data and produce novel samples that capture the salient properties of their training sets. Proteins are particularly attractive for such approaches given their…

Biomolecules · Quantitative Biology 2026-02-27 Filippo Stocco, Michele Garibbo, Noelia Ferruz

Inferring the structural properties of a protein from its amino acid sequence is a challenging yet important problem in biology. Structures are not known for the vast majority of protein sequences, but structure is critical for…

Machine Learning · Computer Science 2019-10-17 Tristan Bepler, Bonnie Berger

Recently, generative recommendation has emerged as a promising paradigm, attracting significant research attention. The basic framework involves an item tokenizer, which represents each item as a sequence of codes serving as its identifier,…

Information Retrieval · Computer Science 2025-05-27 Bowen Zheng, Hongyu Lu, Yu Chen, Wayne Xin Zhao, Ji-Rong Wen

Learning effective protein representations is critical in a variety of tasks in biology such as predicting protein function or structure. Existing approaches usually pretrain protein language models on a large number of unlabeled amino acid…

Machine Learning · Computer Science 2023-01-31 Zuobai Zhang, Minghao Xu, Arian Jamasb, Vijil Chenthamarakshan, Aurelie Lozano, Payel Das, Jian Tang

Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a…

Populations and Evolution · Quantitative Biology 2020-09-22 Michael Golden, Eduardo García-Portugués, Michael Sørensen, Kanti V. Mardia, Thomas Hamelryck, Jotun Hein

Large language models have made remarkable progress in the field of molecular science, particularly in understanding and generating functional small molecules. This success is largely attributed to the effectiveness of molecular…

Biomolecules · Quantitative Biology 2025-03-14 Zicheng Ma, Chuanliu Fan, Zhicong Wang, Zhenyu Chen, Xiaohan Lin, Yanheng Li, Shihao Feng, Jun Zhang, Ziqiang Cao, Yi Qin Gao