Related papers: A PLMs based protein retrieval framework

200 papers

The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which results in it being…

Biomolecules · Quantitative Biology 2020-10-29 Amir Shanehsazzadeh, David Belanger, David Dohan

Protein language models (PLMs) have shown promise in improving the understanding of protein sequences, contributing to advances in areas such as function prediction and protein engineering. However, training these models from scratch…

Machine Learning · Computer Science 2024-12-19 Shivasankaran Vanaja Pandi, Bharath Ramsundar

In recent years, protein-text models have gained significant attention for their potential in protein generation and understanding. Current approaches focus on integrating protein-related knowledge into large language models through…

Computation and Language · Computer Science 2025-11-11 Juntong Wu, Zijing Liu, He Cao, Hao Li, Bin Feng, Zishan Shu, Ke Yu, Li Yuan, Yu Li

Protein sequence design, determined by amino acid sequences, is essential to protein engineering problems in drug discovery. Prior approaches have resorted to evolutionary strategies or Monte-Carlo methods for protein design, but often…

Protein language models (pLMs) pre-trained on vast protein sequence databases excel at various downstream tasks but often lack the structural knowledge essential for some biological applications. To address this, we introduce a method to…

This paper aims to retrieve proteins with similar structures and semantics from a large-scale protein dataset, facilitating the functional interpretation of protein structures derived by structural determination methods like cryo-Electron…

Biomolecules · Quantitative Biology 2025-06-11 Qifeng Wu, Zhengzhe Liu, Han Zhu, Yizhou Zhao, Daisuke Kihara, Min Xu

With the exponential increase of the protein sequence databases over time, multiple-sequence alignment (MSA) methods, like PSI-BLAST, perform exhaustive and time-consuming database search to retrieve evolutionary information. The resulting…

Quantitative Methods · Quantitative Biology 2023-08-21 Issar Arab

Protein sequences are abundant in repeating segments, both as exact copies and as approximate segments with mutations. These repeats are important for protein structure and function, motivating decades of algorithmic work on repeat…

Machine Learning · Computer Science 2026-05-26 Gal Pomerants, Yaniv Nikankin, Anja Reusch, Tomer Tsaban, Ora Schueler-Furman, Yonatan Belinkov

Protein similarity searches are a routine job for molecular biologists where a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped…

Computational Engineering, Finance, and Science · Computer Science 2015-08-27 Akash Nag, Sunil Karforma

The prediction of protein structures from sequences is an important task for function prediction, drug design, and related biological processes understanding. Recent advances have proved the power of language models (LMs) in processing the…

Quantitative Methods · Quantitative Biology 2022-12-01 Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, Stan Z. Li

This paper describes a method to efficiently retrieve protein database sequences similar to a query sequence, while allowing for significant numbers of mutations. We call this method SEQR for SEQuence Retrieval. This approach increases the…

Genomics · Quantitative Biology 2018-11-05 David I. Hurwitz, Lianyi Han, Lewis Y. Geer

The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a…

Identification and alignment of three-dimensional folding of proteins may yield useful information about relationships too remote to be detected by conventional methods, such as sequence comparison, and may potentially lead to prediction of…

Quantitative Methods · Quantitative Biology 2017-01-10 Barış Ekim

The rapid growth of video content demands efficient and precise retrieval systems. While vision-language models (VLMs) excel in representation learning, they often struggle with adaptive, time-sensitive video retrieval. This paper…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Yicheng Duan, Xi Huang, Duo Chen

Text retrieval is a long-standing research topic on information seeking, where a system is required to return relevant information resources to users' queries in natural language. From classic retrieval methods to learning-based ranking…

Information Retrieval · Computer Science 2022-11-29 Wayne Xin Zhao, Jing Liu, Ruiyang Ren, Ji-Rong Wen

Supervised fine-tuning (SFT) is a standard approach for adapting large language models to specialized domains, yet its application to protein sequence modeling and protein language models (PLMs) remains ad hoc. This is in part because…

Machine Learning · Computer Science 2025-12-11 Amin Tavakoli, Raswanth Murugan, Ozan Gokdemir, Arvind Ramanathan, Frances Arnold, Anima Anandkumar

Modern Protein Language Models (PLMs) apply transformer-based model architectures from natural language processing to biological sequences, predicting a variety of protein functions and properties. However, protein language has key…

Machine Learning · Computer Science 2026-02-25 Anna Hart, Chi Han, Jeonghwan Kim, Huimin Zhao, Heng Ji

Large language models (LLMs) have shown promise in various natural language processing tasks, including their application to proteomics data to classify protein fragments. In this study, we curated a limited mass spectrometry dataset with…

Quantitative Methods · Quantitative Biology 2025-02-28 Taylor A Phillips, Alejandro W. Huskey, Patrick T. Huskey, Seth L. Robia, Peter M. Kekenes-Huskey

We study the problem of efficiently clustering protein sequences in a limited information setting. We assume that we do not know the distances between the sequences in advance, and must query them during the execution of the algorithm. Our…

Data Structures and Algorithms · Computer Science 2015-03-17 Konstantin Voevodski, Maria-Florina Balcan, Heiko Roglin, Shang-Hua Teng, Yu Xia

Traditional sparse and dense retrieval methods struggle to leverage general world knowledge and often fail to capture the nuanced features of queries and products. With the advent of large language models (LLMs), industrial search systems…

Information Retrieval · Computer Science 2025-07-14 Ming Pang, Chunyuan Yuan, Xiaoyu He, Zheng Fang, Donghao Xie, Fanyi Qu, Xue Jiang, Changping Peng, Zhangang Lin, Ching Law, Jingping Shao