Related papers: Efficient seeding techniques for protein similarit…

200 papers

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We…

Quantitative Methods · Quantitative Biology 2011-01-18 Mikhail A. Roytberg , Anna Gambin , Laurent Noé , Slawomir Lasota , Eugenia Furletova , Ewa Szczurek , Gregory Kucherov

We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem -- a set of target alignments, an associated…

Data Structures and Algorithms · Computer Science 2010-01-19 Gregory Kucherov , Laurent Noé , Mikhail Roytberg

We propose a general approach to compute the seed sensitivity, that can be applied to different definitions of seeds. It treats separately three components of the seed sensitivity problem - a set of target alignments, an associated…

Other Computer Science · Computer Science 2011-01-18 Gregory Kucherov , Laurent Noé , Mikhail Roytberg

We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt…

Quantitative Methods · Quantitative Biology 2011-01-18 Gregory Kucherov , Laurent Noé , Mikhail A. Roytberg

Generating the hash values of short subsequences, called seeds, enables quickly identifying similarities between genomic sequences by matching seeds with a single lookup of their hash values. However, these hash values can be used only for…

While modern biotechnologies allow synthesizing new proteins and function measurements at scale, efficiently exploring a protein sequence space and engineering it remains a daunting task due to the vast sequence space of any given protein.…

Biomolecules · Quantitative Biology 2024-01-15 Jiahao Qiu , Hui Yuan , Jinghong Zhang , Wentao Chen , Huazheng Wang , Mengdi Wang

The advent of high-throughput sequencing technologies constituted a major advance in genomic studies, offering new prospects in a wide range of applications. We propose a rigorous and flexible algorithmic solution to mapping SOLiD…

Quantitative Methods · Quantitative Biology 2011-01-18 Laurent Noé , Marta L. Gîrdea , Gregory Kucherov

This paper describes a method to efficiently retrieve protein database sequences similar to a query sequence, while allowing for significant numbers of mutations. We call this method SEQR for SEQuence Retrieval. This approach increases the…

Genomics · Quantitative Biology 2018-11-05 David I. Hurwitz , Lianyi Han , Lewis Y. Geer

Randomised algorithms often employ methods that can fail and that are retried with independent randomness until they succeed. Randomised data structures therefore often store indices of successful attempts, called seeds. If $n$ such seeds…

Data Structures and Algorithms · Computer Science 2025-07-03 Hans-Peter Lehmann , Peter Sanders , Stefan Walzer , Jonatan Ziegler

Protein similarity searches are a routine job for molecular biologists where a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped…

Computational Engineering, Finance, and Science · Computer Science 2015-08-27 Akash Nag , Sunil Karforma

We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. This type of similarity search has importance in both…

Data Structures and Algorithms · Computer Science 2007-09-04 Aleksandar Stojmirovic , Vladimir Pestov

Screening or assessing studies is critical to the quality and outcomes of a systematic review. Typically, a Boolean query retrieves the set of studies to screen. As the set of studies retrieved is unordered, screening all retrieved studies…

Information Retrieval · Computer Science 2021-12-09 Shuai Wang , Harrisen Scells , Ahmed Mourad , Guido Zuccon

Several algorithms for similarity search employ seeding techniques to quickly discard very dissimilar regions. In this paper, we study theoretical properties of lossless seeds, i.e., spaced seeds having full sensitivity. We prove that…

Discrete Mathematics · Computer Science 2014-05-23 Karel Břinda

We study the pattern matching automaton introduced in (A unifying framework for seed sensitivity and its application to subset seeds) for the purpose of seed-based similarity search. We show that our definition provides a compact automaton,…

Formal Languages and Automata Theory · Computer Science 2014-08-27 Gregory Kucherov , Laurent Noé , Mikhail Roytberg

The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a…

Suffix trees have recently become very successful data structures in handling large data sequences such as DNA or Protein sequences. Consequently parallel architectures have become ubiquitous. We present a novel alphabet-dependent parallel…

Data Structures and Algorithms · Computer Science 2017-04-20 Freeson Kaniwa , Venu Madhav Kuthadi , Otlhapile Dinakenyane , Heiko Schroeder

The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which results in it being…

Biomolecules · Quantitative Biology 2020-10-29 Amir Shanehsazzadeh , David Belanger , David Dohan

Recent advances in weakly supervised text classification mostly focus on designing sophisticated methods to turn high-level human heuristics into quality pseudo-labels. In this paper, we revisit the seed matching-based method, which is…

Computation and Language · Computer Science 2023-10-24 Chengyu Dong , Zihan Wang , Jingbo Shang

Set similarity search is a problem of central interest to a wide variety of applications such as data cleaning and web search. Past approaches on set similarity search utilize either heavy indexing structures, incurring large search costs…

Databases · Computer Science 2021-07-23 Yifan Li , Xiaohui Yu , Nick Koudas

As the structural databases continue to expand, efficient methods are required to search similar structures of the query structure from the database. There are many previous works about comparing protein 3D structures and scanning the…

Databases · Computer Science 2011-02-16 Gook-Pil Roh , Seung-won Hwang , Byoung-Kee Yi