Related papers: Nucleotide String Indexing using Range Matching

200 papers

Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transformer-based language models, which are notoriously inefficient in terms of resources and latency. We propose Fast-Forward indexes -- vector…

Information Retrieval · Computer Science · 2023-11-03 · Jurek Leonhardt, Henrik Müller, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to an exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets…

Databases · Computer Science · 2019-10-11 · Ferdinando Montecuollo, Giovanni Schmid, Roberto Tagliaferri

Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. In this paper,…

Information Retrieval · Computer Science · 2022-04-05 · Jurek Leonhardt, Koustav Rudra, Megha Khosla, Abhijit Anand, Avishek Anand

Genomics is the critical key to enabling precision medicine, ensuring global food security and enforcing wildlife conservation. The massive genomic data produced by various genome sequencing technologies presents a significant challenge for…

Genomics · Quantitative Biology · 2019-10-03 · Farzaneh Zokaee, Mingzhe Zhang, Lei Jiang

Motivation: Recent advances in sequencing technologies promise ultra-long reads of ~100 kilobases (kb) on average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 megabases (Mb) in length. Existing…

Genomics · Quantitative Biology · 2018-09-17 · Heng Li

Many early neural Information Retrieval (NeurIR) methods are re-rankers that rely on a traditional first-stage retriever due to expensive query time computations. Recently, representation-based retrievers have gained much attention, which…

Information Retrieval · Computer Science · 2023-11-28 · Sibo Dong, Justin Goldstein, Grace Hui Yang

Motivation: Read mapping is a computationally expensive process and a major bottleneck in genomics analyses. The performance of read mapping is mainly limited by the performance of three key computational steps: Index Querying, Seed…

Hardware Architecture · Computer Science · 2023-10-02 · Julien Eudine, Mohammed Alser, Gagandeep Singh, Can Alkan, Onur Mutlu

Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-14 · Alon Rashelbach, Ori Rottenstreich, Mark Silberstein

The task of understanding and interpreting the complex information encoded within genomic sequences remains a grand challenge in biological research and clinical applications. In this context, recent advancements in large language model…

Genomics · Quantitative Biology · 2024-09-25 · Qihang Zhao, Chi Zhang, Weixiong Zhang

Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -- called short reads -- that cause significant computational burden. To analyze the entire genome, each of the billions of…

Genomics · Quantitative Biology · 2020-09-29 · Mohammed Alser, Hasan Hassan, Hongyi Xin, Oğuz Ergin, Onur Mutlu, Can Alkan

Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the…

A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool.…

Genomics · Quantitative Biology · 2023-11-21 · Jeremie S. Kim, Can Firtina, Meryem Banu Cavlak, Damla Senol Cali, Can Alkan, Onur Mutlu

DNA sequencing is the physical/biochemical process of identifying the location of the four bases (Adenine, Guanine, Cytosine, Thymine) in a DNA strand. As semiconductor technology revolutionized computing, modern DNA sequencing technology…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-05-06 · S. Karen Khatamifard, Zamshed Chowdhury, Nakul Pande, Meisam Razaviyayn, Chris Kim, Ulya R. Karpuzcu

Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. Hence, various data structures have been proposed for improving their…

Databases · Computer Science · 2026-04-03 · Lara Kreis, Justus Henneberg, Valentin Henkys, Felix Schuhknecht, Bertil Schmidt

Genome sequencing has become a central focus in computational biology. A genome study typically begins with sequencing, which produces millions to billions of short DNA fragments known as reads. Read mapping aligns these reads to a…

This paper describes a method to efficiently retrieve protein database sequences similar to a query sequence, while allowing for significant numbers of mutations. We call this method SEQR for SEQuence Retrieval. This approach increases the…

Genomics · Quantitative Biology · 2018-11-05 · David I. Hurwitz, Lianyi Han, Lewis Y. Geer

Read mapping is a fundamental, yet computationally expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome…

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two…

Information Retrieval · Computer Science · 2012-10-01 · Simone Faro, M. Oguzhan Külekci

We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. This type of similarity search has importance in both…

Data Structures and Algorithms · Computer Science · 2007-09-04 · Aleksandar Stojmirovic, Vladimir Pestov

This paper focuses on pattern matching in the DNA sequence. It was inspired by a previously reported method that proposes encoding both pattern and sequence using prime numbers. Although fast, the method is limited to rather small pattern…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-21 · Janja Paliska Soldo, Ana Sovic Krzic, Damir Sersic