Related papers: Nucleotide String Indexing using Range Matching

Efficient Neural Ranking using Forward Indexes and Lightweight Encoders

Dual-encoder-based dense retrieval models have become the standard in IR. They employ large Transformer-based language models, which are notoriously inefficient in terms of resources and latency. We propose Fast-Forward indexes -- vector…

Information Retrieval · Computer Science 2023-11-03 Jurek Leonhardt , Henrik Müller , Koustav Rudra , Megha Khosla , Abhijit Anand , Avishek Anand

E2FM: an encrypted and compressed full-text index for collections of genomic sequences

Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to an exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets…

Databases · Computer Science 2019-10-11 Ferdinando Montecuollo , Giovanni Schmid , Roberto Tagliaferri

Efficient Neural Ranking using Forward Indexes

Neural document ranking approaches, specifically transformer models, have achieved impressive gains in ranking performance. However, query processing using such over-parameterized models is both resource and time intensive. In this paper,…

Information Retrieval · Computer Science 2022-04-05 Jurek Leonhardt , Koustav Rudra , Megha Khosla , Abhijit Anand , Avishek Anand

FindeR: Accelerating FM-Index-based Exact Pattern Matching in Genomic Sequences through ReRAM technology

Genomics is the critical key to enabling precision medicine, ensuring global food security and enforcing wildlife conservation. The massive genomic data produced by various genome sequencing technologies presents a significant challenge for…

Genomics · Quantitative Biology 2019-10-03 Farzaneh Zokaee , Mingzhe Zhang , Lei Jiang

Minimap2: pairwise alignment for nucleotide sequences

Motivation: Recent advances in sequencing technologies promise ultra-long reads of $\sim$100 kilo bases (kb) in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 mega bases (Mb) in length. Existing…

Genomics · Quantitative Biology 2018-09-17 Heng Li

SEINE: SEgment-based Indexing for NEural information retrieval

Many early neural Information Retrieval (NeurIR) methods are re-rankers that rely on a traditional first-stage retriever due to expensive query time computations. Recently, representation-based retrievers have gained much attention, which…

Information Retrieval · Computer Science 2023-11-28 Sibo Dong , Justin Goldstein , Grace Hui Yang

GateSeeder: Near-memory CPU-FPGA Acceleration of Short and Long Read Mapping

Motivation: Read mapping is a computationally expensive process and a major bottleneck in genomics analyses. The performance of read mapping is mainly limited by the performance of three key computational steps: Index Querying, Seed…

Hardware Architecture · Computer Science 2023-10-02 Julien Eudine , Mohammed Alser , Gagandeep Singh , Can Alkan , Onur Mutlu

A Computational Approach to Packet Classification

Multi-field packet classification is a crucial component in modern software-defined data center networks. To achieve high throughput and low latency, state-of-the-art algorithms strive to fit the rule lookup data structures into on-die…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-14 Alon Rashelbach , Ori Rottenstreich , Mark Silberstein

dnaGrinder: a lightweight and high-capacity genomic foundation model

The task of understanding and interpreting the complex information encoded within genomic sequences remains a grand challenge in biological research and clinical applications. In this context, recent advancements in large language model…

Genomics · Quantitative Biology 2024-09-25 Qihang Zhao , Chi Zhang , Weixiong Zhang

GateKeeper: A New Hardware Architecture for Accelerating Pre-Alignment in DNA Short Read Mapping

Motivation: High throughput DNA sequencing (HTS) technologies generate an excessive number of small DNA segments -- called short reads -- that cause significant computational burden. To analyze the entire genome, each of the billions of…

Genomics · Quantitative Biology 2020-09-29 Mohammed Alser , Hasan Hassan , Hongyi Xin , Oğuz Ergin , Onur Mutlu , Can Alkan

GenASM: A High-Performance, Low-Power Approximate String Matching Acceleration Framework for Genome Sequence Analysis

Genome sequence analysis has enabled significant advancements in medical and scientific areas such as personalized medicine, outbreak tracing, and the understanding of evolution. Unfortunately, it is currently bottlenecked by the…

Hardware Architecture · Computer Science 2020-09-17 Damla Senol Cali , Gurpreet S. Kalsi , Zülal Bingöl , Can Firtina , Lavanya Subramanian , Jeremie S. Kim , Rachata Ausavarungnirun , Mohammed Alser , Juan Gomez-Luna , Amirali Boroumand , Anant Nori , Allison Scibisz , Sreenivas Subramoney , Can Alkan , Saugata Ghose , Onur Mutlu

FastRemap: A Tool for Quickly Remapping Reads between Genome Assemblies

A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool.…

Genomics · Quantitative Biology 2023-11-21 Jeremie S. Kim , Can Firtina , Meryem Banu Cavlak , Damla Senol Cali , Can Alkan , Onur Mutlu

Read Mapping Near Non-Volatile Memory

DNA sequencing is the physical/biochemical process of identifying the location of the four bases (Adenine, Guanine, Cytosine, Thymine) in a DNA strand. As semiconductor technology revolutionized computing, modern DNA sequencing technology…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-06 S. Karen Khatamifard , Zamshed Chowdhury , Nakul Pande , Meisam Razaviyayn , Chris Kim , Ulya R. Karpuzcu

GPU-RMQ: Accelerating Range Minimum Queries on Modern GPUs

Range minimum queries are frequently used in string processing and database applications including biological sequence analysis, document retrieval, and web search. Hence, various data structures have been proposed for improving their…

Databases · Computer Science 2026-04-03 Lara Kreis , Justus Henneberg , Valentin Henkys , Felix Schuhknecht , Bertil Schmidt

GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping

Genome sequencing has become a central focus in computational biology. A genome study typically begins with sequencing, which produces millions to billions of short DNA fragments known as reads. Read mapping aligns these reads to a…

Hardware Architecture · Computer Science 2026-01-28 Julien Eudine , Chu Li , Zhuo Cheng , Renzo Andri , Can Firtina , Mohammad Sadrosadati , Nika Mansouri Ghiasi , Konstantina Koliogeorgi , Anirban Nag , Arash Tavakkol , Haiyu Mao , Onur Mutlu , Shai Bergman , Ji Zhang

Searching by index for similar sequences: the SEQR algorithm

This paper describes a method to efficiently retrieve protein database sequences similar to a query sequence, while allowing for significant numbers of mutations. We call this method SEQR for SEQuence Retrieval. This approach increases the…

Genomics · Quantitative Biology 2018-11-05 David I. Hurwitz , Lianyi Han , Lewis Y. Geer

GenStore: A High-Performance and Energy-Efficient In-Storage Computing System for Genome Sequence Analysis

Read mapping is a fundamental, yet computationally-expensive step in many genomics applications. It is used to identify potential matches and differences between fragments (called reads) of a sequenced genome and an already known genome…

Hardware Architecture · Computer Science 2023-04-07 Nika Mansouri Ghiasi , Jisung Park , Harun Mustafa , Jeremie Kim , Ataberk Olgun , Arvid Gollwitzer , Damla Senol Cali , Can Firtina , Haiyu Mao , Nour Almadhoun Alserr , Rachata Ausavarungnirun , Nandita Vijaykumar , Mohammed Alser , Onur Mutlu

Fast Packed String Matching for Short Patterns

Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like natural language processing, information retrieval and computational biology. In the last two…

Information Retrieval · Computer Science 2012-10-01 Simone Faro , M. Oguzhan Külekci

Indexing Schemes for Similarity Search In Datasets of Short Protein Fragments

We propose a family of very efficient hierarchical indexing schemes for ungapped, score matrix-based similarity search in large datasets of short (4-12 amino acid) protein fragments. This type of similarity search has importance in both…

Data Structures and Algorithms · Computer Science 2007-09-04 Aleksandar Stojmirovic , Vladimir Pestov

Fast low-level pattern matching algorithm

This paper focuses on pattern matching in the DNA sequence. It was inspired by a previously reported method that proposes encoding both pattern and sequence using prime numbers. Although fast, the method is limited to rather small pattern…

Computer Vision and Pattern Recognition · Computer Science 2016-11-21 Janja Paliska Soldo , Ana Sovic Krzic , Damir Sersic