Related papers: Seed design framework for mapping SOLiD reads

GRIM-filter: fast seed filtering in read mapping using emerging memory technologies

Motivation: Seed filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. Read mappers 1) quickly…

Genomics · Quantitative Biology 2017-08-16 Jeremie S Kim, Damla Senol, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

GRIM-Filter: Fast Seed Location Filtering in DNA Read Mapping Using Processing-in-Memory Technologies

Motivation: Seed location filtering is critical in DNA read mapping, a process where billions of DNA fragments (reads) sampled from a donor are mapped onto a reference genome to identify genomic variants of the donor. State-of-the-art read…

Genomics · Quantitative Biology 2020-04-21 Jeremie S. Kim, Damla Senol Cali, Hongyi Xin, Donghyuk Lee, Saugata Ghose, Mohammed Alser, Hasan Hassan, Oguz Ergin, Can Alkan, Onur Mutlu

Efficient seeding techniques for protein similarity search

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We…

Quantitative Methods · Quantitative Biology 2008-10-31 Mikhail Roytberg, Anna Gambin, Laurent Noé, Slawomir Lasota, Eugenia Furletova, Ewa Szczurek, Gregory Kucherov

MOSAIK: A hash-based algorithm for accurate next-generation sequencing read mapping

This paper presents an accurate short-read mapper for next-generation sequencing data that is widely used in the 1000 Genomes Project and in human clinical and other species genome studies.

Genomics · Quantitative Biology 2015-06-17 Wan-Ping Lee, Michael Stromberg, Alistair Ward, Chip Stewart, Erik Garrison, Gabor T. Marth

SeedGNN: Graph Neural Networks for Supervised Seeded Graph Matching

There is a growing interest in designing Graph Neural Networks (GNNs) for seeded graph matching, which aims to match two unlabeled graphs using only topological information and a small set of seed nodes. However, most previous GNNs for this…

Machine Learning · Computer Science 2023-07-11 Liren Yu, Jiaming Xu, Xiaojun Lin

Reliable algorithm selection for machine learning-guided design

Algorithms for machine learning-guided design, or design algorithms, use machine learning-based predictions to propose novel objects with desired property values. Given a new design task -- for example, to design novel proteins with high…

Machine Learning · Computer Science 2025-07-04 Clara Fannjiang, Ji Won Park

Learning to Match Features with Seeded Graph Matching Network

Matching local features across images is a fundamental problem in computer vision. Targeting towards high accuracy and efficiency, we propose Seeded Graph Matching Network, a graph neural network with sparse structure to reduce redundant…

Computer Vision and Pattern Recognition · Computer Science 2021-08-20 Hongkai Chen, Zixin Luo, Jiahui Zhang, Lei Zhou, Xuyang Bai, Zeyu Hu, Chiew-Lan Tai, Long Quan

The design of selection experiments using a model-based approach

Plant breeding programs use data obtained from multi-environment selection experiments to produce improved varieties with the ultimate aim of maintaining high levels of genetic gain. Selection accuracy can be improved with the use of…

Methodology · Statistics 2026-05-13 Brian R Cullis, Alison B Smith, David GD Hughes, David Butler

Read classification using semi-supervised deep learning

In this paper, we propose a semi-supervised deep learning method for detecting the specific types of reads that impede the de novo genome assembly process. Instead of dealing directly with sequenced reads, we analyze their coverage graphs…

Machine Learning · Computer Science 2019-04-24 Tomislav Šebrek, Jan Tomljanović, Josip Krapac, Mile Šikić

Iterative Learning for Reference-Guided DNA Sequence Assembly from Short Reads: Algorithms and Limits of Performance

Recent emergence of next-generation DNA sequencing technology has enabled acquisition of genetic information at unprecedented scales. In order to determine the genetic blueprint of an organism, sequencing platforms typically employ…

Genomics · Quantitative Biology 2015-06-19 Xiaohu Shen, Manohar Shamaiah, Haris Vikalo

A Framework for Discovering Optimal Solutions in Photonic Inverse Design

Photonic inverse design has emerged as an indispensable engineering tool for complex optical systems. In many instances it is important to optimize for both material and geometry configurations, which results in complex non-smooth search…

Optics · Physics 2021-06-17 Jagrit Digani, Phillip Hon, Artur R. Davoyan

On subset seeds for protein alignment

We apply the concept of subset seeds proposed in [1] to similarity search in protein sequences. The main question studied is the design of efficient seed alphabets to construct seeds with optimal sensitivity/selectivity trade-offs. We…

Quantitative Methods · Quantitative Biology 2011-01-18 Mikhail A. Roytberg, Anna Gambin, Laurent Noé, Slawomir Lasota, Eugenia Furletova, Ewa Szczurek, Gregory Kucherov

Symbolic Regression via Neural-Guided Genetic Programming Population Seeding

Symbolic regression is the process of identifying mathematical expressions that fit observed output from a black-box process. It is a discrete optimization problem generally believed to be NP-hard. Prior approaches to solving the problem…

Neural and Evolutionary Computing · Computer Science 2021-11-19 T. Nathan Mundhenk, Mikel Landajuela, Ruben Glatt, Claudio P. Santiago, Daniel M. Faissol, Brenden K. Petersen

Assembly of repetitive regions using next-generation sequencing data

High read depth can be used to assemble short sequence repeats. Existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…

Genomics · Quantitative Biology 2015-01-08 Robert M. Nowak

Multi-Objective Genetic Programming for Manifold Learning: Balancing Quality and Dimensionality

Manifold learning techniques have become increasingly valuable as data continues to grow in size. By discovering a lower-dimensional representation (embedding) of the structure of a dataset, manifold learning algorithms can substantially…

Neural and Evolutionary Computing · Computer Science 2020-01-31 Andrew Lensen, Mengjie Zhang, Bing Xue

ReadsMap: a new tool for high precision mapping of DNAseq and RNAseq read sequences

There are currently plenty of programs available for mapping short sequences (reads) to a genome. Most of them, however, including such popular and actively developed programs as Bowtie, BWA, TopHat and many others, are based on…

Genomics · Quantitative Biology 2019-08-06 Igor Seledtsov, Jaroslav Efremov, Vladimir Molodtsov, Victor Solovyev

A read-filtering algorithm for high-throughput marker-gene studies that greatly improves OTU accuracy

Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational…

Quantitative Methods · Quantitative Biology 2015-06-02 Fernando Puente-Sánchez, Jacobo Aguirre, Víctor Parro

Genetic Micro-Programs for Automated Software Testing with Large Path Coverage

Ongoing progress in computational intelligence (CI) has led to an increased desire to apply CI techniques for the purpose of improving software engineering processes, particularly software testing. Existing state-of-the-art automated…

Neural and Evolutionary Computing · Computer Science 2023-02-16 Jarrod Goschen, Anna Sergeevna Bosman, Stefan Gruner

Gene finding revisited: improved robustness through structured decoding from learned embeddings

Gene finding is the task of identifying the locations of coding sequences within the vast amount of genetic code contained in the genome. With an ever increasing quantity of raw genome sequences, gene finding is an important avenue towards…

Genomics · Quantitative Biology 2025-05-07 Frederikke I. Marin, Dennis Pultz, Wouter Boomsma

Multiseed Lossless Filtration

We study a method of seed-based lossless filtration for approximate string matching and related bioinformatics applications. The method is based on a simultaneous use of several spaced seeds rather than a single seed as studied by Burkhardt…

Quantitative Methods · Quantitative Biology 2011-01-18 Gregory Kucherov, Laurent Noé, Mikhail A. Roytberg