Related papers: Highly Scalable Algorithms for Robust String Barco…

A Comparative Study on String Matching Algorithm of Biological Sequences

String matching algorithms play a vital role in computational biology. The functional and structural relationships of a biological sequence are determined by similarities in that sequence. For that, the researcher is supposed to be aware…

Data Structures and Algorithms · Computer Science 2014-01-30 Pandiselvam. P, Marimuthu. T, Lawrance. R

Robust Indexing for the Sliced Channel: Almost Optimal Codes for Substitutions and Deletions

Encoding data as a set of unordered strings is receiving great attention as it captures one of the basic features of DNA storage systems. However, the challenge of constructing optimal redundancy codes for this channel remained elusive. In…

Information Theory · Computer Science 2023-08-16 Jin Sima, Netanel Raviv, Jehoshua Bruck

Scalable Prototype Selection by Genetic Algorithms and Hashing

Classification in the dissimilarity space has become a very active research area since it provides a possibility to learn from data given in the form of pairwise non-metric dissimilarities, which otherwise would be difficult to cope with.…

Machine Learning · Statistics 2017-12-27 Yenisel Plasencia-Calaña, Mauricio Orozco-Alzate, Heydi Méndez-Vázquez, Edel García-Reyes, Robert P. W. Duin

Exhaustive Exact String Matching: The Analysis of the Full Human Genome

Exact string matching has been a fundamental problem in computer science for decades because of many practical applications. Some are related to common procedures, such as searching in files and text editors, or, more recently, to more…

Data Structures and Algorithms · Computer Science 2019-07-29 Konstantinos F. Xylogiannopoulos

Computational tools for the multiscale analysis of Hi-C data in bacterial chromosomes

Just as in eukaryotes, high-throughput chromosome conformation capture (Hi-C) data have revealed nested organizations of bacterial chromosomes into overlapping interaction domains. In this chapter, we present a multiscale analysis framework…

Genomics · Quantitative Biology 2020-10-06 Nelle Varoquaux, Virginia S. Lioy, Frédéric Boccard, Ivan Junier

High Performance Multiple Sequence Alignment Algorithms for Comparison of Microbial Genomes

Advances in gene sequencing have enabled in silico analyses of microbial genomes and have led to the revision of concepts of microbial taxonomy and evolution. We explore deficiencies in existing multiple sequence global alignment algorithms…

Genomics · Quantitative Biology 2023-12-07 Manal Helal, Hossam El-Gindy, Bruno Gaeta, Vitali Sinchenko

LV Barcoding: locality sensitive hashing-based tool for rapid species identification in DNA barcoding

DNA barcoding has emerged as a cost-effective approach for species identification. However, the scarcity of tools used for searching the booming reference database becomes an obstacle, currently with BLAST as the only practical choice.…

Populations and Evolution · Quantitative Biology 2014-07-15 Long Fan, Ka Hou Chu

Efficient Online String Matching through Linked Weak Factors

Online string matching is a computational problem involving the search for patterns or substrings in a large text dataset, with the pattern and text being processed sequentially, without prior access to the entire text. Its relevance stems…

Data Structures and Algorithms · Computer Science 2023-10-25 Matthew N. Palmer, Simone Faro, Stefano Scafiti

Designing robust watermark barcodes for multiplex long-read sequencing

A method for designing sequencing barcodes that can withstand a large number of insertion, deletion and substitution errors and are suitable for use in multiplex single-molecule real-time sequencing is presented. The manuscript focuses on…

Other Computer Science · Computer Science 2016-04-06 Joaquín Ezpeleta, Flavia J. Krsticevic, Pilar Bulacio, Elizabeth Tapia

Scalable Genomics with R and Bioconductor

This paper reviews strategies for solving problems encountered when analyzing large genomic data sets and describes the implementation of those strategies in R by packages from the Bioconductor project. We treat the scalable processing,…

Genomics · Quantitative Biology 2014-09-11 Michael Lawrence, Martin Morgan

Bayesian identification of bacterial strains from sequencing data

Rapidly assaying the diversity of a bacterial species present in a sample obtained from a hospital patient or an environmental source has become possible after recent technological advances in DNA sequencing. For several applications it is…

Genomics · Quantitative Biology 2016-02-18 Aravind Sankar, Brandon Malone, Sion Bayliss, Ben Pascoe, Guillaume Méric, Matthew D. Hitchings, Samuel K. Sheppard, Edward J. Feil, Jukka Corander, Antti Honkela

Scalable Distributed String Sorting

String sorting is an important part of tasks such as building index data structures. Unfortunately, current string sorting algorithms do not scale to massively parallel distributed-memory machines since they either have latency (at least)…

Data Structures and Algorithms · Computer Science 2024-04-26 Florian Kurpicz, Pascal Mehnert, Peter Sanders, Matthias Schimek

A Fixed-Parameter Algorithm for Minimum Common String Partition with Few Duplications

Motivated by the study of genome rearrangements, the NP-hard Minimum Common String Partition problem asks, given two strings, to split both strings into an identical set of blocks. We consider an extension of this problem to unbalanced…

Data Structures and Algorithms · Computer Science 2013-08-02 Laurent Bulteau, Guillaume Fertin, Christian Komusiewicz, Irena Rusu

Fractals from genomes: exact solutions of a biology-inspired problem

This is a review of a set of recent papers with some new data added. After a brief biological introduction a visualization scheme of the string composition of long DNA sequences, in particular, of bacterial complete genomes, will be…

Soft Condensed Matter · Physics 2009-10-31 Bai-lin Hao

Cell lineage tracing using nuclease barcoding

Lineage tracing, the determination and mapping of progeny arising from single cells, is an important approach enabling the elucidation of mechanisms underlying diverse biological processes ranging from development to disease. We developed a…

Genomics · Quantitative Biology 2016-06-03 Stephanie Tzouanas Schmidt, Stephanie M. Zimmerman, Jianbin Wang, Stuart K. Kim, Stephen R. Quake

Reliable algorithm selection for machine learning-guided design

Algorithms for machine learning-guided design, or design algorithms, use machine learning-based predictions to propose novel objects with desired property values. Given a new design task -- for example, to design novel proteins with high…

Machine Learning · Computer Science 2025-07-04 Clara Fannjiang, Ji Won Park

A Three-Stage Algorithm for the Closest String Problem on Artificial and Real Gene Sequences

The Closest String Problem is an NP-hard problem that aims to find a string that has the minimum distance from all sequences that belong to the given set of strings. Its applications can be found in coding theory, computational biology, and…

Artificial Intelligence · Computer Science 2024-07-19 Alireza Abdi, Marko Djukanovic, Hesam Tahmasebi Boldaji, Hadis Salehi, Aleksandar Kartelj

Randomized Fast Design of Short DNA Words

We consider the problem of efficiently designing sets (codes) of equal-length DNA strings (words) that satisfy certain combinatorial constraints. This problem has numerous motivations including DNA computing and DNA self-assembly. Previous…

Data Structures and Algorithms · Computer Science 2007-05-23 Ming-Yang Kao, Manan Sanghi, Robert Schweller

Benchmark of structured machine learning methods for microbial identification from mass-spectrometry data

Microbial identification is a central issue in microbiology, in particular in the fields of infectious diseases diagnosis and industrial quality control. The concept of species is tightly linked to the concept of biological and clinical…

Machine Learning · Statistics 2015-06-25 Kévin Vervier, Pierre Mahé, Jean-Baptiste Veyrieras, Jean-Philippe Vert

Unique Reconstruction of Coded Strings from Multiset Substring Spectra

The problem of reconstructing strings from their substring spectra has a long history and in its most simple incarnation asks for determining under which conditions the spectrum uniquely determines the string. We study the problem of coded…

Information Theory · Computer Science 2019-04-24 Ryan Gabrys, Olgica Milenkovic