Related papers: Alignment-Free Sequence Analysis and Applications

Alignment-free sequence comparison using absent words

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free…

Data Structures and Algorithms · Computer Science 2018-06-08 Panagiotis Charalampopoulos, Maxime Crochemore, Gabriele Fici, Robert Mercaş, Solon P. Pissis

Alignment-free comparison of next-generation sequencing data using compression-based distance measures

Enormous volumes of short reads data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to…

Genomics · Quantitative Biology 2020-03-25 Ngoc Hieu Tran, Xin Chen

Linear-Time Sequence Comparison Using Minimal Absent Words & Applications

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free…

Data Structures and Algorithms · Computer Science 2015-12-23 Maxime Crochemore, Gabriele Fici, Robert Mercaş, Solon P. Pissis

Inference of Markovian Properties of Molecular Sequences from NGS Data and Applications to Comparative Genomics

Next Generation Sequencing (NGS) technologies generate large amounts of short read data for many different organisms. The fact that NGS reads are generally short makes it challenging to assemble the reads and reconstruct the original genome…

Genomics · Quantitative Biology 2015-04-07 Jie Ren, Kai Song, Minghua Deng, Gesine Reinert, Charles H. Cannon, Fengzhu Sun

A Misclassification Network-Based Method for Comparative Genomic Analysis

Classifying genome sequences based on metadata has been an active area of research in comparative genomics for decades with many important applications across the life sciences. Established methods for classifying genomes can be broadly…

Genomics · Quantitative Biology 2025-01-17 Wan He, Tina Eliassi-Rad, Samuel V. Scarpino

Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed…

Genomics · Quantitative Biology 2015-01-21 Ivan Borozan, Stuart Watt, Vincent Ferretti

Technology dictates algorithms: Recent developments in read alignment

Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of…

Genomics · Quantitative Biology 2023-11-21 Mohammed Alser, Jeremy Rotman, Kodi Taraszka, Huwenbo Shi, Pelin Icer Baykal, Harry Taegyun Yang, Victor Xue, Sergey Knyazev, Benjamin D. Singer, Brunilda Balliu, David Koslicki, Pavel Skums, Alex Zelikovsky, Can Alkan, Onur Mutlu, Serghei Mangul

A statistical physics perspective on alignment-independent protein sequence comparison

Within bioinformatics, the textual alignment of amino acid sequences has long dominated the determination of similarity between proteins, with all that implies for shared structure, function and evolutionary descent. Despite the relative…

Quantitative Methods · Quantitative Biology 2016-02-10 Amit K Chattopadhyay, Diar Nasiev, Darren R Flower

Sequence Alignment Algorithm for Statistical Similarity Assessment

This paper presents a new approach to statistical similarity assessment based on sequence alignment. The algorithm performs mutual matching of two random sequences by successively searching for common elements and by applying sequence…

Signal Processing · Electrical Eng. & Systems 2021-06-09 Jakub Nikonowicz, Łukasz Matuszewski, Paweł Kubczak

The Parallelism Motifs of Genomic Data Analysis

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-08 Katherine Yelick, Aydin Buluc, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker

A probabilistic analysis of shotgun sequencing for metagenomics

Genome sequencing is the basis for many modern biological and medicinal studies. With recent technological advances, metagenomics has become a problem of interest. This problem entails the analysis and reconstruction of multiple DNA…

Probability · Mathematics 2022-01-14 Marlee Herring

Evolution of biosequence search algorithms: a brief survey

The paper surveys the evolution of main algorithmic techniques to compare and search biological sequences. We highlight key algorithmic ideas that emerged in response to several interconnected factors: shifts of biological analytical paradigm,…

Genomics · Quantitative Biology 2018-11-15 Gregory Kucherov

benchNGS: An approach to benchmark short reads alignment tools

In the last decade a number of algorithms and associated software have been developed to align next generation sequencing (NGS) reads with relevant reference genomes. The accuracy of these programs may vary significantly, especially when…

Genomics · Quantitative Biology 2015-04-28 Farzana Rahman, Mehedi Hassan, Alona Kryshchenko, Inna Dubchak, Tatiana V. Tatarinova, Nickolai Alexandrov

DBG2OLC: Efficient Assembly of Large Genomes Using Long Erroneous Reads of the Third Generation Sequencing Technologies

(An updated version of this manuscript has been accepted to Scientific Reports in 2016, please refer to http://www.nature.com/articles/srep31900) The highly anticipated transition from next generation sequencing (NGS) to third generation…

Genomics · Quantitative Biology 2016-09-06 Chengxi Ye, Chris Hill, Shigang Wu, Jue Ruan, Zhanshan Ma

genCNN: A Convolutional Architecture for Word Sequence Prediction

We propose a novel convolutional architecture, named genCNN, for word sequence prediction. Different from previous work on neural network-based language modeling and generation (e.g., RNN or LSTM), we choose not to greedily summarize the…

Computation and Language · Computer Science 2015-04-27 Mingxuan Wang, Zhengdong Lu, Hang Li, Wenbin Jiang, Qun Liu

Exploring The Potential Of GANs In Biological Sequence Analysis

Biological sequence analysis is an essential step toward building a deeper understanding of the underlying functions, structures, and behaviors of the sequences. It can help in identifying the characteristics of the associated organisms,…

Machine Learning · Computer Science 2023-03-07 Taslim Murad, Sarwan Ali, Murray Patterson

Conformance Checking for Less: Efficient Conformance Checking for Long Event Sequences

Long event sequences (termed traces) and large data logs that originate from sensors and prediction models are becoming increasingly common in our data-rich world. In such scenarios, conformance checking, that is, validating a data log against an…

Databases · Computer Science 2025-05-29 Eli Bogdanov, Izack Cohen, Avigdor Gal

Unaligned Sequence Similarity Search Using Deep Learning

Gene annotation has traditionally required direct comparison of DNA sequences between an unknown gene and a database of known ones using string comparison methods. However, these methods do not provide useful information when a gene does…

Machine Learning · Computer Science 2019-09-17 James K. Senter, Taylor M. Royalty, Andrew D. Steen, Amir Sadovnik

Large-scale Machine Learning for Metagenomics Sequence Classification

Metagenomics characterizes the taxonomic diversity of microbial communities by sequencing DNA directly from an environmental sample. One of the main challenges in metagenomics data analysis is the binning step, where each sequenced read is…

Quantitative Methods · Quantitative Biology 2015-05-27 Kévin Vervier, Pierre Mahé, Maud Tournoud, Jean-Baptiste Veyrieras, Jean-Philippe Vert

Anchor points for genome alignment based on Filtered Spaced Word Matches

Alignment of large genomic sequences is a fundamental task in computational genome analysis. Most methods for genomic alignment use high-scoring local alignments as anchor points to reduce the search space of the alignment procedure.…

Genomics · Quantitative Biology 2017-03-28 Chris-Andre Leimeister, Thomas Dencker, Burkhard Morgenstern