Related papers: AnySeq: A High Performance Sequence Alignment Libr…

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-17 André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack

The use of deep learning models in computational biology has increased massively in recent years, and it is expected to continue with the current advances in the fields such as Natural Language Processing. These models, although able to…

Machine Learning · Computer Science 2024-09-16 Alfred Ferrer Florensa, Jose Juan Almagro Armenteros, Henrik Nielsen, Frank Møller Aarestrup, Philip Thomas Lanken Conradsen Clausen

TrioSeq: A Novel Approach to Accelerate Triplet Sequence Alignment on GPUs

State-of-the-art multiple sequence alignment (MSA) algorithms are based on progressive approaches that rely on pairwise sequence alignment (PSA) to generate guide trees to align all sequences. Given an evidenced explosion in genomic data…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-28 Miguel Graça, Aleksandar Ilic

seqme: a Python library for evaluating biological sequence design

Recent advances in computational methods for designing biological sequences have sparked the development of metrics to evaluate these methods' performance in terms of the fidelity of the designed sequences to a target distribution and their…

Machine Learning · Computer Science 2025-11-07 Rasmus Møller-Larsen, Adam Izdebski, Jan Olszewski, Pankhil Gawade, Michal Kmicikiewicz, Wojciech Zarzecki, Ewa Szczurek

PyamilySeq: A Python Tool for Interpretable Gene (Re)Clustering and Pangenomic Inference Across Species and Genera

PyamilySeq is a Python-based tool designed for interpretable gene clustering and pangenomic inference, supporting analyses at both species and genus levels. It facilitates the clustering of gene sequences into families based on sequence…

Genomics · Quantitative Biology 2024-07-30 Nicholas J. Dimonaco

fairseq: A Fast, Extensible Toolkit for Sequence Modeling

fairseq is an open-source sequence modeling toolkit that allows researchers and developers to train custom models for translation, summarization, language modeling, and other text generation tasks. The toolkit is based on PyTorch and…

Computation and Language · Computer Science 2019-04-03 Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, Michael Auli

Minimap2: pairwise alignment for nucleotide sequences

Motivation: Recent advances in sequencing technologies promise ultra-long reads of $\sim$100 kilo bases (kb) in average, full-length mRNA or cDNA reads in high throughput and genomic contigs over 100 mega bases (Mb) in length. Existing…

Genomics · Quantitative Biology 2018-09-17 Heng Li

Inseq: An Interpretability Toolkit for Sequence Generation Models

Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python…

Computation and Language · Computer Science 2023-09-08 Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza

Aligning biological sequences by exploiting residue conservation and coevolution

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e. arranging sequences from different organisms in…

Quantitative Methods · Quantitative Biology 2020-12-08 Anna Paola Muntoni, Andrea Pagnani, Martin Weigt, Francesco Zamponi

BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data

Genome sequences contain hundreds of millions of DNA base pairs. Finding the degree of similarity between two genomes requires executing a compute-intensive dynamic programming algorithm, such as Smith-Waterman. Traditional von Neumann…

Emerging Technologies · Computer Science 2019-01-21 Roman Kaplan, Leonid Yavits, Ran Ginosar

New Sequence Alignment Algorithm using AI Rules and Dynamic Seeds

DNA sequence alignment is important today as it is usually the first step in finding gene mutation, evolutionary similarities, protein structure, drug development and cancer treatment. Covid-19 is one recent example. There are many…

Genomics · Quantitative Biology 2023-06-01 Suchindra, Preetam Nagaraj

Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design

Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. Challenges exist due to the complex sequence--fold relationship, as well as the difficulties to capture the…

Machine Learning · Computer Science 2021-06-25 Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, Yang Shen

Pairwise sequence alignment at arbitrarily large evolutionary distance

Ancestral sequence reconstruction is a key task in computational biology. It consists in inferring a molecular sequence at an ancestral species of a known phylogeny, given descendant sequences at the tip of the tree. In addition to its many…

Populations and Evolution · Quantitative Biology 2022-07-27 Brandon Legried, Sebastien Roch

Fast Approximate Inference of Transcript Expression Levels from RNA-seq Data

Motivation: The mapping of RNA-seq reads to their transcripts of origin is a fundamental task in transcript expression estimation and differential expression scoring. Where ambiguities in mapping exist due to transcripts sharing sequence,…

Genomics · Quantitative Biology 2015-01-28 James Hensman, Peter Glaus, Antti Honkela, Magnus Rattray

MyESL: Sparse learning in molecular evolution and phylogenetic analysis

Evolutionary sparse learning (ESL) uses a supervised machine learning approach, Least Absolute Shrinkage and Selection Operator (LASSO), to build models explaining the relationship between a hypothesis and the variation across genomic…

Populations and Evolution · Quantitative Biology 2025-01-10 Maxwell Sanderford, Sudip Sharma, Glen Stecher, Jun Liu, Jieping Ye, Sudhir Kumar

BioKlustering: a web app for semi-supervised learning of maximally imbalanced genomic data

Summary: Accurate phenotype prediction from genomic sequences is a highly coveted task in biological and medical research. While machine-learning holds the key to accurate prediction in a variety of fields, the complexity of biological data…

Genomics · Quantitative Biology 2024-12-17 Samuel Ozminkowski, Yuke Wu, Hailey Bruzzone, Liule Yang, Zhiwen Xu, Luke Selberg, Chunrong Huang, Helena Jaramillo-Mesa, Claudia Solis-Lemus

BOAssembler: a Bayesian Optimization Framework to Improve RNA-Seq Assembly Performance

High throughput sequencing of RNA (RNA-Seq) can provide us with millions of short fragments of RNA transcripts from a sample. How to better recover the original RNA transcripts from those fragments (RNA-Seq assembly) is still a difficult…

Genomics · Quantitative Biology 2019-02-15 Shunfu Mao, Yihan Jiang, Edwin Basil Mathew, Sreeram Kannan

SECLAF: A Webserver and Deep Neural Network Design Tool for Biological Sequence Classification

Artificial intelligence (AI) tools are gaining more and more ground each year in bioinformatics. Learning algorithms can be taught easily by using the existing enormous biological databases, and the resulting models can be used for the…

Biomolecules · Quantitative Biology 2017-08-15 Balazs Szalkai, Vince Grolmusz

Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design

Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, and forensics. However, the analysis of genome sequencing data is currently bottlenecked by the…

Hardware Architecture · Computer Science 2021-11-04 Damla Senol Cali

SequenceLab: A Comprehensive Benchmark of Computational Methods for Comparing Genomic Sequences

Computational complexity is a key limitation of genomic analyses. Thus, over the last 30 years, researchers have proposed numerous fast heuristic methods that provide computational relief. Comparing genomic sequences is one of the most…

Genomics · Quantitative Biology 2025-01-24 Maximilian-David Rumpf, Mohammed Alser, Arvid E. Gollwitzer, Joel Lindegger, Nour Almadhoun, Can Firtina, Serghei Mangul, Onur Mutlu