Related papers: seqme: a Python library for evaluating biological …

Inseq: An Interpretability Toolkit for Sequence Generation Models

Past work in natural language processing interpretability focused mainly on popular classification tasks while largely overlooking generation settings, partly due to a lack of dedicated tools. In this work, we introduce Inseq, a Python…

Computation and Language · Computer Science 2023-09-08 Gabriele Sarti, Nils Feldhus, Ludwig Sickert, Oskar van der Wal, Malvina Nissim, Arianna Bisazza

The use of deep learning models in computational biology has increased massively in recent years, and it is expected to continue with the current advances in fields such as Natural Language Processing. These models, although able to…

Machine Learning · Computer Science 2024-09-16 Alfred Ferrer Florensa, Jose Juan Almagro Armenteros, Henrik Nielsen, Frank Møller Aarestrup, Philip Thomas Lanken Conradsen Clausen

PDBench: Evaluating Computational Methods for Protein Sequence Design

Proteins perform critical processes in all living systems: converting solar energy into chemical energy, replicating DNA, serving as the basis of highly performant materials, sensing and much more. While an incredible range of functionality has…

Biomolecules · Quantitative Biology 2021-09-29 Leonardo V. Castorina, Rokas Petrenas, Kartic Subr, Christopher W. Wood

Biological Sequence Kernels with Guaranteed Flexibility

Applying machine learning to biological sequences - DNA, RNA and protein - has enormous potential to advance human health, environmental sustainability, and fundamental biological understanding. However, many existing machine learning…

Machine Learning · Statistics 2023-04-11 Alan Nawzad Amin, Eli Nathan Weinstein, Debora Susan Marks

SBSM-Pro: Support Bio-sequence Machine for Proteins

Proteins play a pivotal role in biological systems. The use of machine learning algorithms for protein classification can assist and even guide biological experiments, offering crucial insights for biotechnological applications. We…

Quantitative Methods · Quantitative Biology 2024-10-24 Yizheng Wang, Yixiao Zhai, Yijie Ding, Quan Zou

AnySeq: A High Performance Sequence Alignment Library based on Partial Evaluation

Sequence alignments are fundamental to bioinformatics, which has resulted in a variety of optimized implementations. Unfortunately, the vast majority of them are hand-tuned and specific to certain architectures and execution models. This not…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-17 André Müller, Bertil Schmidt, Andreas Hildebrandt, Richard Membarth, Roland Leißa, Matthis Kruse, Sebastian Hack

Forecasting labels under distribution-shift for machine-guided sequence design

The ability to design and optimize biological sequences with specific functionalities would unlock enormous value in technology and healthcare. In recent years, machine learning-guided sequence design has progressed this goal significantly,…

Quantitative Methods · Quantitative Biology 2022-11-21 Lauren Berk Wheelock, Stephen Malina, Jeffrey Gerold, Sam Sinai

Distributed Representations for Biological Sequence Analysis

Biological sequence comparison is a key step in inferring the relatedness of various organisms and the functional similarity of their components. Thanks to the Next Generation Sequencing efforts, an abundance of sequence data is now…

Machine Learning · Computer Science 2016-09-13 Dhananjay Kimothi, Akshay Soni, Pravesh Biyani, James M. Hogan

A Python library for efficient computation of molecular fingerprints

Machine learning solutions are very popular in the field of chemoinformatics, where they have numerous applications, such as novel drug discovery or molecular property prediction. Molecular fingerprints are algorithms commonly used for…

Quantitative Methods · Quantitative Biology 2024-04-01 Michał Szafarczyk, Piotr Ludynia, Przemysław Kukla

A primer on model-guided exploration of fitness landscapes for biological sequence design

Machine learning methods are increasingly employed to address challenges faced by biologists. One area that will greatly benefit from this cross-pollination is the problem of biological sequence design, which has massive potential for…

Quantitative Methods · Quantitative Biology 2020-10-26 Sam Sinai, Eric D Kelsic

SnakeLines: integrated set of computational pipelines for sequencing reads

Background: With the rapid growth of massively parallel sequencing technologies, ever more laboratories are using sequenced DNA fragments for genomic analyses. Interpretation of sequencing data is, however, strongly dependent on…

Genomics · Quantitative Biology 2021-06-28 Jaroslav Budis, Werner Krampl, Marcel Kucharik, Rastislav Hekel, Adrian Goga, Michal Lichvar, David Smolak, Miroslav Bohmer, Andrej Balaz, Frantisek Duris, Juraj Gazdarica, Katarina Soltys, Jan Turna, Jan Radvanszky, Tomas Szemes

Integrating alignment-based and alignment-free sequence similarity measures for biological sequence classification

Alignment-based sequence similarity searches, while accurate for some types of sequences, can produce incorrect results when used on more divergent but functionally related sequences that have undergone the sequence rearrangements observed…

Genomics · Quantitative Biology 2015-01-21 Ivan Borozan, Stuart Watt, Vincent Ferretti

MISeval: a Metric Library for Medical Image Segmentation Evaluation

Correct performance assessment is crucial for evaluating modern artificial intelligence algorithms in medicine like deep-learning based medical image segmentation models. However, there is no universal metric library in Python for…

Computer Vision and Pattern Recognition · Computer Science 2022-01-25 Dominik Müller, Dennis Hartmann, Philip Meyer, Florian Auer, Iñaki Soto-Rey, Frank Kramer

Align-gram : Rethinking the Skip-gram Model for Protein Sequence Analysis

Background: The inception of next-generation sequencing technologies has exponentially increased the volume of biological sequence data. Protein sequences, often described as the 'language of life', have been analyzed for a multitude of…

Quantitative Methods · Quantitative Biology 2020-12-08 Nabil Ibtehaz, S. M. Shakhawat Hossain Sourav, Md. Shamsuzzoha Bayzid, M. Sohel Rahman

SparseChem: Fast and accurate machine learning model for small molecules

SparseChem provides fast and accurate machine learning models for biochemical applications. In particular, the package supports very high-dimensional sparse inputs, e.g., millions of features and millions of compounds. It is possible to train…

Machine Learning · Statistics 2022-03-10 Adam Arany, Jaak Simm, Martijn Oldenhof, Yves Moreau

Finite Width Model Sequence Comparison

Sequence comparison is a widely used computational technique in modern molecular biology. In spite of the frequent use of sequence comparisons, the important problem of assigning statistical significance to a given degree of similarity is…

Quantitative Methods · Quantitative Biology 2007-05-23 Ralf Bundschuh, Nicholas Chia

Sequencing by Emergence: Modeling and Estimation

Sequencing by Emergence (SEQE) is a new single-molecule nucleic acid (DNA/RNA) sequencing technology that estimates sequence as an emergent property of the binding and localization of a repertoire of short oligonucleotide probes. SEQE…

Genomics · Quantitative Biology 2021-08-04 Nicholas Boyd, Samuel Woodhouse, Kalim Mir

SnipGen: A Mining Repository Framework for Evaluating LLMs for Code

Large Language Models (LLMs), such as transformer-based neural networks with billions of parameters, have become increasingly prevalent in software engineering (SE). These models, trained on extensive datasets that include code…

Software Engineering · Computer Science 2025-02-18 Daniel Rodriguez-Cardenas, Alejandro Velasco, Denys Poshyvanyk

Scikit-fingerprints: easy and efficient computation of molecular fingerprints in Python

In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy…

Software Engineering · Computer Science 2025-08-12 Jakub Adamczyk, Piotr Ludynia

BSM: Small but Powerful Biological Sequence Model for Genes and Proteins

Modeling biological sequences such as DNA, RNA, and proteins is crucial for understanding complex processes like gene regulation and protein synthesis. However, most current models either focus on a single type or treat multiple types of…

Genomics · Quantitative Biology 2024-10-16 Weixi Xiang, Xueting Han, Xiujuan Chai, Jing Bai