Related papers: Twelve years of SAMtools and BCFtools
Since the advent of next-generation sequencing in the early 2000s, the volume of bioinformatics software tools and databases has exploded and continues to grow rapidly. Documenting this evolution on a global and time-dependent scale is a…
With high-throughput biotechnologies generating unprecedented quantities of data, researchers are faced with the challenge of locating and comparing an exponentially growing number of programs and websites dedicated to computational…
Open source bioinformatics tools running under MS Windows are rare, and those running under a Windows HPC cluster are almost nonexistent. This is despite the fact that Windows is the most popular operating system used among life…
Summary: With the rapid development of long-read sequencing technologies, the era of individual complete genomes is approaching. We have developed wgatools, a cross-platform, ultrafast toolkit that supports a range of whole genome alignment…
Over the past decade, the Python-based Simulations of Chemistry Framework (PySCF) has developed into a widely used open-source platform for electronic structure theory and quantum chemical method development. This article reviews the major…
Summary: HTSeq 2.0 provides a more extensive API including a new representation for sparse genomic data, enhancements in htseq-count to suit single cell omics, a new script for data using cell and molecular barcodes, improved documentation,…
Summary: Biospectrogram is open-source software for the spectral analysis of DNA and protein sequences. The software can fetch (from the NCBI server), import and manage biological data. One can analyze the data using Digital Signal Processing…
Computational models have great potential to accelerate bioscience, bioengineering, and medicine. However, it remains challenging to reproduce and reuse simulations, in part, because the numerous formats and methods for simulating various…
The reproducibility of computational pipelines is an expectation in biomedical science, particularly in critical domains like human health. In this context, reporting next generation genome sequencing methods used in precision medicine…
Motivation: Accurate detection of sequence similarity and homologous recombination are essential parts of many evolutionary analyses. Results: We have developed SimPlot++, an open-source multiplatform application implemented in Python,…
Any cutting-edge scientific research project requires a myriad of computational tools for data generation, management, analysis and visualization. Python is a flexible and extensible scientific programming platform that offered the perfect…
Developers gain productivity by reusing readily available Free and Open Source Software (FOSS) components. Such practices also bring some difficulties, such as managing licensing, components and related security. One approach to handle…
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage…
Just like the scientific data they generate, simulation workflows for research should be findable, accessible, interoperable, and reusable (FAIR). However, while significant progress has been made towards FAIR data, the majority of science…
Motivation: In this paper we present the latest release of EBIC, a next-generation biclustering algorithm for mining genetic data. The major contribution of this paper is adding support for big data, making it possible to efficiently run…
Motivation: Modern genomics laboratories generate massive volumes of sequencing data, often resulting in significant storage costs. Genomics storage consists of duplicate files, temporary processing files, and redundant intermediate data.…
PySEMTools is a Python-based library for post-processing simulation data produced with high-order hexahedral elements in the context of the spectral element method in computational fluid dynamics. It aims to minimize intermediate steps…
Programming is ubiquitous in applied biostatistics; adopting software engineering skills will help biostatisticians do a better job. To explain this, we start by highlighting key challenges for software development and application in…
Motivation: Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation and expert familiarity with a combination of disparate bioinformatics tools. Results: To provide an…
BCI algorithm development has long been hampered by two major issues: small sample sets and a lack of reproducibility. We offer a solution to both of these problems via a software suite that streamlines both the issues of finding and…