Related papers: wgatools: an ultrafast toolkit for manipulating wh…
Motivation: The rapid growth in genome-wide association studies (GWAS) in plants and animals has brought about the need for a central resource that facilitates i) performing GWAS, ii) accessing data and results of other GWAS, and iii)…
Background: While the importance of gene-gene interactions in human diseases has been well recognized, identifying them has been a great challenge, especially through association studies with millions of genetic markers and thousands of…
The AGP format is a tab-separated table format describing how components of a genome assembly fit together. A standard submission format for genome assemblies is a fasta file giving the sequence of contigs along with an AGP file showing how…
The effective visualization of genomic data is crucial for exploring and interpreting complex relationships within and across genes and genomes. Despite advances in developing dedicated bioinformatics software, common visualization tools…
Motivation: A pan-genome graph represents a collection of genomes and encodes sequence variations between them. It is a powerful data structure for studying multiple similar genomes. Sequence-to-graph alignment is an essential step for the…
In recent years, Whole Genome Sequencing (WGS) evolved from a futuristic-sounding research project to an increasingly affordable technology for determining complete genome sequences of complex organisms, including humans. This prompts a…
Metabolomic data sets provide a direct read-out of cellular phenotypes and are increasingly generated to study biological questions. Our previous work revealed the potential of analyzing extracellular metabolomic data in the context of the…
Next-generation sequencing (NGS) is a pivotal technique in genome sequencing due to its high throughput, rapid results, cost-effectiveness, and enhanced accuracy. Its significance extends across various domains, playing a crucial role in…
The generation of high-quality assemblies, even for large eukaryotic genomes, has become a routine task for many biologists thanks to recent advances in sequencing technologies. However, the annotation of these assemblies - a crucial step…
A genome read data set can be quickly and efficiently remapped from one reference to another similar reference (e.g., between two reference versions or two similar species) using a variety of tools, e.g., the commonly-used CrossMap tool.…
How to compare whole genome sequences at large scale has not been achieved via conventional methods based on pair-wisely base-to-base comparison; nevertheless, no attention was paid to handle in-one-sitting a number of genomes crossing…
Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, and forensics. However, the analysis of genome sequencing data is currently bottlenecked by the…
Summary: BWA-MEM is a new alignment algorithm for aligning sequence reads or long query sequences against a large reference genome such as human. It automatically chooses between local and end-to-end alignments, supports paired-end reads…
Motivation: The multiple sequence alignment (MSA) problem has been extensively studied, with numerous approaches developed over recent years. With the rapid growth of sequence data, there is an increasing need for fast and accurate MSA…
Motivation: Protein-to-genome alignment is critical to annotating genes in non-model organisms. While there are a few tools for this purpose, all of them were developed over ten years ago and did not incorporate the latest advances in…
Tool learning is increasingly important for large language models (LLMs) to effectively coordinate and utilize a diverse set of tools in order to solve complex real-world tasks. By selecting and integrating appropriate tools, LLMs extend…
Aligning millions of short DNA or RNA reads, of 75 to 250 base pairs each, to a reference genome is a significant computation problem in bioinformatics. We present a flexible and fast FPGA-based short read alignment tool. Our aligner makes…
We now need more than ever to make genome analysis more intelligent. We need to read, analyze, and interpret our genomes not only quickly, but also accurately and efficiently enough to scale the analysis to population level. There currently…
Massively parallel sequencing techniques have revolutionized biological and medical sciences by providing unprecedented insight into the genomes of humans, animals, and microbes. Modern sequencing platforms generate enormous amounts of…
To uncover the genetic basis of complex disease, individuals are often measured at a large number of genetic variants (usually SNPs) across the genome. GemTools provides computationally efficient tools for modeling genetic ancestry based on…