Related papers: LOGAN: High-Performance GPU-Based X-Drop Long-Read…

Space Efficient Sequence Alignment for SRAM-Based Computing: X-Drop on the Graphcore IPU

Dedicated accelerator hardware has become essential for processing AI-based workloads, leading to the rise of novel accelerator architectures. Furthermore, fundamental differences in memory architecture and parallelism have made these…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-19 Luk Burchard, Max Xiaohang Zhao, Johannes Langguth, Aydın Buluç, Giulia Guidi

AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping

With the advance in genome sequencing technology, the lengths of deoxyribonucleic acid (DNA) sequencing results are rapidly increasing at lower prices than ever. However, the longer lengths come at the cost of a heavy computational burden…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-12 Seongyeon Park, Junguk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, Jinho Lee

GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping

Genome sequencing has become a central focus in computational biology. A genome study typically begins with sequencing, which produces millions to billions of short DNA fragments known as reads. Read mapping aligns these reads to a…

Hardware Architecture · Computer Science 2026-01-28 Julien Eudine, Chu Li, Zhuo Cheng, Renzo Andri, Can Firtina, Mohammad Sadrosadati, Nika Mansouri Ghiasi, Konstantina Koliogeorgi, Anirban Nag, Arash Tavakkol, Haiyu Mao, Onur Mutlu, Shai Bergman, Ji Zhang

diBELLA: Distributed Long Read to Long Read Alignment

We present a parallel algorithm and scalable implementation for genome analysis, specifically the problem of finding overlaps and alignments for data from "third generation" long read sequencers. While long sequences of DNA offer enormous…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-29 Marquita Ellis, Giulia Guidi, Aydın Buluç, Leonid Oliker, Katherine Yelick

AnySeq/GPU: A Novel Approach for Faster Sequence Alignment on GPUs

In recent years, the rapidly increasing number of reads produced by next-generation sequencing (NGS) technologies has driven the demand for efficient implementations of sequence alignments in bioinformatics. However, current…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-05-17 André Müller, Bertil Schmidt, Richard Membarth, Roland Leißa, Sebastian Hack

Rapid GPU-Based Pangenome Graph Layout

Computational Pangenomics is an emerging field that studies genetic variation using a graph structure encompassing multiple genomes. Visualizing pangenome graphs is vital for understanding genome diversity. Yet, handling large graphs can be…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-29 Jiajie Li, Jan-Niklas Schmelzle, Yixiao Du, Simon Heumos, Andrea Guarracino, Giulia Guidi, Pjotr Prins, Erik Garrison, Zhiru Zhang

GIGA-Lens: Fast Bayesian Inference for Strong Gravitational Lens Modeling

We present GIGA-Lens: a gradient-informed, GPU-accelerated Bayesian framework for modeling strong gravitational lensing systems, implemented in TensorFlow and JAX. The three components, optimization using multi-start gradient descent,…

Instrumentation and Methods for Astrophysics · Physics 2022-08-16 A. Gu, X. Huang, W. Sheu, G. Aldering, A. S. Bolton, K. Boone, A. Dey, A. Filipp, E. Jullo, S. Perlmutter, D. Rubin, E. F. Schlafly, D. J. Schlegel, Y. Shu, S. H. Suyu

FPGA Acceleration of Short Read Alignment

Aligning millions of short DNA or RNA reads, of 75 to 250 base pairs each, to a reference genome is a significant computation problem in bioinformatics. We present a flexible and fast FPGA-based short read alignment tool. Our aligner makes…

Genomics · Quantitative Biology 2018-05-02 Nathaniel McVicar, Akina Hoshino, Anna La Torre, Thomas A. Reh, Walter L. Ruzzo, Scott Hauck

High Performance Computing Applied to Logistic Regression: A CPU and GPU Implementation Comparison

We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the…

Machine Learning · Computer Science 2023-08-22 Nechba Mohammed, Mouhajir Mohamed, Sedjari Yassine

Efficient Kernel Mapping and Comprehensive System Evaluation of LLM Acceleration on a CGLA

Large Language Models (LLMs) demand substantial computational resources, resulting in high energy consumption on GPUs. To address this challenge, we focus on Coarse-Grained Reconfigurable Arrays (CGRAs) as an effective alternative that…

Hardware Architecture · Computer Science 2025-12-02 Takuto Ando, Yu Eto, Ayumu Takeuchi, Yasuhiko Nakashima

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

In the last few years, the memory requirements to train state-of-the-art neural networks have far exceeded the DRAM capacities of modern hardware accelerators. This has necessitated the development of efficient algorithms to train these…

Machine Learning · Computer Science 2023-05-16 Siddharth Singh, Abhinav Bhatele

LGAN: An Efficient High-Order Graph Neural Network via the Line Graph Aggregation

Graph Neural Networks (GNNs) have emerged as a dominant paradigm for graph classification. Specifically, most existing GNNs mainly rely on the message passing strategy between neighbor nodes, where the expressivity is limited by the…

Machine Learning · Computer Science 2025-12-12 Lin Du, Lu Bai, Jincheng Li, Lixin Cui, Hangyuan Du, Lichi Zhang, Yuting Chen, Zhao Li

GPU-Accelerated Primal Learning for Extremely Fast Large-Scale Classification

One of the most efficient methods to solve L2-regularized primal problems, such as logistic regression and linear support vector machine (SVM) classification, is the widely used trust region Newton algorithm, TRON. While TRON has recently…

Machine Learning · Computer Science 2020-10-16 John T. Halloran, David M. Rocke

RAPIDx: High-performance ReRAM Processing in-Memory Accelerator for Sequence Alignment

Genome sequence alignment is the core of many biological applications. The advancement of sequencing technologies produces a tremendous amount of data, making sequence alignment a critical bottleneck in bioinformatics analysis. The existing…

Hardware Architecture · Computer Science 2023-01-26 Weihong Xu, Saransh Gupta, Niema Moshiri, Tajana Rosing

ParaGAN: A Scalable Distributed Training Framework for Generative Adversarial Networks

Recent advances in Generative Artificial Intelligence have fueled numerous applications, particularly those involving Generative Adversarial Networks (GANs), which are essential for synthesizing realistic photos and videos. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-11-07 Ziji Shi, Jialin Li, Yang You

ReGAN: RE[LAX|BAR|INFORCE] based Sequence Generation using GANs

Generative Adversarial Networks (GANs) have seen a steep ascension to the peak of the ML research zeitgeist in recent years. Mostly catalyzed by its success in the domain of image generation, the technique has seen a wide range of adoption in a…

Machine Learning · Statistics 2018-05-09 Aparna Balagopalan, Satya Gorti, Mathieu Ravaut, Raeid Saqur

GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond

We show that pre-trained Generative Adversarial Networks (GANs) such as StyleGAN and BigGAN can be used as a latent bank to improve the performance of image super-resolution. While most existing perceptual-oriented approaches attempt to…

Computer Vision and Pattern Recognition · Computer Science 2022-08-01 Kelvin C. K. Chan, Xiangyu Xu, Xintao Wang, Jinwei Gu, Chen Change Loy

Democratizing AI: A Comparative Study in Deep Learning Efficiency and Future Trends in Computational Processing

The exponential growth in data has intensified the demand for computational power to train large-scale deep learning models. However, the rapid growth in model size and complexity raises concerns about equal and fair access to computational…

Performance · Computer Science 2026-04-03 Lisan Al Amin, Md Ismail Hossain, Rupak Kumar Das, Mahbubul Islam, Abdulaziz Tabbakh

Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design

Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, and forensics. However, the analysis of genome sequencing data is currently bottlenecked by the…

Hardware Architecture · Computer Science 2021-11-04 Damla Senol Cali

LeapAlign: Post-Training Flow Matching Models at Any Generation Step by Building Two-Step Trajectories

This paper focuses on the alignment of flow matching models with human preferences. A promising way is fine-tuning by directly backpropagating reward gradients through the differentiable generation process of flow matching. However,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-05 Zhanhao Liang, Tao Yang, Jie Wu, Chengjian Feng, Liang Zheng