Genomics — Scifaro

PACE: Geometry-Aware Bridge Transport for Single-Cell Trajectory Inference

Single-cell trajectory inference from destructive time-course snapshots is fundamentally ill-posed: neither cross-time cell correspondences nor continuous trajectories are observed, so the snapshot distributions alone do not uniquely…

Genomics · Quantitative Biology 2026-05-29 Chenglei Yu, Chuanrui Wang, Bangyan Liao, Tailin Wu

SCUDDO: An unsupervised clustering algorithm for single-cell Hi-C maps using diagonal diffusion operators

Motivation: Advances in high-throughput chromatin conformation capture have provided insight into the three-dimensional structure and organization of chromatin. While bulk Hi-C experiments capture spatio-temporally averaged chromatin…

Genomics · Quantitative Biology 2026-05-28 Luka Maisuradze, Corey S. O'Hern, Mark D. Shattuck

C3P: Contrastive promoter-protein pretraining yields representations capturing bacterial gene regulation

Despite the increasing scale of genome language models (gLMs), their ability to decode the function of regulatory sequences remains unclear. gLM pretraining relies on sequence reconstruction, which may struggle due to the noisy, rapidly…

Genomics · Quantitative Biology 2026-05-26 Cameron Dufault, Scott Xu, Alan M. Moses

AnnotateMissense: a genome-wide annotation and benchmarking framework for missense pathogenicity prediction

Missense variant interpretation remains challenging because pathogenicity depends on heterogeneous evidence from population frequency, evolutionary conservation, transcript context, amino acid substitution severity, prior pathogenicity…

Genomics · Quantitative Biology 2026-05-26 Muhammad Muneeb, David B. Ascher

WTKO-CNN: Deep Learning Reveals Sequence Motifs Distinguishing Wild-Type and Knockout ATAC-seq Peaks

Chromatin regulators can alter transcriptional programs by modifying the accessibility of regulatory DNA elements. Understanding how regulatory sequences differ between wild-type (WT) and knockout (KO) conditions is crucial for deciphering…

Genomics · Quantitative Biology 2026-05-26 Lopamudra Dey

Multi-Modal Machine Learning for Population- and Subject-Specific lncRNA-Type 2 Diabetes Association Analysis

Long non-coding RNAs (lncRNAs) are emerging regulatory molecules implicated in chronic disease pathogenesis, including Type 2 Diabetes Mellitus (T2D). We investigated ten literature-reported lncRNAs associated with T2D: MALAT1, MEG3, MIAT,…

Genomics · Quantitative Biology 2026-05-26 Ashwani Siwach, Sanjeev Narayan Sharma, Sunil Datt Sharma

Population-Specific Genetic and Non-Genetic Influences on Sleep Traits and Health Outcomes

Sleep traits are shaped by genetic and environmental factors and may influence many health conditions. The All of Us Research Program, which includes EHR, physical measurements, genomic data, and wearable data across ancestry groups,…

Genomics · Quantitative Biology 2026-05-25 Jiheum Park, Stephanie Y. Shue, Rocio Barragan, Jeong Yun Yang, Tian Gu, Chin Hur, Marie-Pierre St-Onge

Detecting and Correcting Sample-by-Sample Scale Distortion in RNA Sequencing Data

RNA sequencing (RNA-seq) is the conventional genome-scale approach used to capture the expression levels of all detectable genes in a biological sample. This is now regularly used for population-based studies designed to identify genetic…

Genomics · Quantitative Biology 2026-05-25 Christopher Thron, Farhad Jafari

SciHorizon-GENE: Benchmarking LLM for Life Sciences Inference from Gene Knowledge to Functional Understanding

Large language models (LLMs) have shown growing promise in biomedical research, particularly for knowledge-driven interpretation tasks. However, their ability to reliably reason from gene-level knowledge to functional understanding, a core…

Genomics · Quantitative Biology 2026-05-25 Xiaohan Huang, Meng Xiao, Chuan Qin, Qingqing Long, Jinmiao Chen, Yuanchun Zhou, Hengshu Zhu

bioETH-PRS: Confidential Polygenic Risk Scoring without a Trusted Evaluator via Fully Homomorphic Encryption on a Programmable Blockchain

Polygenic risk scores (PRSs) aggregate genetic effect estimates to predict disease susceptibility, yet clinical deployment often exposes raw genotype data to third-party compute infrastructure. Prior homomorphic-encryption approaches, still…

Genomics · Quantitative Biology 2026-05-22 Kimon Antonios Provatas, Christos Galanopoulos, Ilias Georgakopoulos-Soares

CRANE: Correcting Errors in Raw Nanopore Signals Using Hidden Markov Models

Nanopore sequencing can read substantially longer sequences of nucleic acid molecules, called reads, than other sequencing methods, which has led to advances in genomic analysis such as the gapless human genome assembly. By analyzing the…

Genomics · Quantitative Biology 2026-05-21 Simon Ambrozak, Ulysse McConnell, Bhargav Srinivasan, Burak Ozkan, Ernest Zhang, Can Firtina

DNACHUNKER: Learnable Tokenization for DNA Language Models

DNA language models are increasingly used to represent genomic sequence, yet their effectiveness depends critically on how raw nucleotides are converted into model inputs. Unlike natural language, DNA offers no canonical boundaries, making…

Genomics · Quantitative Biology 2026-05-21 Taewon Kim, Jihwan Shin, Hyomin Kim, Youngmok Jung, Jonghoon Lee, Won-Chul Lee, Sungsoo Ahn, Insu Han

Informational blueprints reveal condition-dependent gene regulatory architectures

While coding regions in the genome have a direct interpretation in terms of protein products, significant fractions are non-coding and yet control essential biological functions. Unlike the genetic code, there is no "lookup table" that…

Genomics · Quantitative Biology 2026-05-20 Doruk Efe Gökmen, Rosalind Wenshan Pan, Tom Röschinger, Stephen Quake, Hernan Garcia, Rob Phillips, Vincenzo Vitelli

Tool Choice Matters: Evaluating edgeR vs. DESeq2 for Sensitivity, Robustness, and Cross-Study Performance

Differential gene expression (DGE) analysis is foundational to transcriptomic research, yet tool selection can substantially influence results. This study presents a comprehensive comparison of two widely used DGE tools, edgeR and DESeq2,…

Genomics · Quantitative Biology 2026-05-20 Mostafa Rezapour

StateXDiff: Cell State-Contextualized Multimodal Diffusion for Single-Cell Perturbation Prediction

Predicting drug-induced cellular state changes at single-cell resolution remains a central challenge in virtual cell modeling, particularly under out-of-distribution (OOD) conditions. Current approaches predominantly rely on RNA-based…

Genomics · Quantitative Biology 2026-05-18 Peiting Shi, Ningfeng Que, Xianzhe Huang, Xiaofei Wang, Jianzhong Jeff Xi

Genome-Factory: A Library for Tuning, Deploying, and Interpreting Genomic Foundation Models

We introduce Genome-Factory, the first integrated Python library for tuning, deploying, and interpreting genomic foundation models. Our core contribution is to simplify and unify the workflow for genomic model development: data collection,…

Genomics · Quantitative Biology 2026-05-18 Weimin Wu, Xuefeng Song, Yibo Wen, Qinjie Lin, Zhihan Zhou, Jerry Yao-Chieh Hu, Zhong Wang, Han Liu

Fast Iteration of Spaced k-mers

Background: Short sequence substrings of a fixed length k, called k-mers, are a ubiquitous computational primitive in bioinformatics, used across sequence indexing, read mapping, genome assembly, metagenomic classification, and comparative…

Genomics · Quantitative Biology 2026-05-15 Lucas Czech

Set-Aggregated Genome Embeddings for Microbiome Abundance Prediction

Microbiome functions are encoded within the genes of the community-wide metagenome. A natural question is whether properties of a microbial community can be predicted just from knowing the raw DNA sequences of its members. In this work, we…

Genomics · Quantitative Biology 2026-05-13 Younhun Kim, Georg K. Gerber, Travis E. Gibson

SCOPE: Siamese Contrastive Operon Pair Embeddings for Functional Sequence Representation and Classification

Identifying operons is a fundamental step in understanding prokaryotic gene regulation, as classifying genes into operons supports the reconstruction of regulatory networks, functional annotation of unannotated genes, and drug candidate…

Genomics · Quantitative Biology 2026-05-13 Akarsh Gupta, Kenneth Rodrigues, Sagnik Chatterjee

GeneZip: Region-Aware Compression for Long Context DNA Modeling

Long-context DNA models are limited by token-mixing cost and by how compression allocates representational budget across the genome. Existing approaches operate close to base-pair resolution, apply fixed downsampling, or learn…

Genomics · Quantitative Biology 2026-05-13 Jianan Zhao, Xixian Liu, Zhihao Zhan, Xinyu Yuan, Hongyu Guo, Jian Tang