Related papers: Binary Interval Search (BITS): A Scalable Algorith…

Bayesian iterative screening in ultra-high dimensional linear regressions

Variable selection in ultra-high dimensional linear regression is often preceded by a screening step to significantly reduce the dimension. Here we develop a Bayesian variable screening method (BITS) guided by the posterior model…

Methodology · Statistics 2025-02-28 Run Wang, Somak Dutta, Vivekananda Roy

Computationally Efficient Whole-Genome Signal Region Detection for Quantitative and Binary Traits

The identification of genetic signal regions in the human genome is critical for understanding the genetic architecture of complex traits and diseases. Numerous methods based on scan algorithms (e.g., QSCAN, SCANG, SCANG-STARR) have been…

Applications · Statistics 2025-01-24 Wei Zhang, Fan Wang, Fang Yao

iSeg: an algorithm for segmentation of genomic data

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments differing from adjacent segments. In many applications, the mean of the measured values at multiple genomic…

Applications · Statistics 2015-06-30 S. B. Girimurugan, Jonathan Dennis, Jinfeng Zhang

COBS: a Compact Bit-Sliced Signature Index

We present COBS, a COmpact Bit-sliced Signature index, which is a cross-over between an inverted index and Bloom filters. Our target application is to index $k$-mers of DNA samples or $q$-grams from text documents and process approximate…

Databases · Computer Science 2019-07-29 Timo Bingmann, Phelim Bradley, Florian Gauger, Zamin Iqbal

BITS-Tree-An Efficient Data Structure for Segment Storage and Query Processing

In this paper, a novel data structure is proposed to dynamically insert and delete segments. Unlike the standard segment trees [3], the proposed data structure permits insertion of a segment with interval range beyond the interval…

Computational Geometry · Computer Science 2015-01-15 K. S. Easwarakumar, T. Hema

Seeded Binary Segmentation: A general methodology for fast and optimal change point detection

In recent years, there has been an increasing demand for efficient algorithms for large-scale change point detection problems. To this end, we propose seeded binary segmentation, an approach relying on a deterministic construction of…

Methodology · Statistics 2023-03-13 Solt Kovács, Housen Li, Peter Bühlmann, Axel Munk

A Comparative Study on String Matching Algorithm of Biological Sequences

String matching algorithms play a vital role in computational biology. The functional and structural relationships of a biological sequence are determined by similarities within that sequence. For that, the researcher is supposed to be aware…

Data Structures and Algorithms · Computer Science 2014-01-30 Pandiselvam. P, Marimuthu. T, Lawrance. R

Fast and Scalable Gene Embedding Search: A Comparative Study of FAISS and ScaNN

The exponential growth of DNA sequencing data has outpaced traditional heuristic-based methods, which struggle to scale effectively. Efficient computational approaches are urgently needed to support large-scale similarity search, a…

Genomics · Quantitative Biology 2025-07-24 Mohammad Saleh Refahi, Gavin Hearne, Harrison Muller, Kieran Lynch, Bahrad A. Sokhansanj, James R. Brown, Gail Rosen

Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons

The Jaccard similarity index is an important measure of the overlap of two sets, widely used in machine learning, computational genomics, information retrieval, and many other areas. We design and implement SimilarityAtScale, the first…

Computational Engineering, Finance, and Science · Computer Science 2020-11-12 Maciej Besta, Raghavendra Kanakagiri, Harun Mustafa, Mikhail Karasikov, Gunnar Rätsch, Torsten Hoefler, Edgar Solomonik

A biological sequence comparison algorithm using quantum computers

Genetic information is encoded in a linear sequence of nucleotides, represented by letters ranging from thousands to billions. Mutations refer to changes in the DNA or RNA nucleotide sequence. Thus, mutation detection is vital in all areas…

Quantum Physics · Physics 2024-03-14 Büsra Kösoglu-Kind, Robert Loredo, Michele Grossi, Christian Bernecker, Jody M Burks, Rudiger Buchkremer

Large-scale inference of correlation among mixed-type biological traits with phylogenetic multivariate probit models

Inferring concerted changes among biological traits along an evolutionary history remains an important yet challenging problem. Besides adjusting for spurious correlation induced from the shared history, the task also requires sufficient…

Methodology · Statistics 2020-09-25 Zhenyu Zhang, Akihiko Nishimura, Paul Bastide, Xiang Ji, Rebecca P. Payne, Philip Goulder, Philippe Lemey, Marc A. Suchard

A Learned Index for Exact Similarity Search in Metric Spaces

Indexing is an effective way to support efficient query processing in large databases. Recently the concept of learned index, which replaces or complements traditional index structures with machine learning models, has been actively…

Databases · Computer Science 2022-08-01 Yao Tian, Tingyun Yan, Xi Zhao, Kai Huang, Xiaofang Zhou

Analyzing Large Biological Datasets with an Improved Algorithm for MIC

A computational framework that uses traditional similarity measures to mine significant relationships in biological annotations was recently proposed by Tatiana V. Karpinets et al. [2]. In this paper, an improved approximation…

Databases · Computer Science 2015-07-21 Shuliang Wang, Yiping Zhao

Constraint-based Causal Discovery from Multiple Interventions over Overlapping Variable Sets

Scientific practice typically involves repeatedly studying a system, each time trying to unravel a different perspective. In each study, the scientist may take measurements under different experimental conditions (interventions,…

Machine Learning · Statistics 2014-03-11 Sofia Triantafillou, Ioannis Tsamardinos

BINAS: Bilinear Interpretable Neural Architecture Search

Practical use of neural networks often involves requirements on latency, energy and memory among others. A popular approach to find networks under such requirements is through constrained Neural Architecture Search (NAS). However, previous…

Machine Learning · Computer Science 2022-04-28 Niv Nayman, Yonathan Aflalo, Asaf Noy, Rong Jin, Lihi Zelnik-Manor

DIMS: Distributed Index for Similarity Search in Metric Spaces

Similarity search finds objects that are similar to a given query object based on a similarity metric. As the amount and variety of data continue to grow, similarity search in metric spaces has gained significant attention. Metric spaces…

Databases · Computer Science 2024-10-08 Yifan Zhu, Chengyang Luo, Tang Qian, Lu Chen, Yunjun Gao, Baihua Zheng

Distributed Many-to-Many Protein Sequence Alignment using Sparse Matrices

Identifying similar protein sequences is a core step in many computational biology pipelines such as detection of homologous protein sequences, generation of similarity protein graphs for downstream analysis, functional annotation and gene…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-01 Oguz Selvitopi, Saliya Ekanayake, Giulia Guidi, Georgios Pavlopoulos, Ariful Azad, Aydin Buluc

An Adaptive Contrastive Learning Model for Spike Sorting

Brain-computer interfaces (BCIs) are ways for electronic devices to communicate directly with the brain. For most medical-type brain-computer interface tasks, the activity of multiple units of neurons or local field potentials is sufficient…

Machine Learning · Computer Science 2022-05-25 Lang Qian, Shengjie Zheng, Chunshan Deng, Cheng Yang, Xiaojian Li

De novo genomic analyses for non-model organisms: an evaluation of methods across a multi-species data set

High-throughput sequencing (HTS) is revolutionizing biological research by enabling scientists to quickly and cheaply query variation at a genomic scale. Despite the increasing ease of obtaining such data, using these data effectively still…

Genomics · Quantitative Biology 2012-11-09 Sonal Singhal

Finding Interpretable Class-Specific Patterns through Efficient Neural Search

Discovering patterns in data that best describe the differences between classes allows one to hypothesize and reason about class-specific mechanisms. In molecular biology, for example, this bears promise of advancing the understanding of…

Machine Learning · Computer Science 2023-12-08 Nils Philipp Walter, Jonas Fischer, Jilles Vreeken