Related papers: Benchmarking database performance for genomic data

A Genetic Algorithm for Obtaining Memory Constrained Near-Perfect Hashing

The problem of fast item retrieval from a fixed collection is often encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on…

Neural and Evolutionary Computing · Computer Science 2020-07-17 Dan Domnita, Ciprian Oprisa

Data Management for High-Throughput Genomics

Today's sequencing technology allows sequencing an individual genome within a few weeks for a fraction of the costs of the original Human Genome project. Genomics labs are faced with dozens of TB of data per week that have to be…

Databases · Computer Science 2009-09-15 Uwe Roehm, Jose Blakeley

iSeg: an algorithm for segmentation of genomic data

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments differing from adjacent segments. In many applications, the mean of the measured values at multiple genomic…

Applications · Statistics 2015-06-30 S. B. Girimurugan, Jonathan Dennis, Jinfeng Zhang

Nucleotide String Indexing using Range Matching

The two most common data-structures for genome indexing, FM-indices and hash-tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both…

Data Structures and Algorithms · Computer Science 2023-08-09 Alon Rashelbach, Ori Rottensterich, Mark Silberstien

Fast Signal Region Detection with Application to Whole Genome Association Studies

Research on the localization of the genetic basis associated with diseases or traits has been widely conducted in the last few decades. Scan methods have been developed for region-based analysis in whole-genome association studies,…

Methodology · Statistics 2024-10-31 Wei Zhang, Fan Wang, Fang Yao

A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational…

Biomolecules · Quantitative Biology 2007-12-28 Akira R. Kinjo, Haruki Nakamura

Deep Learning to Jointly Schema Match, Impute, and Transform Databases

An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable…

Databases · Computer Science 2022-07-11 Sandhya Tripathi, Bradley A. Fritz, Mohamed Abdelhack, Michael S. Avidan, Yixin Chen, Christopher R. King

DNA Lossless Differential Compression Algorithm based on Similarity of Genomic Sequence Database

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify, Muhammad Islam, Manal Abdel Wahed

ProteinRPN: Towards Accurate Protein Function Prediction with Graph-Based Region Proposals

Protein function prediction is a crucial task in bioinformatics, with significant implications for understanding biological processes and disease mechanisms. While the relationship between sequence and function has been extensively…

Quantitative Methods · Quantitative Biology 2024-09-04 Shania Mitra, Lei Huang, Manolis Kellis

The Parallelism Motifs of Genomic Data Analysis

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-08 Katherine Yelick, Aydin Buluc, Muaaz Awan, Ariful Azad, Benjamin Brock, Rob Egan, Saliya Ekanayake, Marquita Ellis, Evangelos Georganas, Giulia Guidi, Steven Hofmeyr, Oguz Selvitopi, Cristina Teodoropol, Leonid Oliker

Probe region expression estimation for RNA-seq data for improved microarray comparability

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all…

Genomics · Quantitative Biology 2014-10-16 Karolis Uziela, Antti Honkela

Gene set bagging for estimating replicability of gene set analyses

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features for association with disease. We propose a new approach,…

Methodology · Statistics 2017-01-10 Andrew E. Jaffe, John D. Storey, Hongkai Ji, Jeffrey T. Leek

Generalized Functional Pruning Optimal Partitioning (GFPOP) for Constrained Changepoint Detection in Genomic Data

We describe a new algorithm and R package for peak detection in genomic data sets using constrained changepoint algorithms. These detect changes from background to peak regions by imposing the constraint that the mean should alternately…

Computation · Statistics 2018-10-02 Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque

A Novel Genetic Algorithm with Hierarchical Evaluation Strategy for Hyperparameter Optimisation of Graph Neural Networks

Graph representation of structured data can facilitate the extraction of stereoscopic features, and it has demonstrated excellent ability when working with deep learning systems, the so-called Graph Neural Networks (GNNs). Choosing a…

Machine Learning · Computer Science 2021-01-27 Yingfang Yuan, Wenjun Wang, George M. Coghill, Wei Pang

Genomic Region Detection via Spatial Convex Clustering

Several modern genomic technologies, such as DNA-Methylation arrays, measure spatially registered probes that number in the hundreds of thousands across multiple chromosomes. The measured probes are by themselves less interesting…

Applications · Statistics 2016-11-16 John Nagorski, Genevera I. Allen

Practical algorithms for Hierarchical overlap graphs

Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…

Data Structures and Algorithms · Computer Science 2024-09-24 Saumya Talera, Parth Bansal, Shabnam Khan, Shahbaz Khan

On the complexity of finding set repairs for data-graphs

In the deeply interconnected world we live in, pieces of information link domains all around us. As graph databases embrace effectively relationships among data and allow processing and querying these connections efficiently, they are…

Databases · Computer Science 2023-04-04 Sergio Abriola, Santiago Cifuentes, María Vanina Martínez, Nina Pardal, Edwin Pin

COMPARE: Accelerating Groupwise Comparison in Relational Databases for Data Analytics

Data analysis often involves comparing subsets of data across many dimensions for finding unusual trends and patterns. While the comparison between subsets of data can be expressed using SQL, they tend to be complex to write, and suffer…

Databases · Computer Science 2021-07-28 Tarique Siddiqui, Surajit Chaudhuri, Vivek Narasayya

Accelerating Genome Sequence Analysis via Efficient Hardware/Algorithm Co-Design

Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, and forensics. However, the analysis of genome sequencing data is currently bottlenecked by the…

Hardware Architecture · Computer Science 2021-11-04 Damla Senol Cali

Performance Comparison Analysis of ArangoDB, MySQL, and Neo4j: An Experimental Study of Querying Connected Data

Choosing and developing performant database solutions helps organizations optimize their operational practices and decision-making. Since graph data is becoming more common, it is crucial to develop and use them in big data with complex…

Databases · Computer Science 2024-02-01 Johan Sandell, Einar Asplund, Workneh Yilma Ayele, Martin Duneld