Related papers: Benchmarking database performance for genomic data

200 papers

The problem of fast item retrieval from a fixed collection is often encountered in most computer science areas, from operating system components to databases and user interfaces. We present an approach based on hash tables that focuses on…

Neural and Evolutionary Computing · Computer Science 2020-07-17 Dan Domnita, Ciprian Oprisa

Today's sequencing technology allows sequencing an individual genome within a few weeks for a fraction of the costs of the original Human Genome project. Genomics labs are faced with dozens of TB of data per week that have to be…

Databases · Computer Science 2009-09-15 Uwe Roehm, Jose Blakeley

Identification of functional elements of a genome often requires dividing a sequence of measurements along a genome into segments differing from adjacent segments. In many applications, the mean of the measured values at multiple genomic…

Applications · Statistics 2015-06-30 S. B. Girimurugan, Jonathan Dennis, Jinfeng Zhang

The two most common data-structures for genome indexing, FM-indices and hash-tables, exhibit a fundamental trade-off between memory footprint and performance. We present Ranger, a new indexing technique for nucleotide sequences that is both…

Data Structures and Algorithms · Computer Science 2023-08-09 Alon Rashelbach, Ori Rottensterich, Mark Silberstien

Research on the localization of the genetic basis associated with diseases or traits has been widely conducted in the last few decades. Scan methods have been developed for region-based analysis in whole-genome association studies,…

Methodology · Statistics 2024-10-31 Wei Zhang, Fan Wang, Fang Yao

A method to search for local structural similarities in proteins at atomic resolution is presented. It is demonstrated that a huge amount of structural data can be handled within a reasonable CPU time by using a conventional relational…

Biomolecules · Quantitative Biology 2007-12-28 Akira R. Kinjo, Haruki Nakamura

An applied problem facing all areas of data science is harmonizing data sources. Joining data from multiple origins with unmapped and only partially overlapping features is a prerequisite to developing and testing robust, generalizable…

Modern biological science produces vast amounts of genomic sequence data. This is fuelling the need for efficient algorithms for sequence compression and analysis. Data compression and the associated techniques coming from information…

Data Structures and Algorithms · Computer Science 2011-09-05 Heba Afify, Muhammad Islam, Manal Abdel Wahed

Protein function prediction is a crucial task in bioinformatics, with significant implications for understanding biological processes and disease mechanisms. While the relationship between sequence and function has been extensively…

Quantitative Methods · Quantitative Biology 2024-09-04 Shania Mitra, Lei Huang, Manolis Kellis

Genomic data sets are growing dramatically as the cost of sequencing continues to decline and small sequencing devices become available. Enormous community databases store and share this data with the research community, but some of these…

Rapidly growing public gene expression databases contain a wealth of data for building an unprecedentedly detailed picture of human biology and disease. This data comes from many diverse measurement platforms that make integrating it all…

Genomics · Quantitative Biology 2014-10-16 Karolis Uziela, Antti Honkela

Background: Significance analysis plays a major role in identifying and ranking genes, transcription factor binding sites, DNA methylation regions, and other high-throughput features for association with disease. We propose a new approach,…

Methodology · Statistics 2017-01-10 Andrew E. Jaffe, John D. Storey, Hongkai Ji, Jeffrey T. Leek

We describe a new algorithm and R package for peak detection in genomic data sets using constrained changepoint algorithms. These detect changes from background to peak regions by imposing the constraint that the mean should alternately…

Computation · Statistics 2018-10-02 Toby Dylan Hocking, Guillem Rigaill, Paul Fearnhead, Guillaume Bourque

Graph representation of structured data can facilitate the extraction of stereoscopic features, and it has demonstrated excellent ability when working with deep learning systems, the so-called Graph Neural Networks (GNNs). Choosing a…

Machine Learning · Computer Science 2021-01-27 Yingfang Yuan, Wenjun Wang, George M. Coghill, Wei Pang

Several modern genomic technologies, such as DNA-Methylation arrays, measure spatially registered probes that number in the hundreds of thousands across multiple chromosomes. The measured probes are by themselves less interesting…

Applications · Statistics 2016-11-16 John Nagorski, Genevera I. Allen

Genome assembly is a prominent problem studied in bioinformatics, which computes the source string using a set of its overlapping substrings. Classically, genome assembly uses assembly graphs built using this set of substrings to compute…

Data Structures and Algorithms · Computer Science 2024-09-24 Saumya Talera, Parth Bansal, Shabnam Khan, Shahbaz Khan

In the deeply interconnected world we live in, pieces of information link domains all around us. As graph databases embrace effectively relationships among data and allow processing and querying these connections efficiently, they are…

Databases · Computer Science 2023-04-04 Sergio Abriola, Santiago Cifuentes, María Vanina Martínez, Nina Pardal, Edwin Pin

Data analysis often involves comparing subsets of data across many dimensions for finding unusual trends and patterns. While the comparison between subsets of data can be expressed using SQL, they tend to be complex to write, and suffer…

Databases · Computer Science 2021-07-28 Tarique Siddiqui, Surajit Chaudhuri, Vivek Narasayya

Genome sequence analysis plays a pivotal role in enabling many medical and scientific advancements in personalized medicine, outbreak tracing, and forensics. However, the analysis of genome sequencing data is currently bottlenecked by the…

Hardware Architecture · Computer Science 2021-11-04 Damla Senol Cali

Choosing and developing performant database solutions helps organizations optimize their operational practices and decision-making. Since graph data is becoming more common, it is crucial to develop and use them in big data with complex…

Databases · Computer Science 2024-02-01 Johan Sandell, Einar Asplund, Workneh Yilma Ayele, Martin Duneld