Related papers: Minimum error correction-based haplotype assembly:…

On the Complexity of the Single Individual SNP Haplotyping Problem

We present several new results pertaining to haplotyping. These results concern the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype fragments. We consider the complexity of the…

Genomics · Quantitative Biology 2016-11-17 Rudi Cilibrasi, Leo van Iersel, Steven Kelk, John Tromp

GenHap: A Novel Computational Method Based on Genetic Algorithms for Haplotype Assembly

The computational problem of inferring the full haplotype of a cell starting from read sequencing data is known as haplotype assembly, and consists in assigning all heterozygous Single Nucleotide Polymorphisms (SNPs) to exactly one of the…

Genomics · Quantitative Biology 2018-12-20 Andrea Tangherloni, Simone Spolaor, Leonardo Rundo, Marco S. Nobile, Paolo Cazzaniga, Giancarlo Mauri, Pietro Liò, Ivan Merelli, Daniela Besozzi

On the Complexity of Several Haplotyping Problems

In this paper we present a collection of results pertaining to haplotyping. The first set of results concerns the combinatorial problem of reconstructing haplotypes from incomplete and/or imperfectly sequenced haplotype data. More…

Genomics · Quantitative Biology 2007-05-23 Rudi Cilibrasi, Leo van Iersel, Steven Kelk, John Tromp

Haplotype Assembly: An Information Theoretic View

This paper studies the haplotype assembly problem from an information theoretic perspective. A haplotype is a sequence of nucleotide bases on a chromosome, often conveniently represented by a binary string, that differ from the bases in the…

Information Theory · Computer Science 2014-05-13 Hongbo Si, Haris Vikalo, Sriram Vishwanath

Haplotype Assembly Using Manifold Optimization and Error Correction Mechanism

Recent matrix completion based methods have not been able to properly model the Haplotype Assembly Problem (HAP) for noisy observations. To cope with such a case, in this letter we propose a new Minimum Error Correction (MEC) based matrix…

Optimization and Control · Mathematics 2019-04-16 Mohamad Mahdi Mohades, Sina Majidian, Mohammad Hossein Kahaei

ComHapDet: A Spatial Community Detection Algorithm for Haplotype Assembly

Background: Haplotypes, the ordered lists of single nucleotide variations that distinguish chromosomal sequences from their homologous pairs, may reveal an individual's susceptibility to hereditary and complex diseases and affect how our…

Social and Information Networks · Computer Science 2019-11-28 Abishek Sankararaman, Haris Vikalo, François Baccelli

Optimal Haplotype Assembly from High-Throughput Mate-Pair Reads

Humans have 23 pairs of homologous chromosomes. The homologous pairs are almost identical pairs of chromosomes. For the most part, differences in homologous chromosomes occur at certain documented positions called single nucleotide…

Information Theory · Computer Science 2015-02-09 Govinda M. Kamath, Eren Şaşoğlu, David Tse

Maximum Likelihood Estimation of Frequencies of Known Haplotypes from Pooled Sequence Data

DNA samples are often pooled, either by experimental design, or because the sample itself is a mixture. For example, when population allele frequencies are of primary interest, individual samples may be pooled together to lower the cost of…

Quantitative Methods · Quantitative Biology 2013-02-07 Darren Kessner, Tom Turner, John Novembre

Learning to Match Unpaired Data with Minimum Entropy Coupling

Multimodal data is a precious asset enabling a variety of downstream tasks in machine learning. However, real-world data collected across different modalities is often not paired, which is a significant challenge to learn a joint…

Machine Learning · Computer Science 2025-08-11 Mustapha Bounoua, Giulio Franzese, Pietro Michiardi

Assembly of repetitive regions using next-generation sequencing data

High read depth can be used to assemble short sequence repeats. The existing genome assemblers fail in repetitive regions longer than the average read. I propose a new algorithm for DNA assembly which uses the relative frequency of reads…

Genomics · Quantitative Biology 2015-01-08 Robert M. Nowak

Hidden Markov models for the assessment of chromosomal alterations using high-throughput SNP arrays

Chromosomal DNA is characterized by variation between individuals at the level of entire chromosomes (e.g., aneuploidy in which the chromosome copy number is altered), segmental changes (including insertions, deletions, inversions, and…

Applications · Statistics 2008-07-30 Robert B. Scharpf, Giovanni Parmigiani, Jonathan Pevsner, Ingo Ruczinski

Matrix Completion with Weighted Constraint for Haplotype Estimation

A new optimization design is proposed for matrix completion by weighting the measurements and deriving the corresponding error bound. Accordingly, the Haplotype reconstruction using nuclear norm minimization with Weighted Constraint…

Signal Processing · Electrical Eng. & Systems 2021-01-13 Sina Majidian, M. Mohades, M. H. Kahaei

GapPredict: A Language Model for Resolving Gaps in Draft Genome Assemblies

Short-read DNA sequencing instruments can yield over 1e+12 bases per run, typically composed of reads 150 bases long. Despite this high throughput, de novo assembly algorithms have difficulty reconstructing contiguous genome sequences using…

Genomics · Quantitative Biology 2023-06-09 Eric Chen, Justin Chu, Jessica Zhang, Rene L. Warren, Inanc Birol

SNP2Vec: Scalable Self-Supervised Pre-Training for Genome-Wide Association Study

Self-supervised pre-training methods have brought remarkable breakthroughs in the understanding of text, image, and speech. Recent developments in genomics have also adopted these pre-training methods for genome understanding. However, they…

Machine Learning · Computer Science 2022-04-15 Samuel Cahyawijaya, Tiezheng Yu, Zihan Liu, Tiffany T. W. Mak, Xiaopu Zhou, Nancy Y. Ip, Pascale Fung

A New Biophysical Metric for Interrogating the Information Content in Human Genome Sequence Variation: Proof of Concept

Various studies have shown an association between single nucleotide polymorphisms (SNPs) and common disease. We hypothesize that information encoded in the structure of SNP haploblock variation illumines molecular pathways and cellular…

Biological Physics · Physics 2011-08-16 James Lindesay, Tshela E Mason, Luisel Ricks-Santi, William Hercules, Philip Kurian, Georgia M Dunston

A QPTAS for Gapless MEC

We consider the problem Minimum Error Correction (MEC). A MEC instance is an n x m matrix M with entries from {0,1,-}. Feasible solutions are composed of two binary m-bit strings, together with an assignment of each row of M to one of the…

Data Structures and Algorithms · Computer Science 2018-05-01 Shilpa Garg, Tobias Mömke

Haplotype Inference on Pedigrees with Recombinations, Errors, and Missing Genotypes via SAT solvers

The Minimum-Recombinant Haplotype Configuration problem (MRHC) has been highly successful in providing a sound combinatorial formulation for the important problem of genotype phasing on pedigrees. Despite several algorithmic advances and…

Data Structures and Algorithms · Computer Science 2013-11-20 Yuri Pirola, Gianluca Della Vedova, Stefano Biffani, Alessandra Stella, Paola Bonizzoni

Correcting a Single Deletion in Reads from a Nanopore Sequencer

Owing to their several merits over other DNA sequencing technologies, nanopore sequencers hold an immense potential to revolutionize the efficiency of DNA storage systems. However, their higher error rates necessitate further research to…

Information Theory · Computer Science 2024-05-08 Anisha Banerjee, Yonatan Yehezkeally, Antonia Wachter-Zeh, Eitan Yaakobi

Computing all-vs-all MEMs in run-length encoded collections of HiFi reads

We describe an algorithm to find maximal exact matches (MEMs) among HiFi reads with homopolymer errors. The main novelty in our work is that we resort to run-length compression to help deal with errors. Our method receives as input a…

Data Structures and Algorithms · Computer Science 2022-09-01 Diego Díaz-Domínguez, Simon J. Puglisi, Leena Salmela

A New Covariance Estimator for Sufficient Dimension Reduction in High-Dimensional and Undersized Sample Problems

The application of standard sufficient dimension reduction methods for reducing the dimension space of predictors without losing regression information requires inverting the covariance matrix of the predictors. This has posed a number of…

Methodology · Statistics 2019-10-01 Kabir Opeyemi Olorede, Waheed Babatunde Yahya