Related papers: Celer: an Efficient Program for Genotype Eliminati…
Whole and targeted sequencing of human genomes is a promising, increasingly feasible tool for discovering genetic contributions to risk of complex diseases. A key step is calling an individual's genotype from the multiple aligned short read…
Pedigrees, or family trees, are graphs of family relationships that are used to study inheritance. A fundamental problem in computational biology is to find, for a pedigree with $n$ individuals genotyped at every site, a set of…
Formal methods apply algorithms based on mathematical principles to enhance the reliability of systems. It would only be natural to try to progress from verification, model checking or testing a system against its formal specification into…
Here we present mendelFix, a Perl script for checking Mendelian errors in genome-wide SNP data of trio designs. The program takes 12-recoded PLINK PED and MAP files as input to calculate a series of summary statistics for Mendelian errors,…
Genotyping errors are known to influence the power of both family-based and case-control studies in the genetics of complex disease. Estimating genotyping error rate in a given dataset can be complex, but when family information is…
In an extant population, how much information do extant individuals provide on the pedigree of their ancestors? Recent work by Kim, Mossel, Ramnarayan and Turner (2020) studied this question under a number of simplifying assumptions,…
Pedigree GWAS (Option 29) in the current version of the Mendel software is an optimized subroutine for performing large scale genome-wide QTL analysis. This analysis (a) works for random sample data, pedigree data, or a mix of both, (b) is…
We introduce a new algorithm called {\sc Rec-Gen} for reconstructing the genealogy or \textit{pedigree} of an extant population purely from its genetic data. We justify our approach by giving a mathematical proof of the effectiveness of…
Working with exhaustive search on large dataset is infeasible for several reasons. Recently, developed techniques that made pattern set mining feasible by a general solver with long execution time that supports heuristic search and are…
Reconciling gene trees with a species tree is a fundamental problem to understand the evolution of gene families. Many existing approaches reconcile each gene tree independently. However, it is well-known that the evolution of gene families…
Gene set analysis, a popular approach for analyzing high-throughput gene expression data, aims to identify sets of related genes that show significantly enriched or depleted expression patterns between different conditions. In the last…
Genome-wide association studies generate very large datasets that require scalable analysis algorithms. In this report we describe the GEDI software package, which implements efficient algorithms for performing several common tasks in the…
Family history is a major risk factor for many types of cancer. Mendelian risk prediction models translate family histories into cancer risk predictions based on knowledge of cancer susceptibility genes. These models are widely used in…
The aim of plant breeding trials is often to identify germplasms that are well adapt to target environments. These germplasms are identified through genomic prediction from the analysis of multi-environmental field trial (MET) using linear…
Reconciling a gene tree with a species tree is an important task that reveals much about the evolution of genes, genomes, and species, as well as about the molecular function of genes. A wide array of computational tools have been devised…
Genomic data I used in many fields but, it has become known that most of the platforms used in the sequencing process produce significant errors. This means that the analysis and inferences generated from these data may have some errors…
Addressing the reproducibility crisis in artificial intelligence through the validation of reported experimental results is a challenging task. It necessitates either the reimplementation of techniques or a meticulous assessment of papers…
The genetic code structure into distinct multiplet-classes as well as the numeric degeneracies of the latter are revealed by a two-step process. First, an empirical inventory of the degeneracies (of the shuffled multiplets) in two specific…
Statistical analysis of DNA mixtures is known to pose computational challenges due to the enormous state space of possible DNA profiles. We propose a Bayesian network representation for genotypes, allowing computations to be performed…
A central problem in data integration and data cleansing is to find entities in different data sources that describe the same real-world object. Many existing methods for identifying such entities rely on explicit linkage rules which…