Related papers: LUMPY: A probabilistic framework for structural va…
Structural variants compose the majority of human genetic variation, but are difficult to assess using current genomic sequencing technologies. Optical mapping technologies, which measure the size of chromosomal fragments between labeled…
Significant advances in biotechnology have allowed for simultaneous measurement of molecular data points across multiple genomic and transcriptomic levels from a single tumor/cancer sample. This has motivated systematic approaches to…
Spatial transcriptomics has revolutionized tissue analysis by simultaneously mapping gene expression, spatial topography, and histological context across consecutive tissue sections, enabling systematic investigation of spatial…
The study of genomic variation has provided key insights into the functional role of mutations. Predominantly, studies have focused on single nucleotide variants (SNV), which are relatively easy to detect and can be described with rich…
Semantic segmentation networks (SSNs) are central to safety-critical applications such as medical imaging and autonomous driving, where robustness under uncertainty is essential. However, existing probabilistic verification methods often…
The classification of genetic variants, particularly Variants of Uncertain Significance (VUS), poses a significant challenge in clinical genetics and precision medicine. Large Language Models (LLMs) have emerged as transformative tools in…
In data-driven SHM, the signals recorded from systems in operation can be noisy and incomplete. Data corresponding to each of the operational, environmental, and damage states are rarely available a priori; furthermore, labelling to…
Next-generation sequencing techniques have facilitated a large scale analysis of human genetic variation. Despite the advances in sequencing speeds, the computational discovery of structural variants is not yet standard. It is likely that…
Extracting genetic information from a full range of sequencing data is important for understanding diseases. We propose a novel method to effectively explore the landscape of genetic mutations and aggregate them to predict cancer type. We…
The tremdendous advances in high-throughput sequencing technologies have made population-scale sequencing as performed in the 1000 Genomes project and the Genome of the Netherlands project possible. Next-generation sequencing has allowed…
Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection,…
We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing: joining contiguous DNA clones or probes with extremely similar data into regions, from clustering:…
Integrative learning of multiple datasets has the potential to mitigate the challenge of small $n$ and large $p$ that is often encountered in analysis of big biomedical data such as genomics data. Detection of weak yet important signals can…
Variant calling is a fundamental task in genomic research, essential for detecting genetic variations such as single nucleotide polymorphisms (SNPs) and insertions or deletions (indels). This paper presents an enhancement to DeepChem, a…
In this paper, we study randomized methods for feedback design of uncertain systems. The first contribution is to derive the sample complexity of various constrained control problems. In particular, we show the key role played by the…
When applying the support vector machine (SVM) to high-dimensional classification problems, we often impose a sparse structure in the SVM to eliminate the influences of the irrelevant predictors. The lasso and other variable selection…
Inferring the structural properties of a protein from its amino acid sequence is a challenging yet important problem in biology. Structures are not known for the vast majority of protein sequences, but structure is critical for…
Learning structured models using maximum margin techniques has become an indispensable tool for com- puter vision researchers, as many computer vision applications can be cast naturally as an image labeling problem. Pixel-based or…
Background: Several sources of noise obfuscate the identification of single nucleotide variation (SNV) in next generation sequencing data. For instance, errors may be introduced during library construction and sequencing steps. In addition,…
After the completion of human genome sequence was anounced, it is evident that interpretation of DNA sequences is an immediate task to work on. For understanding their signals, improvement of present sequence analysis tools and developing…