Related papers: Correcting Illumina sequencing errors for human da…
We introduce an improved version of RECKONER, an error corrector for Illumina whole genome sequencing data. By modifying its workflow we reduce the computation time even 10 times. We also propose a new method of determination of $k$-mer…
Motivation: Illumina Sequencing data can provide high coverage of a genome by relatively short (100 bp150 bp) reads at a low cost. Our goal is to produce trimmed and error-corrected reads to improve genome assemblies. Our error correction…
The advent of DNA and RNA sequencing has revolutionized the study of genomics and molecular biology. Next generation sequencing (NGS) technologies like Illumina, Ion Torrent, SOLiD sequencing etc. have brought about a quick and cheap way to…
The high throughput and cost-effectiveness afforded by short-read sequencing technologies, in principle, enable researchers to perform 16S rRNA profiling of complex microbial communities at unprecedented depth and resolution. Existing…
Genomic data I used in many fields but, it has become known that most of the platforms used in the sequencing process produce significant errors. This means that the analysis and inferences generated from these data may have some errors…
High-throughput shotgun sequence data makes it possible in principle to accurately estimate population genetic parameters without confounding by SNP ascertainment bias. One such statistic of interest is the proportion of heterozygous sites…
Error correction of sequenced reads remains a difficult task, especially in single-cell sequencing projects with extremely non-uniform coverage. While existing error correction tools designed for standard (multi-cell) sequencing data…
Deep shotgun sequencing and analysis of genomes, transcriptomes, amplified single-cell genomes, and metagenomes has enabled investigation of a wide range of organisms and ecosystems. However, sampling variation in short-read data sets and…
Motivation: Next-generation sequencing tools have enabled producing of huge amount of genomic information at low cost. Unfortunately, presence of sequencing errors in such data affects quality of downstream analyzes. Accuracy of them can be…
The quality of finetuning data is crucial for aligning large language models (LLMs) with human values. Current methods to improve data quality are either labor-intensive or prone to factual errors caused by LLM hallucinations. This paper…
Motivation: Next generation methods of DNA sequencing produce relatively high rate of reading errors, which interfere with de novo genome assembly of newly sequenced organisms and particularly affect the quality of SNP detection important…
Adequate read filtering is critical when processing high-throughput data in marker-gene-based studies. Sequencing errors can cause the mis-clustering of otherwise similar reads, artificially increasing the number of retrieved Operational…
Machine learning is attracting surging interest across nearly all scientific areas by enabling the analysis of large datasets and the extraction of scientific information from incomplete data. Data-driven science is rapidly growing,…
Data gaps are ubiquitous in spectral irradiance data, and yet, little effort has been put into finding robust methods for filling them. We introduce a data-adaptive and nonparametric method that allows us to fill data gaps in…
Faithfully correcting factual errors is critical for maintaining the integrity of textual knowledge bases and preventing hallucinations in sequence-to-sequence models. Drawing on humans' ability to identify and correct factual errors, we…
We introduce HoloClean, a framework for holistic data repairing driven by probabilistic inference. HoloClean unifies existing qualitative data repairing approaches, which rely on integrity constraints or external data sources, with…
Generative models are prone to hallucinations: plausible but incorrect structures absent in the ground truth. This issue is problematic in image restoration for safety-critical domains such as medical imaging, industrial inspection, and…
Blind image deblurring plays a very important role in many vision and multimedia applications. Most existing works tend to introduce complex priors to estimate the sharp image structures for blur kernel estimation. However, it has been…
Ancient mitochondrial DNA has been used in a wide variety of palaeontological and archaeological studies, ranging from population dynamics of extinct species to patterns of domestication. Most of these studies have traditionally been based…
Reconstructing the unknown spectrum of a given X-ray source is a common problem in a wide range of X-ray imaging tasks. For high-energy sources, transmission measurements are mostly used to recover the X-ray spectrum, as a solution to an…