Related papers: SimPlot++: a Python application for representing s…
We here present SIMLR (Single-cell Interpretation via Multi-kernel LeaRning), an open-source tool that implements a novel framework to learn a sample-to-sample similarity measure from expression data observed for heterogenous samples. SIMLR…
Motivation: Revealing structural variations across sequences of closely related individuals or species is crucial for understanding their diversification mechanisms and roles. Results: We developed PatchWorkPlot, a tool for visualization of…
When estimating a phylogeny from a multiple sequence alignment, researchers often assume the absence of recombination. However, if recombination is present, then tree estimation and all downstream analyses will be impacted, because…
Summary: Accurate phenotype prediction from genomic sequences is a highly coveted task in biological and medical research. While machine-learning holds the key to accurate prediction in a variety of fields, the complexity of biological data…
Polyploidization is an important evolutionary process which affects organisms ranging from plants to fish and fungi. The signal left behind by it is in the form of a species' ploidy level (number of complete chromosome sets found in a cell)…
Genetic information is encoded in a linear sequence of nucleotides, represented by letters ranging from thousands to billions. Mutations refer to changes in the DNA or RNA nucleotide sequence. Thus, mutation detection is vital in all areas…
Motivation: Quality control of genomic data is an essential but complicated multi-step procedure, often requiring separate installation and expert familiarity with a combination of disparate bioinformatics tools. Results: To provide an…
The alignment of biological sequences such as DNA, RNA, and proteins, is one of the basic tools that allow to detect evolutionary patterns, as well as functional/structural characterizations between homologous sequences in different…
Clustering is a difficult and widely-studied data mining task, with many varieties of clustering algorithms proposed in the literature. Nearly all algorithms use a similarity measure such as a distance metric (e.g. Euclidean distance) to…
Process mining techniques including process discovery, conformance checking, and process enhancement provide extensive knowledge about processes. Discovering running processes and deviations as well as detecting performance problems and…
Motivation: Illumina DNA sequencing is now the predominant source of raw genomic data, and data volumes are growing rapidly. Bioinformatic analysis pipelines are having trouble keeping pace. A common bottleneck in such pipelines is the…
In this work, we present scikit-fingerprints, a Python package for computation of molecular fingerprints for applications in chemoinformatics. Our library offers an industry-standard scikit-learn interface, allowing intuitive usage and easy…
forqs is a forward-in-time simulation of recombination, quantitative traits, and selection. It was designed to investigate haplotype patterns resulting from scenarios where substantial evolutionary change has taken place in a small number…
We present an application, Superplot, for calculating and plotting statistical quantities relevant to parameter inference from a "chain" of samples drawn from a parameter space, produced by e.g. MultiNest. A simple graphical interface…
Data clones are defined as multiple copies of the same data among datasets. Presence of data clones between datasets can cause issues such as difficulties in managing data assets and data license violations when using datasets with clones…
Correspondence matching is a fundamental problem in computer vision and robotics applications. Solving correspondence matching problems using neural networks has been on the rise recently. Rotation-equivariance and scale-equivariance are…
Merlin++ is a C++ charged-particle tracking library developed for the simulation and analysis of complex beam dynamics within high energy particle accelerators. Accurate simulation and analysis of particle dynamics is an essential part of…
Multiplexed imaging data are revolutionizing our understanding of the composition and organization of tissues and tumors. A critical aspect of such tissue profiling is quantifying the spatial relationship relationships among cells at…
Due to the high inter-class similarity caused by the complex composition and the co-existing objects across scenes, numerous studies have explored object semantic knowledge within scenes to improve scene recognition. However, a resulting…
Recent advances in computational methods for designing biological sequences have sparked the development of metrics to evaluate these methods performance in terms of the fidelity of the designed sequences to a target distribution and their…