Related papers: Unaligned Sequence Similarity Search Using Deep Le…
Discovering fine-grained categories from coarsely labeled data is a practical and challenging task, which can bridge the gap between the demand for fine-grained analysis and the high annotation cost. Previous works mainly focus on…
DNA sequence alignment is important today as it is usually the first step in finding gene mutation, evolutionary similarities, protein structure, drug development and cancer treatment. Covid-19 is one recent example. There are many…
Network alignment, in general, seeks to discover the hidden underlying correspondence between nodes across two (or more) networks when given their network structure. However, most existing network alignment methods have added assumptions of…
DNA sequence alignment involves assigning short DNA reads to the most probable locations on an extensive reference genome. This process is crucial for various genomic analyses, including variant calling, transcriptomics, and epigenomics.…
Data clustering, the task of grouping observations according to their similarity, is a key component of unsupervised learning -- with real world applications in diverse fields such as biology, medicine, and social science. Often in these…
This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ5). FASTA/FASTQ files have several current limitations, such as their large file sizes,…
Knowledge graphs are freely aggregated, published, and edited in the Web of data, and thus may overlap. Hence, a key task resides in aligning (or matching) their content. This task encompasses the identification, within an aggregated…
Mining tasks over sequential data, such as clickstreams and gene sequences, require a careful design of embeddings usable by learning algorithms. Recent research in feature learning has been extended to sequential data, where each instance…
Protein similarity searches are a routine job for molecular biologists where a query sequence of amino acids needs to be compared and ranked against an ever-growing database of proteins. All available algorithms in this field can be grouped…
In recent years, deep metric learning has achieved promising results in learning high dimensional semantic feature embeddings where the spatial relationships of the feature vectors match the visual similarities of the images. Similarity…
One recent research demonstrated successful application of the label alignment property for unsupervised domain adaptation in a linear regression settings. Instead of regularizing representation learning to be domain invariant, the research…
Network embedding is a highly effective method to learn low-dimensional node vector representations with original network structures being well preserved. However, existing network embedding algorithms are mostly developed for a single…
Similarity query is the family of queries based on some similarity metrics. Unlike the traditional database queries which are mostly based on value equality, similarity queries aim to find targets "similar enough to" the given data objects,…
The Basic Local Alignment Search Tool (BLAST) is currently the most popular method for searching databases of biological sequences. BLAST compares sequences via similarity defined by a weighted edit distance, which results in it being…
Deep learning revolution happened thanks to the availability of a massive amount of labelled data which have contributed to the development of models with extraordinary inference capabilities. Despite the public availability of a large…
A key obstacle in automated analytics and meta-learning is the inability to recognize when different datasets contain measurements of the same variable. Because provided attribute labels are often uninformative in practice, this task may be…
Edit-distance-based string similarity search has many applications such as spell correction, data de-duplication, and sequence alignment. However, computing edit distance is known to have high complexity, which makes string similarity…
Given a small set of labeled data and a large set of unlabeled data, semi-supervised learning (SSL) attempts to leverage the location of the unlabeled datapoints in order to create a better classifier than could be obtained from supervised…
Most approaches that tackle the problem of node classification consider nodes to be similar, if they have shared neighbors or are close to each other in the graph. Recent methods for attributed graphs additionally take attributes of…
Motivation: Drug discovery demands rapid quantification of compound-protein interaction (CPI). However, there is a lack of methods that can predict compound-protein affinity from sequences alone with high applicability, accuracy, and…