Related papers: Distributed Representations for Biological Sequenc…

Breaking the Euclidean Barrier: Hyperboloid-Based Biological Sequence Analysis

Genomic sequence analysis plays a crucial role in various scientific and medical domains. Traditional machine-learning approaches often struggle to capture the complex relationships and hierarchical structures of sequence data when working…

Machine Learning · Computer Science 2025-10-02 Sarwan Ali, Haris Mansoor, Murray Patterson

BioSequence2Vec: Efficient Embedding Generation For Biological Sequences

Representation learning is an important step in the machine learning pipeline. Given the current biological sequencing data volume, learning an explicit representation is prohibitive due to the dimensionality of the resulting feature…

Machine Learning · Computer Science 2023-04-04 Sarwan Ali, Usama Sardar, Murray Patterson, Imdad Ullah Khan

Learning protein sequence embeddings using information from structure

Inferring the structural properties of a protein from its amino acid sequence is a challenging yet important problem in biology. Structures are not known for the vast majority of protein sequences, but structure is critical for…

Machine Learning · Computer Science 2019-10-17 Tristan Bepler, Bonnie Berger

Vector Embeddings by Sequence Similarity and Context for Improved Compression, Similarity Search, Clustering, Organization, and Manipulation of cDNA Libraries

This paper demonstrates the utility of organized numerical representations of genes in research involving flat string gene formats (i.e., FASTA/FASTQ). FASTA/FASTQ files have several current limitations, such as their large file sizes,…

Genomics · Quantitative Biology 2023-08-11 Daniel H. Um, David A. Knowles, Gail E. Kaiser

edge2vec: Representation learning using edge semantics for biomedical knowledge discovery

Representation learning provides new and powerful graph analytical approaches and tools for the highly valued data science challenge of mining knowledge graphs. Since previous graph analytical methods have mostly focused on homogeneous…

Information Retrieval · Computer Science 2019-05-29 Zheng Gao, Gang Fu, Chunping Ouyang, Satoshi Tsutsui, Xiaozhong Liu, Jeremy Yang, Christopher Gessner, Brian Foote, David Wild, Qi Yu, Ying Ding

voxel2vec: A Natural Language Processing Approach to Learning Distributed Representations for Scientific Data

Relationships in scientific data, such as the numerical and spatial distribution relations of features in univariate data, the scalar-value combinations' relations in multivariate data, and the association of volumes in time-varying and…

Machine Learning · Computer Science 2022-07-25 Xiangyang He, Yubo Tao, Shuoliu Yang, Haoran Dai, Hai Lin

Distributed Representation of Subgraphs

Network embeddings have become very popular in learning effective feature representations of networks. Motivated by the recent successes of embeddings in natural language processing, researchers have tried to find network embeddings in…

Social and Information Networks · Computer Science 2017-02-23 Bijaya Adhikari, Yao Zhang, Naren Ramakrishnan, B. Aditya Prakash

A biological sequence comparison algorithm using quantum computers

Genetic information is encoded in a linear sequence of nucleotides, represented by letters ranging from thousands to billions. Mutations refer to changes in the DNA or RNA nucleotide sequence. Thus, mutation detection is vital in all areas…

Quantum Physics · Physics 2024-03-14 Büsra Kösoglu-Kind, Robert Loredo, Michele Grossi, Christian Bernecker, Jody M Burks, Rudiger Buchkremer

BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale

Capturing the semantics of related biological concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association…

Computation and Language · Computer Science 2020-07-01 Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, Chih-Hsuan Wei, Zhiyong Lu

Comparative Visual Analytics for Assessing Medical Records with Sequence Embedding

Machine learning for data-driven diagnosis has been actively studied in medicine to provide better healthcare. Supporting analysis of a patient cohort similar to a patient under treatment is a key task for clinicians to make decisions with…

Medical Physics · Physics 2020-03-25 Rongchen Guo, Takanori Fujiwara, Yiran Li, Kelly M. Lima, Soman Sen, Nam K. Tran, Kwan-Liu Ma

graph2vec: Learning Distributed Representations of Graphs

Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph…

Artificial Intelligence · Computer Science 2017-07-18 Annamalai Narayanan, Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, Shantanu Jaiswal

Hyperbolic Multimodal Representation Learning for Biological Taxonomies

Taxonomic classification in biodiversity research involves organizing biological specimens into structured hierarchies based on evidence, which can come from multiple modalities such as images and genetic information. We investigate whether…

Machine Learning · Computer Science 2025-08-26 ZeMing Gong, Chuanqi Tang, Xiaoliang Huo, Nicholas Pellegrino, Austin T. Wang, Graham W. Taylor, Angel X. Chang, Scott C. Lowe, Joakim Bruslund Haurum

Beyond Accuracy: Measuring Representation Capacity of Embeddings to Preserve Structural and Contextual Information

Effective representation of data is crucial in various machine learning tasks, as it captures the underlying structure and context of the data. Embeddings have emerged as a powerful technique for data representation, but evaluating their…

Machine Learning · Computer Science 2023-09-21 Sarwan Ali

Unaligned Sequence Similarity Search Using Deep Learning

Gene annotation has traditionally required direct comparison of DNA sequences between an unknown gene and a database of known ones using string comparison methods. However, these methods do not provide useful information when a gene does…

Machine Learning · Computer Science 2019-09-17 James K. Senter, Taylor M. Royalty, Andrew D. Steen, Amir Sadovnik

Event2Vec: A Geometric Approach to Learning Composable Representations of Event Sequences

The study of neural representations, both in biological and artificial systems, is increasingly revealing the importance of geometric and topological structures. Inspired by this, we introduce Event2Vec, a novel framework for learning…

Machine Learning · Computer Science 2025-12-02 Antonin Sulc

Metagenome2Vec: Building Contextualized Representations for Scalable Metagenome Analysis

Advances in next-generation metagenome sequencing have the potential to revolutionize the point-of-care diagnosis of novel pathogen infections, which could help prevent potential widespread transmission of diseases. Given the high volume of…

Genomics · Quantitative Biology 2021-11-17 Sathyanarayanan N. Aakur, Vineela Indla, Vennela Indla, Sai Narayanan, Arunkumar Bagavathi, Vishalini Laguduva Ramnath, Akhilesh Ramachandran

Neural Distance Embeddings for Biological Sequences

The development of data-dependent heuristics and representations for biological sequences that reflect their evolutionary distance is critical for large-scale biological research. However, popular machine learning approaches, based on…

Quantitative Methods · Quantitative Biology 2021-10-13 Gabriele Corso, Rex Ying, Michal Pándy, Petar Veličković, Jure Leskovec, Pietro Liò

Embedding Biomedical Ontologies by Jointly Encoding Network Structure and Textual Node Descriptors

Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other…

Artificial Intelligence · Computer Science 2019-06-21 Sotiris Kotitsas, Dimitris Pappas, Ion Androutsopoulos, Ryan McDonald, Marianna Apidianaki

Neural Embeddings for Protein Graphs

Proteins perform much of the work in living organisms, and consequently the development of efficient computational methods for protein representation is essential for advancing large-scale biological research. Most current approaches…

Quantitative Methods · Quantitative Biology 2023-06-09 Francesco Ceccarelli, Lorenzo Giusti, Sean B. Holden, Pietro Liò

SEEC: Semantic Vector Federation across Edge Computing Environments

Semantic vector embedding techniques have proven useful in learning semantic representations of data across multiple domains. A key application enabled by such techniques is the ability to measure semantic similarity between given data…

Computation and Language · Computer Science 2020-09-01 Shalisha Witherspoon, Dean Steuer, Graham Bent, Nirmit Desai