Related papers: Self-Augmented Multi-Modal Feature Embedding

Learning Multimodal Data Augmentation in Feature Space

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the…

Machine Learning · Computer Science 2023-04-25 Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

Deep Multi-Modal Sets

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late fusion technique whereby…

Computer Vision and Pattern Recognition · Computer Science 2020-03-04 Austin Reiter, Menglin Jia, Pu Yang, Ser-Nam Lim

Multimodal Representation Learning and Fusion

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

On the Effects of Knowledge-Augmented Data in Word Embeddings

This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and…

Computation and Language · Computer Science 2020-10-06 Diego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens, Rob Miller, Andreas Vlachidis

Dense Multimodal Fusion for Hierarchically Joint Representation

Multiple modalities can provide more valuable information than single one by describing the same contents in various ways. Hence, it is highly expected to learn effective joint representation by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Di Hu, Feiping Nie, Xuelong Li

Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions…

Machine Learning · Computer Science 2024-06-19 Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

Hierarchical Data Representation Model - Multi-layer NMF

In this paper, we propose a data representation model that demonstrates hierarchical feature learning using nsNMF. We extend unit algorithm into several layers. Experiments with document and image data successfully discovered feature…

Machine Learning · Computer Science 2013-03-19 Hyun Ah Song, Soo-Young Lee

Morphological Word Embeddings

Linguistic similarity is multi-faceted. For instance, two words may be similar with respect to semantics, syntax, or morphology inter alia. Continuous word-embeddings have been shown to capture most of these shades of similarity to some…

Computation and Language · Computer Science 2019-07-05 Ryan Cotterell, Hinrich Schütze

Learning Semantic-Aligned Feature Representation for Text-based Person Search

Text-based person search aims to retrieve images of a certain pedestrian by a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve the feature alignment across modalities. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Shiping Li, Min Cao, Min Zhang

UniMat: Unifying Materials Embeddings through Multi-modal Learning

Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal…

Machine Learning · Computer Science 2024-11-14 Janghoon Ock, Joseph Montoya, Daniel Schweigert, Linda Hung, Santosh K. Suram, Weike Ye

Survey on Deep Multi-modal Data Analytics: Collaboration, Rivalry and Fusion

With the development of web technology, multi-modal or multi-view data has surged as a major stream for big data, where each modal/view encodes individual property of data objects. Often, different modalities are complementary to each…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Yang Wang

Multimodal Representation Learning With Text and Images

In recent years, multimodal AI has seen an upward trend as researchers are integrating data of different types such as text, images, speech into modelling to get the best results. This project leverages multimodal AI and matrix…

Machine Learning · Computer Science 2022-05-03 Aishwarya Jayagopal, Ankireddy Monica Aiswarya, Ankita Garg, Srinivasan Kolumam Nandakumar

ID Embedding as Subtle Features of Content and Structure for Multimodal Recommendation

Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance…

Information Retrieval · Computer Science 2024-05-24 Yuting Liu, Enneng Yang, Yizhou Dang, Guibing Guo, Qiang Liu, Yuliang Liang, Linying Jiang, Xingwei Wang

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Multimodal recommendation has emerged as a mainstream paradigm, typically leveraging text and visual embeddings extracted from pre-trained models such as Sentence-BERT, Vision Transformers, and ResNet. This approach is founded on the…

Information Retrieval · Computer Science 2026-01-19 Yu Ye, Junchen Fu, Yu Song, Kaiwen Zheng, Joemon M. Jose

Multiplex Word Embeddings for Selectional Preference Acquisition

Conventional word embeddings represent words with fixed vectors, which are usually trained based on co-occurrence patterns among words. In doing so, however, the power of such representations is limited, where the same word might be…

Computation and Language · Computer Science 2020-01-10 Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Exploring Data Augmentations on Self-/Semi-/Fully- Supervised Pre-trained Models

Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Shentong Mo, Zhun Sun, Chao Li

Using Person Embedding to Enrich Features and Data Augmentation for Classification

Today, machine learning is applied in almost any field. In machine learning, where there are numerous methods, classification is one of the most basic and crucial ones. Various problems can be solved by classification. The feature selection…

Machine Learning · Computer Science 2022-07-01 Ahmet Tuğrul Bayrak

Enhancing Representation in Medical Vision-Language Foundation Models via Multi-Scale Information Extraction Techniques

The development of medical vision-language foundation models has attracted significant attention in the field of medicine and healthcare due to their promising prospect in various clinical applications. While previous studies have commonly…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Guangming Shi, Hairong Zheng, Shanshan Wang

Learning Meta-Embeddings by Using Ensembles of Embedding Sets

Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science 2015-12-31 Wenpeng Yin, Hinrich Schütze

Zoom and Shift are All You Need

Feature alignment serves as the primary mechanism for fusing multimodal data. We put forth a feature alignment approach that achieves full integration of multimodal information. This is accomplished via an alternating process of shifting…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Jiahao Qin