Related papers: Self-Augmented Multi-Modal Feature Embedding

200 papers

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the…

Machine Learning · Computer Science 2023-04-25 Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

Many vision-related tasks benefit from reasoning over multiple modalities to leverage complementary views of data in an attempt to learn robust embedding spaces. Most deep learning-based methods rely on a late fusion technique whereby…

Computer Vision and Pattern Recognition · Computer Science 2020-03-04 Austin Reiter, Menglin Jia, Pu Yang, Ser-Nam Lim

Multi-modal learning is a fast growing area in artificial intelligence. It tries to help machines understand complex things by combining information from different sources, like images, text, and audio. By using the strengths of each…

Machine Learning · Computer Science 2025-12-22 Qihang Jin, Enze Ge, Yuhang Xie, Hongying Luo, Junhao Song, Ziqian Bi, Chia Xin Liang, Jibin Guan, Joe Yeong, Xinyuan Song, Junfeng Hao

This paper investigates techniques for knowledge injection into word embeddings learned from large corpora of unannotated data. These representations are trained with word cooccurrence statistics and do not commonly exploit syntactic and…

Computation and Language · Computer Science 2020-10-06 Diego Ramirez-Echavarria, Antonis Bikakis, Luke Dickens, Rob Miller, Andreas Vlachidis

Multiple modalities can provide more valuable information than single one by describing the same contents in various ways. Hence, it is highly expected to learn effective joint representation by fusing the features of different modalities.…

Computer Vision and Pattern Recognition · Computer Science 2018-10-09 Di Hu, Feiping Nie, Xuelong Li

Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions…

Machine Learning · Computer Science 2024-06-19 Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

In this paper, we propose a data representation model that demonstrates hierarchical feature learning using nsNMF. We extend unit algorithm into several layers. Experiments with document and image data successfully discovered feature…

Machine Learning · Computer Science 2013-03-19 Hyun Ah Song, Soo-Young Lee

Linguistic similarity is multi-faceted. For instance, two words may be similar with respect to semantics, syntax, or morphology inter alia. Continuous word-embeddings have been shown to capture most of these shades of similarity to some…

Computation and Language · Computer Science 2019-07-05 Ryan Cotterell, Hinrich Schütze

Text-based person search aims to retrieve images of a certain pedestrian by a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve the feature alignment across modalities. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Shiping Li, Min Cao, Min Zhang

Materials science datasets are inherently heterogeneous and are available in different modalities such as characterization spectra, atomic structures, microscopic images, and text-based synthesis conditions. The advancements in multi-modal…

Machine Learning · Computer Science 2024-11-14 Janghoon Ock, Joseph Montoya, Daniel Schweigert, Linda Hung, Santosh K. Suram, Weike Ye

With the development of web technology, multi-modal or multi-view data has surged as a major stream for big data, where each modal/view encodes individual property of data objects. Often, different modalities are complementary to each…

Computer Vision and Pattern Recognition · Computer Science 2020-06-16 Yang Wang

In recent years, multimodal AI has seen an upward trend as researchers are integrating data of different types such as text, images, speech into modelling to get the best results. This project leverages multimodal AI and matrix…

Machine Learning · Computer Science 2022-05-03 Aishwarya Jayagopal, Ankireddy Monica Aiswarya, Ankita Garg, Srinivasan Kolumam Nandakumar

Multimodal recommendation aims to model user and item representations comprehensively with the involvement of multimedia content for effective recommendations. Existing research has shown that it is beneficial for recommendation performance…

Information Retrieval · Computer Science 2024-05-24 Yuting Liu, Enneng Yang, Yizhou Dang, Guibing Guo, Qiang Liu, Yuliang Liang, Linying Jiang, Xingwei Wang

Multimodal recommendation has emerged as a mainstream paradigm, typically leveraging text and visual embeddings extracted from pre-trained models such as Sentence-BERT, Vision Transformers, and ResNet. This approach is founded on the…

Information Retrieval · Computer Science 2026-01-19 Yu Ye, Junchen Fu, Yu Song, Kaiwen Zheng, Joemon M. Jose

Conventional word embeddings represent words with fixed vectors, which are usually trained based on co-occurrence patterns among words. In doing so, however, the power of such representations is limited, where the same word might be…

Computation and Language · Computer Science 2020-01-10 Hongming Zhang, Jiaxin Bai, Yan Song, Kun Xu, Changlong Yu, Yangqiu Song, Wilfred Ng, Dong Yu

Data augmentation has become a standard component of vision pre-trained models to capture the invariance between augmented views. In practice, augmentation techniques that mask regions of a sample with zero/mean values or patches from other…

Computer Vision and Pattern Recognition · Computer Science 2023-10-31 Shentong Mo, Zhun Sun, Chao Li

Today, machine learning is applied in almost any field. In machine learning, where there are numerous methods, classification is one of the most basic and crucial ones. Various problems can be solved by classification. The feature selection…

Machine Learning · Computer Science 2022-07-01 Ahmet Tuğrul Bayrak

The development of medical vision-language foundation models has attracted significant attention in the field of medicine and healthcare due to their promising prospect in various clinical applications. While previous studies have commonly…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Weijian Huang, Cheng Li, Hong-Yu Zhou, Jiarun Liu, Hao Yang, Yong Liang, Guangming Shi, Hairong Zheng, Shanshan Wang

Word embeddings -- distributed representations of words -- in deep learning are beneficial for many tasks in natural language processing (NLP). However, different embedding sets vary greatly in quality and characteristics of the captured…

Computation and Language · Computer Science 2015-12-31 Wenpeng Yin, Hinrich Schütze

Feature alignment serves as the primary mechanism for fusing multimodal data. We put forth a feature alignment approach that achieves full integration of multimodal information. This is accomplished via an alternating process of shifting…

Computer Vision and Pattern Recognition · Computer Science 2024-06-14 Jiahao Qin