Related papers: Bundle Optimization for Multi-aspect Embedding

Multiple Visual-Semantic Embedding for Video Retrieval from Query Sentence

Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Learning Multi-modal Similarity

In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, e.g., nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits…

Artificial Intelligence · Computer Science 2010-09-01 Brian McFee, Gert Lanckriet

Hierarchy-based Image Embeddings for Semantic Image Retrieval

Deep neural networks trained for classification have been found to learn powerful image representations, which are also often used for other tasks such as comparing images w.r.t. their visual similarity. However, visual similarity does not…

Computer Vision and Pattern Recognition · Computer Science 2019-07-24 Björn Barz, Joachim Denzler

Learning Structured Semantic Embeddings for Visual Recognition

Numerous embedding models have been recently explored to incorporate semantic knowledge into visual recognition. Existing methods typically focus on minimizing the distance between the corresponding images and texts in the embedding space…

Computer Vision and Pattern Recognition · Computer Science 2017-06-06 Dong Li, Hsin-Ying Lee, Jia-Bin Huang, Shengjin Wang, Ming-Hsuan Yang

Asymmetric Visual Semantic Embedding Framework for Efficient Vision-Language Alignment

Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Yang Liu, Mengyuan Liu, Shudong Huang, Jiancheng Lv

A Multi-View Embedding Space for Modeling Internet Images, Tags, and their Semantics

This paper investigates the problem of modeling Internet images and associated text or tags for tasks such as image-to-image search, tag-to-image search, and image-to-tag search (image annotation). We start with canonical correlation…

Computer Vision and Pattern Recognition · Computer Science 2013-09-13 Yunchao Gong, Qifa Ke, Michael Isard, Svetlana Lazebnik

A Survey on Efficient Processing of Similarity Queries over Neural Embeddings

Similarity query is the family of queries based on some similarity metrics. Unlike the traditional database queries which are mostly based on value equality, similarity queries aim to find targets "similar enough to" the given data objects,…

Databases · Computer Science 2022-04-19 Yifan Wang

Learning Semantic-Aligned Feature Representation for Text-based Person Search

Text-based person search aims to retrieve images of a certain pedestrian by a textual description. The key challenge of this task is to eliminate the inter-modality gap and achieve the feature alignment across modalities. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2021-12-14 Shiping Li, Min Cao, Min Zhang

Learning Multimodal Affinities for Textual Editing in Images

Nowadays, as cameras are rapidly adopted in our daily routine, images of documents are becoming both abundant and prevalent. Unlike natural images that capture physical objects, document-images contain a significant amount of text with…

Computer Vision and Pattern Recognition · Computer Science 2021-03-19 Or Perel, Oron Anschel, Omri Ben-Eliezer, Shai Mazor, Hadar Averbuch-Elor

Learning to embed semantic similarity for joint image-text retrieval

We present a deep learning approach for learning the joint semantic embeddings of images and captions in a Euclidean space, such that the semantic similarity is approximated by the L2 distances in the embedding space. For that, we introduce…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Noam Malali, Yosi Keller

Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification

Multi-label image and video classification are fundamental yet challenging tasks in computer vision. The main challenges lie in capturing spatial or temporal dependencies between labels and discovering the locations of discriminative…

Computer Vision and Pattern Recognition · Computer Science 2020-03-30 Renchun You, Zhiyao Guo, Lei Cui, Xiang Long, Yingze Bao, Shilei Wen

Semantic-Enhanced Image Clustering

Image clustering is an important and open-challenging task in computer vision. Although many methods have been proposed to solve the image clustering task, they only explore images and uncover clusters according to the image features, thus…

Computer Vision and Pattern Recognition · Computer Science 2023-04-11 Shaotian Cai, Liping Qiu, Xiaojun Chen, Qin Zhang, Longteng Chen

AspectCSE: Sentence Embeddings for Aspect-based Semantic Textual Similarity Using Contrastive Learning and Structured Knowledge

Generic sentence embeddings provide a coarse-grained approximation of semantic textual similarity but ignore specific aspects that make texts similar. Conversely, aspect-based sentence embeddings provide similarities between texts based on…

Computation and Language · Computer Science 2023-09-26 Tim Schopf, Emanuel Gerber, Malte Ostendorff, Florian Matthes

Probing Multimodal Embeddings for Linguistic Properties: the Visual-Semantic Case

Semantic embeddings have advanced the state of the art for countless natural language processing tasks, and various extensions to multimodal domains, such as visual-semantic embeddings, have been proposed. While the power of visual-semantic…

Machine Learning · Computer Science 2021-02-23 Adam Dahlgren Lindström, Suna Bensch, Johanna Björklund, Frank Drewes

Scaling up Multi-domain Semantic Segmentation with Sentence Embeddings

We propose an approach to semantic segmentation that achieves state-of-the-art supervised performance when applied in a zero-shot setting. It thus achieves results equivalent to those of the supervised methods, on each of the major semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 Wei Yin, Yifan Liu, Chunhua Shen, Baichuan Sun, Anton van den Hengel

Cooperative Embeddings for Instance, Attribute and Category Retrieval

The goal of this paper is to retrieve an image based on instance, attribute and category similarity notions. Different from existing works, which usually address only one of these entities in isolation, we introduce a cooperative embedding…

Computer Vision and Pattern Recognition · Computer Science 2019-04-03 William Thong, Cees G. M. Snoek, Arnold W. M. Smeulders

Unifying Specialist Image Embedding into Universal Image Embedding

Deep image embedding provides a way to measure the semantic similarity of two images. It plays a central role in many applications such as image search, face verification, and zero-shot learning. It is desirable to have a universal deep…

Computer Vision and Pattern Recognition · Computer Science 2020-03-10 Yang Feng, Futang Peng, Xu Zhang, Wei Zhu, Shanfeng Zhang, Howard Zhou, Zhen Li, Tom Duerig, Shih-Fu Chang, Jiebo Luo

Improving Visual-Semantic Embeddings by Learning Semantically-Enhanced Hard Negatives for Cross-modal Information Retrieval

Visual Semantic Embedding (VSE) aims to extract the semantics of images and their descriptions, and embed them into the same latent space for cross-modal information retrieval. Most existing VSE networks are trained by adopting a hard…

Computer Vision and Pattern Recognition · Computer Science 2023-02-15 Yan Gong, Georgina Cosma

Learning semantic Image attributes using Image recognition and knowledge graph embeddings

Extracting structured knowledge from texts has traditionally been used for knowledge base generation. However, other sources of information, such as images can be leveraged into this process to build more complete and richer knowledge…

Computer Vision and Pattern Recognition · Computer Science 2020-09-15 Ashutosh Tiwari, Sandeep Varma

Towards an Explainable Comparison and Alignment of Feature Embeddings

While several feature embedding models have been developed in the literature, comparisons of these embeddings have largely focused on their numerical performance in classification-related downstream applications. However, an interpretable…

Machine Learning · Computer Science 2025-08-19 Mohammad Jalali, Bahar Dibaei Nia, Farzan Farnia