Related papers: Improving Visual-Semantic Embedding with Adaptive …

200 papers

Visual Semantic Embedding (VSE) is a dominant approach for vision-language retrieval, which aims at learning a deep embedding space such that visual data are embedded close to their semantic text labels or descriptions. Recent VSE models…

Computer Vision and Pattern Recognition · Computer Science 2021-07-07 Jiacheng Chen, Hexiang Hu, Hao Wu, Yuning Jiang, Changhu Wang

Joint visual-semantic embeddings (VSE) have become a research hotspot for the task of image annotation, which suffers from the issue of semantic gap, i.e., the gap between images' visual features (low-level) and labels' semantic features…

Computer Vision and Pattern Recognition · Computer Science 2018-08-14 Guibing Guo, Songlin Zhai, Fajie Yuan, Yuan Liu, Xingwei Wang

Visual Semantic Embedding (VSE) aims to extract the semantics of images and their descriptions, and embed them into the same latent space for cross-modal information retrieval. Most existing VSE networks are trained by adopting a hard…

Computer Vision and Pattern Recognition · Computer Science 2023-02-15 Yan Gong, Georgina Cosma

Learning visual semantic similarity is a critical challenge in bridging the gap between images and texts. However, there exist inherent variations between vision and language data, such as information density, i.e., images can contain…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Yang Liu, Mengyuan Liu, Shudong Huang, Jiancheng Lv

We present a new technique for learning visual-semantic embeddings for cross-modal retrieval. Inspired by hard negative mining, the use of hard negatives in structured prediction, and ranking loss functions, we introduce a simple change to…

Machine Learning · Computer Science 2018-07-31 Fartash Faghri, David J. Fleet, Jamie Ryan Kiros, Sanja Fidler

The core of cross-modal matching is to accurately measure the similarity between different modalities in a unified representation space. However, compared to textual descriptions of a certain perspective, the visual modality has more…

Computer Vision and Pattern Recognition · Computer Science 2023-12-22 Wenzhang Wei, Zhipeng Gui, Changguang Wu, Anqi Zhao, Dehua Peng, Huayi Wu

Visual-Semantic Embedding (VSE) is a prevalent approach in image-text retrieval by learning a joint embedding space between the image and language modalities where semantic similarities would be preserved. The triplet loss with…

Computer Vision and Pattern Recognition · Computer Science 2022-10-25 Hong Xuan, Xi Chen

Visual-semantic embedding aims to learn a joint embedding space where related video and sentence instances are located close to each other. Most existing methods put instances in a single embedding space. However, they struggle to embed…

Computer Vision and Pattern Recognition · Computer Science 2023-05-31 Huy Manh Nguyen, Tomo Miyazaki, Yoshihiro Sugaya, Shinichiro Omachi

Enabling Visual Semantic Models to effectively handle multi-view description matching has been a longstanding challenge. Existing methods typically learn a set of embeddings to find the optimal match for each view's text and compute…

Computer Vision and Pattern Recognition · Computer Science 2025-07-18 Yang Liu, Wentao Feng, Zhuoyao Liu, Shudong Huang, Jiancheng Lv

Visual Semantic Embedding (VSE) models, which map images into a rich semantic embedding space, have been a milestone in object recognition and zero-shot learning. Current approaches to VSE heavily rely on static word embedding techniques.…

Computer Vision and Pattern Recognition · Computer Science 2021-07-27 Yue Jiao, Jonathon Hare, Adam Prügel-Bennett

We study the problem of grounding distributional representations of texts on the visual domain, namely visual-semantic embeddings (VSE for short). Beginning with an insightful adversarial attack on VSE embeddings, we show the limitation of…

Computation and Language · Computer Science 2018-06-28 Haoyue Shi, Jiayuan Mao, Tete Xiao, Yuning Jiang, Jian Sun

Weakly-Supervised Semantic Segmentation (WSSS) methods with image-level labels generally train a classification network to generate the Class Activation Maps (CAMs) as the initial coarse segmentation labels. However, current WSSS methods…

Computer Vision and Pattern Recognition · Computer Science 2022-02-11 Lixiang Ru, Bo Du, Yibing Zhan, Chen Wu

Learning visual similarity requires learning relations, typically between triplets of images. Although triplet approaches are powerful, their computational complexity mostly limits training to only a subset of all possible training…

Computer Vision and Pattern Recognition · Computer Science 2020-03-31 Karsten Roth, Timo Milbich, Björn Ommer

We propose Unified Visual-Semantic Embeddings (UniVSE) for learning a joint space of visual and textual concepts. The space unifies the concepts at different levels, including objects, attributes, relations, and full scenes. A contrastive…

Computer Vision and Pattern Recognition · Computer Science 2019-04-30 Hao Wu, Jiayuan Mao, Yufeng Zhang, Yuning Jiang, Lei Li, Weiwei Sun, Wei-Ying Ma

Downsampling is widely adopted to achieve a good trade-off between accuracy and latency for visual recognition. Unfortunately, the commonly used pooling layers are not learned, and thus cannot preserve important information. As another…

Computer Vision and Pattern Recognition · Computer Science 2022-07-26 Ho Man Kwan, Shenghui Song

For convolutional neural network models that optimize an image embedding, we propose a method to highlight the regions of images that contribute most to pairwise similarity. This work is a corollary to the visualization tools developed for…

Computer Vision and Pattern Recognition · Computer Science 2019-01-04 Abby Stylianou, Richard Souvenir, Robert Pless

Learning invariant representations from images is one of the hardest challenges facing computer vision. Spatial pooling is widely used to create invariance to spatial shifting, but it is restricted to convolutional models. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2013-03-19 Sainbayar Sukhbaatar, Takaki Makino, Kazuyuki Aihara

With the rapid development of multimodal learning, the image-text matching task, as a bridge connecting vision and language, has become increasingly important. Based on existing research, this study proposes an innovative visual semantic…

Computer Vision and Pattern Recognition · Computer Science 2024-12-30 Wenjing Chen

Human perception of visual similarity is inherently adaptive and subjective, depending on the users' interests and focus. However, most image retrieval systems fail to reflect this flexibility, relying on a fixed, monolithic metric that…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Sohwi Lim, Lee Hyoseok, Jungjoon Park, Tae-Hyun Oh

Visual-semantic embedding aims to find a shared latent space where related visual and textual instances are close to each other. Most current methods learn injective embedding functions that map an instance to a single point in the shared…

Computer Vision and Pattern Recognition · Computer Science 2019-07-18 Yale Song, Mohammad Soleymani