Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning

Xiangyan Qu; Jing Yu; Keke Gai; Jiamin Zhuang; Yuanmin Tang; Gang Xiong; Gaopeng Gou; Qi Wu

Computer Vision and Pattern Recognition 2024-07-24 v2

Abstract

Recent work shows that encyclopedia documents serve as helpful auxiliary information for zero-shot learning. Existing methods transfer knowledge by aligning the entire semantics of a document with the corresponding images. However, they disregard that the semantic information in a document and in an image is not equivalent, which results in suboptimal alignment. In this work, we propose a novel network that extracts multi-view semantic concepts from documents and images and aligns only the matching concepts rather than all of them. Specifically, we propose a semantic decomposition module that generates multi-view semantic embeddings from the visual and textual sides, providing the basic concepts for partial alignment. To alleviate information redundancy among embeddings, we propose a local-to-semantic variance loss to capture distinct local details and a multiple semantic diversity loss to enforce orthogonality among embeddings. Subsequently, two losses are introduced to partially align visual-semantic embedding pairs according to their semantic relevance at the view and word-to-patch levels. As a result, our method consistently outperforms state-of-the-art methods under two document sources on three standard benchmarks for document-based zero-shot learning. Qualitatively, we show that our model learns interpretable partial associations.
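The abstract says the diversity loss enforces orthogonality among the multi-view embeddings but gives no formula. As a rough illustration only, one common way to penalize non-orthogonality is the mean squared cosine similarity over all distinct pairs of embeddings. The sketch below assumes that form; the function names `l2_normalize` and `orthogonality_penalty` are hypothetical, and this is not the paper's actual loss.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (hypothetical helper, not from the paper)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def orthogonality_penalty(embeddings):
    """Mean squared cosine similarity over all distinct embedding pairs.

    Assumed generic formulation: the value is 0.0 when all embeddings are
    mutually orthogonal and grows toward 1.0 as they become collinear.
    """
    unit = [l2_normalize(e) for e in embeddings]
    k = len(unit)
    if k < 2:
        return 0.0  # a single view has nothing to be redundant with
    total = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            total += cos * cos
    num_pairs = k * (k - 1) / 2
    return total / num_pairs


if __name__ == "__main__":
    orthogonal_views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    redundant_views = [[1.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
    print(orthogonality_penalty(orthogonal_views))
    print(orthogonality_penalty(redundant_views))
```

Minimizing a penalty of this kind pushes the embeddings of different views apart in direction, which matches the abstract's stated goal of reducing information redundancy among them.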

Keywords

image retrieval, zero-shot learning, word embeddings

Cite

@article{arxiv.2407.15613,
  title  = {Visual-Semantic Decomposition and Partial Alignment for Document-based Zero-Shot Learning},
  author = {Xiangyan Qu and Jing Yu and Keke Gai and Jiamin Zhuang and Yuanmin Tang and Gang Xiong and Gaopeng Gou and Qi Wu},
  journal= {arXiv preprint arXiv:2407.15613},
  year   = {2024}
}

Comments

Accepted to ACM International Conference on Multimedia (MM) 2024
