Related papers: Multi-Grained Compositional Visual Clue Learning f…

An Image is Worth Multiple Words: Discovering Object Level Concepts using Multi-Concept Prompt Learning

Textual Inversion, a prompt learning method, learns a singular text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images.…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare

Intent-guided Heterogeneous Graph Contrastive Learning for Recommendation

Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graph (HG) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks…

Information Retrieval · Computer Science 2024-07-30 Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, Xindong Wu

Multi-label Cluster Discrimination for Visual Representation Learning

Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to superior feature representation empowered by image-text contrastive learning. However, the instance discrimination method used by…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng

Deep Multiview Clustering by Contrasting Cluster Assignments

Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Jie Chen, Hua Mao, Wai Lok Woo, Xi Peng

Visual In-Context Learning for Large Vision-Language Models

In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen

CCL: Cross-modal Correlation Learning with Multi-grained Fusion by Hierarchical Network

Cross-modal retrieval has become a highlighted research topic for retrieval across multimedia data such as image and text. A two-stage learning framework is widely adopted by most existing methods based on Deep Neural Network (DNN): The…

Multimedia · Computer Science 2017-08-09 Yuxin Peng, Jinwei Qi, Xin Huang, Yuxin Yuan

BIPCL: Bilateral Intent-Enhanced Sequential Recommendation via Embedding Perturbation Contrastive Learning

Accurately modeling users' evolving preferences from sequential interactions remains a central challenge in recommender systems. Recent studies emphasize the importance of capturing multiple latent intents underlying user behaviors.…

Information Retrieval · Computer Science 2026-04-21 Shanfan Zhang, Yongyi Lin, Yuan Rao

Fine-grained Image Classification via Combining Vision and Language

Fine-grained image classification is a challenging task due to the large intra-class variance and small inter-class variance, aiming at recognizing hundreds of sub-categories belonging to the same basic-level category. Most existing…

Computer Vision and Pattern Recognition · Computer Science 2017-11-29 Xiangteng He, Yuxin Peng

MCoT-MVS: Multi-level Vision Selection by Multi-modal Chain-of-Thought Reasoning for Composed Image Retrieval

Composed Image Retrieval (CIR) aims to retrieve target images based on a reference image and modified texts. However, existing methods often struggle to extract the correct semantic cues from the reference image that best reflect the user's…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Xuri Ge, Chunhao Wang, Xindi Wang, Zheyun Qin, Zhumin Chen, Xin Xin

Multi-View Class Incremental Learning

Multi-view learning (MVL) has gained great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. To make MVL methods more practical in an open-ended environment, this paper…

Machine Learning · Computer Science 2023-10-16 Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng

Direction Concentration Learning: Enhancing Congruency in Machine Learning

One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In…

Machine Learning · Computer Science 2020-01-03 Yan Luo, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao

Multi-Granular Multimodal Clue Fusion for Meme Understanding

With the continuous emergence of various social media platforms frequently used in daily life, the multimodal meme understanding (MMU) task has been garnering increasing attention. MMU aims to explore and comprehend the meanings of memes…

Computation and Language · Computer Science 2025-03-18 Li Zheng, Hao Fei, Ting Dai, Zuquan Peng, Fei Li, Huisheng Ma, Chong Teng, Donghong Ji

Augmenting Continual Learning of Diseases with LLM-Generated Visual Concepts

Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

Class-Independent Increment: An Efficient Approach for Multi-label Class-Incremental Learning

Current research on class-incremental learning primarily focuses on single-label classification tasks. However, real-world applications often involve multi-label scenarios, such as image retrieval and medical imaging. Therefore, this paper…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Chenhao Ding, Songlin Dong, Zhengdong Zhou, Jizhou Han, Qiang Wang, Yuhang He, Yihong Gong

Capsule Network-Based Semantic Intent Modeling for Human-Computer Interaction

This paper proposes a user semantic intent modeling algorithm based on Capsule Networks to address the problem of insufficient accuracy in intent recognition for human-computer interaction. The method represents semantic features in input…

Computation and Language · Computer Science 2025-07-02 Shixiao Wang, Yifan Zhuang, Runsheng Zhang, Zhijun Song

Cross-Modal Concept Learning and Inference for Vision-Language Models

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yi Zhang, Ce Zhang, Yushun Tang, Zhihai He

Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition

Multimodal intent recognition aims to leverage diverse modalities such as expressions, body movements and tone of speech to comprehend a user's intent, constituting a critical task for understanding human language and behavior in real-world…

Multimedia · Computer Science 2024-06-07 Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

Mutual Contrastive Learning for Visual Representation Learning

We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

ProbMCL: Simple Probabilistic Contrastive Learning for Multi-label Visual Classification

Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Ahmad Sajedi, Samir Khaki, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

A Multi-Modal Context Reasoning Approach for Conditional Inference on Joint Textual and Visual Clues

Conditional inference on joint textual and visual clues is a multi-modal reasoning task in which textual clues provide prior permutation or external knowledge, which are complementary with visual content and pivotal to deducing the correct…

Computation and Language · Computer Science 2023-05-09 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang