Related papers: Multi-Grained Compositional Visual Clue Learning f…

200 papers

Textual Inversion, a prompt learning method, learns a single text embedding for a new "word" to represent image style and appearance, allowing it to be integrated into natural language sentences to generate novel synthesised images.…

Computer Vision and Pattern Recognition · Computer Science 2024-05-28 Chen Jin, Ryutaro Tanno, Amrutha Saseendran, Tom Diethe, Philip Teare

Contrastive Learning (CL)-based recommender systems have gained prominence in the context of Heterogeneous Graphs (HGs) due to their capacity to enhance the consistency of representations across different views. However, existing frameworks…

Information Retrieval · Computer Science 2024-07-30 Lei Sang, Yu Wang, Yi Zhang, Yiwen Zhang, Xindong Wu

Contrastive Language Image Pre-training (CLIP) has recently demonstrated success across various tasks due to the superior feature representations enabled by image-text contrastive learning. However, the instance discrimination method used by…

Computer Vision and Pattern Recognition · Computer Science 2024-11-07 Xiang An, Kaicheng Yang, Xiangzi Dai, Ziyong Feng, Jiankang Deng

Multiview clustering (MVC) aims to reveal the underlying structure of multiview data by categorizing data samples into clusters. Deep learning-based methods exhibit strong feature learning capabilities on large-scale datasets. For most…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Jie Chen, Hua Mao, Wai Lok Woo, Xi Peng

In Large Visual Language Models (LVLMs), the efficacy of In-Context Learning (ICL) remains limited by challenges in cross-modal interactions and representation disparities. To overcome these challenges, we introduce a novel Visual…

Computer Vision and Pattern Recognition · Computer Science 2024-02-20 Yucheng Zhou, Xiang Li, Qianning Wang, Jianbing Shen

Cross-modal retrieval across multimedia data such as images and text has become a prominent research topic. A two-stage learning framework is widely adopted by most existing methods based on Deep Neural Networks (DNNs): The…

Multimedia · Computer Science 2017-08-09 Yuxin Peng, Jinwei Qi, Xin Huang, Yuxin Yuan

Accurately modeling users' evolving preferences from sequential interactions remains a central challenge in recommender systems. Recent studies emphasize the importance of capturing multiple latent intents underlying user behaviors.…

Information Retrieval · Computer Science 2026-04-21 Shanfan Zhang, Yongyi Lin, Yuan Rao

Fine-grained image classification, which aims to recognize hundreds of sub-categories belonging to the same basic-level category, is challenging due to large intra-class variance and small inter-class variance. Most existing…

Computer Vision and Pattern Recognition · Computer Science 2017-11-29 Xiangteng He, Yuxin Peng

Composed Image Retrieval (CIR) aims to retrieve target images based on a reference image and a modification text. However, existing methods often struggle to extract the correct semantic cues from the reference image that best reflect the user's…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Xuri Ge, Chunhao Wang, Xindi Wang, Zheyun Qin, Zhumin Chen, Xin Xin

Multi-view learning (MVL) has achieved great success in integrating information from multiple perspectives of a dataset to improve downstream task performance. To make MVL methods more practical in an open-ended environment, this paper…

Machine Learning · Computer Science 2023-10-16 Depeng Li, Tianqi Wang, Junwei Chen, Kenji Kawaguchi, Cheng Lian, Zhigang Zeng

One of the well-known challenges in computer vision tasks is the visual diversity of images, which could result in an agreement or disagreement between the learned knowledge and the visual content exhibited by the current observation. In…

Machine Learning · Computer Science 2020-01-03 Yan Luo, Yongkang Wong, Mohan S. Kankanhalli, Qi Zhao

As social media platforms become increasingly prevalent in daily life, the multimodal meme understanding (MMU) task has been garnering increasing attention. MMU aims to explore and comprehend the meanings of memes…

Computation and Language · Computer Science 2025-03-18 Li Zheng, Hao Fei, Ting Dai, Zuquan Peng, Fei Li, Huisheng Ma, Chong Teng, Donghong Ji

Continual learning is essential for medical image classification systems to adapt to dynamically evolving clinical environments. The integration of multimodal information can significantly enhance continual learning of image classes.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Jiantao Tan, Peixian Ma, Kanghao Chen, Zhiming Dai, Ruixuan Wang

Current research on class-incremental learning primarily focuses on single-label classification tasks. However, real-world applications often involve multi-label scenarios, such as image retrieval and medical imaging. Therefore, this paper…

Computer Vision and Pattern Recognition · Computer Science 2025-11-27 Chenhao Ding, Songlin Dong, Zhengdong Zhou, Jizhou Han, Qiang Wang, Yuhang He, Yihong Gong

This paper proposes a user semantic intent modeling algorithm based on Capsule Networks to address the problem of insufficient accuracy in intent recognition for human-computer interaction. The method represents semantic features in input…

Computation and Language · Computer Science 2025-07-02 Shixiao Wang, Yifan Zhuang, Runsheng Zhang, Zhijun Song

Large-scale pre-trained Vision-Language Models (VLMs), such as CLIP, establish the correlation between texts and images, achieving remarkable success on various downstream tasks with fine-tuning. In existing fine-tuning methods, the…

Computer Vision and Pattern Recognition · Computer Science 2023-07-31 Yi Zhang, Ce Zhang, Yushun Tang, Zhihai He

Multimodal intent recognition aims to leverage diverse modalities such as facial expressions, body movements and tone of speech to comprehend users' intents, constituting a critical task for understanding human language and behavior in real-world…

Multimedia · Computer Science 2024-06-07 Qianrui Zhou, Hua Xu, Hao Li, Hanlei Zhang, Xiaohan Zhang, Yifan Wang, Kai Gao

We present a collaborative learning method called Mutual Contrastive Learning (MCL) for general visual representation learning. The core idea of MCL is to perform mutual interaction and transfer of contrastive distributions among a cohort…

Computer Vision and Pattern Recognition · Computer Science 2022-01-19 Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu

Multi-label image classification presents a challenging task in many domains, including computer vision and medical imaging. Recent advancements have introduced graph-based and transformer-based methods to improve performance and capture…

Computer Vision and Pattern Recognition · Computer Science 2024-04-15 Ahmad Sajedi, Samir Khaki, Yuri A. Lawryshyn, Konstantinos N. Plataniotis

Conditional inference on joint textual and visual clues is a multi-modal reasoning task in which textual clues provide prior permutation or external knowledge that is complementary to the visual content and pivotal to deducing the correct…

Computation and Language · Computer Science 2023-05-09 Yunxin Li, Baotian Hu, Xinyu Chen, Yuxin Ding, Lin Ma, Min Zhang