Related papers: Cross-Modal Knowledge Transfer Without Task-Releva…
Source-free cross-modal knowledge transfer is a crucial yet challenging task, which aims to transfer knowledge from one source modality (e.g., RGB) to the target modality (e.g., depth or infrared) with no access to the task-relevant (TR)…
Neuromorphic spike data, an upcoming modality with high temporal resolution, has shown promising potential in autonomous driving by mitigating the challenges posed by high-velocity motion blur. However, training the spike depth estimation…
This paper strives for action recognition and detection in video modalities like RGB, depth maps or 3D-skeleton sequences when only limited modality-specific labeled examples are available. For the RGB, and derived optical-flow, modality…
Commonsense reasoning systems should be able to generalize to diverse reasoning cases. However, most state-of-the-art approaches depend on expensive data annotations and overfit to a specific benchmark without learning how to perform…
Existing color-guided depth super-resolution (DSR) approaches require paired RGB-D data as training samples where the RGB image is used as structural guidance to recover the degraded depth map due to their geometrical similarity. However,…
In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared…
Multimodal sensing has proven valuable for visual tracking, as different sensor types offer unique strengths in handling one specific challenging scene where object appearance varies. While a generalist model capable of leveraging all…
Many practical applications require training of semantic segmentation models on unlabelled datasets and their execution on low-resource hardware. Distillation from a trained source model may represent a solution for the first but does not…
Cross-modality image segmentation aims to segment the target modalities using a method designed in the source modality. Deep generative models can translate the target modality images into the source modality, thus enabling cross-modality…
Transferring knowledge from a source domain to a target domain can be crucial for whole slide image classification, since the number of samples in a dataset is often limited due to high annotation costs. However, domain shift and task…
Recently, general salient object detection (SOD) has made great progress with the rapid development of deep neural networks. However, task-aware SOD has hardly been studied due to the lack of task-specific datasets. In this paper, we…
Transfer learning is crucial in training deep neural networks on new target tasks. Current transfer learning methods always assume at least one of (i) source and target task label spaces overlap, (ii) source datasets are available, and…
Cross-modal knowledge distillation deals with transferring knowledge from a model trained with superior modalities (Teacher) to another model trained with weak modalities (Student). Existing approaches require paired training examples exist…
With the development of foundation models such as large language models, zero-shot transfer learning has become increasingly significant. This is highlighted by the generative capabilities of NLP models like GPT-4, and the retrieval-based…
Cross-media retrieval is a research hotspot in multimedia area, which aims to perform retrieval across different media types such as image and text. The performance of existing methods usually relies on labeled data for model training.…
Multi-label zero-shot classification aims to predict multiple unseen class labels for an input image. It is more challenging than its single-label counterpart. On one hand, the unconstrained number of labels assigned to each image makes the…
Recently, although pre-trained language models have achieved great success on multilingual NLP (Natural Language Processing) tasks, the lack of training data on many tasks in low-resource languages still limits their performance. One…
In this work, we propose to utilize Convolutional Neural Networks to boost the performance of depth-induced salient object detection by capturing the high-level representative features for depth modality. We formulate the depth-induced…
Although existing cross-domain continual learning approaches successfully address many streaming tasks having domain shifts, they call for a fully labeled source domain hindering their feasibility in the privacy constrained environments.…
In reinforcement learning for visual navigation, it is common to develop a model for each new task, and train that model from scratch with task-specific interactions in 3D environments. However, this process is expensive; massive amounts of…