
Related papers: Learning Modality Knowledge Alignment for Cross-Mo…

200 papers

Continual learning is essential for adapting models to new tasks while retaining previously acquired knowledge. While existing approaches predominantly focus on uni-modal data, multi-modal learning offers substantial benefits by utilizing…

Machine Learning · Computer Science 2025-11-11 Evelyn Chee, Wynne Hsu, Mong Li Lee

Understanding what and how neural networks memorize during training is crucial, both from the perspective of unintentional memorization of potentially sensitive information and from the standpoint of effective knowledge acquisition for…

Computer Vision and Pattern Recognition · Computer Science 2025-06-06 Yuxin Wen, Yangsibo Huang, Tom Goldstein, Ravi Kumar, Badih Ghazi, Chiyuan Zhang

How to achieve better end-to-end speech translation (ST) by leveraging (text) machine translation (MT) data? Among various existing techniques, multi-task learning is one of the effective ways to share knowledge between ST and MT in which…

Computation and Language · Computer Science 2023-05-16 Qingkai Fang, Yang Feng

The natural world is abundant with concepts expressed via visual, acoustic, tactile, and linguistic modalities. Much of the existing progress in multimodal learning, however, focuses primarily on problems where the same set of modalities…

Machine Learning · Computer Science 2020-12-08 Paul Pu Liang, Peter Wu, Liu Ziyin, Louis-Philippe Morency, Ruslan Salakhutdinov

Deep learning achieved great progress recently, however, it is not easy or efficient to further improve its performance by increasing the size of the model. Multi-modal learning can mitigate this challenge by introducing richer and more…

Artificial Intelligence · Computer Science 2025-10-07 Cairong Zhao, Yufeng Jin, Zifan Song, Haonan Chen, Duoqian Miao, Guosheng Hu

Multi-modal affect recognition models leverage complementary information in different modalities to outperform their uni-modal counterparts. However, due to the unavailability of modality-specific sensors or data, multi-modal models may not…

Image and Video Processing · Electrical Eng. & Systems 2021-08-03 Vandana Rajan, Alessio Brutti, Andrea Cavallaro

People can recognize scenes across many different modalities beyond natural images. In this paper, we investigate how to learn cross-modal scene representations that transfer across modalities. To study this problem, we introduce a new…

Computer Vision and Pattern Recognition · Computer Science 2016-07-26 Lluis Castrejon, Yusuf Aytar, Carl Vondrick, Hamed Pirsiavash, Antonio Torralba

Continual learning aims to learn knowledge of tasks observed in sequential time steps while mitigating the forgetting of previously learned knowledge. Existing methods were designed to learn a single modality (e.g., image) over time, which…

Computer Vision and Pattern Recognition · Computer Science 2025-08-15 Hyundong Jin, Eunwoo Kim

Cross-modality recognition has many important applications in science, law enforcement and entertainment. Popular methods to bridge the modality gap include reducing the distributional differences of representations of different modalities,…

Computer Vision and Pattern Recognition · Computer Science 2026-05-19 Xin Niu, Enyi Li, Jinchao Liu, Yan Wang, Margarita Osadchy, Yongchun Fang

Pre-trained vision language models have shown remarkable performance on visual recognition tasks, but they typically assume the availability of complete multimodal inputs during both training and inference. In real-world scenarios, however,…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Shu Zhao, Nilesh Ahuja, Tan Yu, Tianyi Shen, Vijaykrishnan Narayanan

Pre-training and fine-tuning is a paradigm for alleviating the data scarcity problem in end-to-end speech translation (E2E ST). The commonplace "modality gap" between speech and text data often leads to inconsistent inputs between…

Computation and Language · Computer Science 2023-06-14 Yuchen Han, Chen Xu, Tong Xiao, Jingbo Zhu

Learning effective recommendation models from sparse user interactions represents a fundamental challenge in developing sequential recommendation methods. Recently, pre-training-based methods have been developed to tackle this challenge.…

Information Retrieval · Computer Science 2023-09-21 Bo Peng, Srinivasan Parthasarathy, Xia Ning

Multimodal meta-learning is a recent problem that extends conventional few-shot meta-learning by generalizing its setup to diverse multimodal task distributions. This setup makes a step towards mimicking how humans make use of a diverse set…

Machine Learning · Computer Science 2021-10-28 Milad Abdollahzadeh, Touba Malekzadeh, Ngai-Man Cheung

Adapting pre-trained models to unseen feature modalities has become increasingly important due to the growing need for cross-disciplinary knowledge integration. A key challenge here is how to align the representation of new modalities with…

Machine Learning · Computer Science 2026-04-21 Trong Khiem Tran, Manh Cuong Dao, Phi Le Nguyen, Thao Nguyen Truong, Trong Nghia Hoang

Multilingual machine translation systems aim to make knowledge accessible across languages, yet learning effective cross-lingual representations remains challenging. These challenges are especially pronounced for low-resource languages,…

Computation and Language · Computer Science 2026-01-08 David Stap

Meta-learning, or learning to learn, is a technique that can help to overcome resource scarcity in cross-lingual NLP problems, by enabling fast adaptation to new tasks. We apply model-agnostic meta-learning (MAML) to the task of…

Computation and Language · Computer Science 2022-03-24 Anna Langedijk, Verna Dankers, Phillip Lippe, Sander Bos, Bryan Cardenas Guevara, Helen Yannakoudakis, Ekaterina Shutova

Cross-modal knowledge distillation deals with transferring knowledge from a model trained with superior modalities (Teacher) to another model trained with weak modalities (Student). Existing approaches require paired training examples exist…

Computer Vision and Pattern Recognition · Computer Science 2020-04-02 Long Zhao, Xi Peng, Yuxiao Chen, Mubbasir Kapadia, Dimitris N. Metaxas

Pre-trained language models are still far from human performance in tasks that need understanding of properties (e.g. appearance, measurable quantity) and affordances of everyday objects in the real world since the text lacks such…

Computation and Language · Computer Science 2022-03-18 Woojeong Jin, Dong-Ho Lee, Chenguang Zhu, Jay Pujara, Xiang Ren

As the application of deep learning has expanded to real-world problems with insufficient volume of training data, transfer learning recently has gained much attention as means of improving the performance in such small-data regime.…

Machine Learning · Computer Science 2019-05-16 Yunhun Jang, Hankook Lee, Sung Ju Hwang, Jinwoo Shin

In this study, we focus on heterogeneous knowledge transfer across entirely different model architectures, tasks, and modalities. Existing knowledge transfer methods (e.g., backbone sharing, knowledge distillation) often hinge on shared…

Machine Learning · Computer Science 2024-12-30 Kunxi Li, Tianyu Zhan, Kairui Fu, Shengyu Zhang, Kun Kuang, Jiwei Li, Zhou Zhao, Fan Wu, Fei Wu