Related papers: MultiMAE: Multi-modal Multi-task Masked Autoencode…

200 papers

Multi-modal data in Earth Observation (EO) presents a huge opportunity for improving transfer learning capabilities when pre-training deep learning models. Unlike prior work that often overlooks multi-modal EO data, recent methods have…

Computer Vision and Pattern Recognition · Computer Science 2025-05-22 Jose Sosa, Danila Rukhovich, Anis Kacem, Djamila Aouada

Multimodal magnetic resonance imaging (MRI) constitutes the first line of investigation for clinicians in the care of brain tumors, providing crucial insights for surgery planning, treatment monitoring, and biomarker identification.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Lucas Robinet, Ahmad Berjaoui, Elizabeth Cohen-Jonathan Moyal

Missing input sequences are common in medical imaging data, posing a challenge for deep learning models reliant on complete input data. In this work, inspired by MultiMAE [2], we develop a masked autoencoder (MAE) paradigm for multi-modal,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-04 Ayhan Can Erdur, Christian Beischl, Daniel Scholz, Jiazhen Pan, Benedikt Wiestler, Daniel Rueckert, Jan C Peeken

In this paper, we propose a new progressive pre-training method for image understanding tasks which leverages RGB-D datasets. The method utilizes Multi-Modal Contrastive Masked Autoencoder and Denoising techniques. Our proposed approach…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Muhammad Abdullah Jamal, Omid Mohareri

Building scalable models to learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approaches are based on contrastive learning objectives that train a separate encoder for each modality. While…

Computer Vision and Pattern Recognition · Computer Science 2022-10-24 Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

Masked image modeling is a promising self-supervised learning method for visual data. It is typically built upon image patches with random masks, which largely ignores the variation of information density between them. The question is: Is…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Haijian Chen, Wendong Zhang, Yunbo Wang, Xiaokang Yang

Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Carlos Hinojosa, Shuming Liu, Bernard Ghanem

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings. In this work, we focus on point…

Computer Vision and Pattern Recognition · Computer Science 2023-03-15 Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities as unlabelled data is…

Computer Vision and Pattern Recognition · Computer Science 2023-01-18 Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, Stefano Ermon

The computer vision domain has greatly benefited from an abundance of data across many modalities to improve on various visual tasks. Recently, there has been a lot of focus on self-supervised pre-training methods through Masked…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Pîrvu Mihai-Cristian, Marius Leordeanu

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-09 Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective…

Image and Video Processing · Electrical Eng. & Systems 2023-06-21 Yonghao Li, Tao Zhou, Kelei He, Yi Zhou, Dinggang Shen

Large-scale self-supervised pre-training of Transformer architectures has significantly boosted performance on various tasks in natural language processing (NLP) and computer vision (CV). However, there is a lack of research on…

Machine Learning · Computer Science 2022-10-06 Peiwang Tang, Xianchao Zhang

Current RGB-D scene recognition approaches often train two standalone backbones for RGB and depth modalities with the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models which may…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Jiange Yang, Sheng Guo, Gangshan Wu, Limin Wang

We propose Denoising Masked Autoencoder (DenoMAE), a novel multimodal autoencoder framework for denoising modulation signals during pretraining. DenoMAE extends the concept of masked autoencoders by incorporating multiple input modalities,…

Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Mask-based pretraining has become a cornerstone of modern large-scale models across language, vision, and recently biology. Despite its empirical success, its role and limits in learning data representations have been unclear. In this work,…

Machine Learning · Computer Science 2025-09-29 Mingze Dong, Leda Wang, Yuval Kluger