Related papers: MultiMAE: Multi-modal Multi-task Masked Autoencode…

MultiMAE Meets Earth Observation: Pre-training Multi-modal Multi-task Masked Autoencoders for Earth Observation Tasks

Multi-modal data in Earth Observation (EO) presents a huge opportunity for improving transfer learning capabilities when pre-training deep learning models. Unlike prior work that often overlooks multi-modal EO data, recent methods have…

Computer Vision and Pattern Recognition · Computer Science 2025-05-22 Jose Sosa, Danila Rukhovich, Anis Kacem, Djamila Aouada

Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities

Multimodal magnetic resonance imaging (MRI) constitutes the first line of investigation for clinicians in the care of brain tumors, providing crucial insights for surgery planning, treatment monitoring, and biomarker identification.…

Computer Vision and Pattern Recognition · Computer Science 2025-08-26 Lucas Robinet, Ahmad Berjaoui, Elizabeth Cohen-Jonathan Moyal

MultiMAE for Brain MRIs: Robustness to Missing Inputs Using Multi-Modal Masked Autoencoder

Missing input sequences are common in medical imaging data, posing a challenge for deep learning models reliant on complete input data. In this work, inspired by MultiMAE [2], we develop a masked autoencoder (MAE) paradigm for multi-modal,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-04 Ayhan Can Erdur, Christian Beischl, Daniel Scholz, Jiazhen Pan, Benedikt Wiestler, Daniel Rueckert, Jan C Peeken

A Two-Stage Progressive Pre-training using Multi-Modal Contrastive Masked Autoencoders

In this paper, we propose a new progressive pre-training method for image understanding tasks which leverages RGB-D datasets. The method utilizes Multi-Modal Contrastive Masked Autoencoder and Denoising techniques. Our proposed approach…

Computer Vision and Pattern Recognition · Computer Science 2024-09-17 Muhammad Abdullah Jamal, Omid Mohareri

Multimodal Masked Autoencoders Learn Transferable Representations

Building scalable models to learn from diverse, multimodal data remains an open challenge. For vision-language data, the dominant approaches are based on contrastive learning objectives that train a separate encoder for each modality. While…

Computer Vision and Pattern Recognition · Computer Science 2022-10-24 Xinyang Geng, Hao Liu, Lisa Lee, Dale Schuurmans, Sergey Levine, Pieter Abbeel

Improving Masked Autoencoders by Learning Where to Mask

Masked image modeling is a promising self-supervised learning method for visual data. It is typically built upon image patches with random masks, which largely ignores the variation of information density between them. The question is: Is…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Haijian Chen, Wendong Zhang, Yunbo Wang, Xiaokang Yang

ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Carlos Hinojosa, Shuming Liu, Bernard Ghanem

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

CL-MAE: Curriculum-Learned Masked Autoencoders

Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection

Masked Autoencoders learn strong visual representations and achieve state-of-the-art results in several independent modalities, yet very few works have addressed their capabilities in multi-modality settings. In this work, we focus on point…

Computer Vision and Pattern Recognition · Computer Science 2023-03-15 Anthony Chen, Kevin Zhang, Renrui Zhang, Zihan Wang, Yuheng Lu, Yandong Guo, Shanghang Zhang

SatMAE: Pre-training Transformers for Temporal and Multi-Spectral Satellite Imagery

Unsupervised pre-training methods for large vision models have been shown to enhance performance on downstream supervised tasks. Developing similar techniques for satellite imagery presents significant opportunities as unlabelled data is…

Computer Vision and Pattern Recognition · Computer Science 2023-01-18 Yezhen Cong, Samar Khanna, Chenlin Meng, Patrick Liu, Erik Rozi, Yutong He, Marshall Burke, David B. Lobell, Stefano Ermon

Probabilistic Hyper-Graphs using Multiple Randomly Masked Autoencoders for Semi-supervised Multi-modal Multi-task Learning

The computer vision domain has greatly benefited from an abundance of data across many modalities to improve on various visual tasks. Recently, there has been a lot of focus on self-supervised pre-training methods through Masked…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Pîrvu Mihai-Cristian, Marius Leordeanu

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-09 Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis

Cross-modality magnetic resonance (MR) image synthesis can be used to generate missing modalities from given ones. Existing (supervised learning) methods often require a large number of paired multi-modal data to train an effective…

Image and Video Processing · Electrical Eng. & Systems 2023-06-21 Yonghao Li, Tao Zhou, Kelei He, Yi Zhou, Dinggang Shen

MTSMAE: Masked Autoencoders for Multivariate Time-Series Forecasting

Large-scale self-supervised pre-training of Transformer architectures has significantly boosted performance on various tasks in natural language processing (NLP) and computer vision (CV). However, there is a lack of research on…

Machine Learning · Computer Science 2022-10-06 Peiwang Tang, Xianchao Zhang

CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets

Current RGB-D scene recognition approaches often train two standalone backbones for RGB and depth modalities with the same Places or ImageNet pre-training. However, the pre-trained depth network is still biased by RGB-based models which may…

Computer Vision and Pattern Recognition · Computer Science 2023-02-14 Jiange Yang, Sheng Guo, Gangshan Wu, Limin Wang

DenoMAE: A Multimodal Autoencoder for Denoising Modulation Signals

We propose Denoising Masked Autoencoder (Deno-MAE), a novel multimodal autoencoder framework for denoising modulation signals during pretraining. DenoMAE extends the concept of masked autoencoders by incorporating multiple input modalities,…

Machine Learning · Computer Science 2025-01-22 Atik Faysal, Taha Boushine, Mohammad Rostami, Reihaneh Gh. Roshan, Huaxia Wang, Nikhil Muralidhar, Avimanyu Sahoo, Yu-Dong Yao

Traj-MAE: Masked Autoencoders for Trajectory Prediction

Trajectory prediction has been a crucial task in building a reliable autonomous driving system by anticipating possible dangers. One key issue is to generate consistent trajectory predictions without colliding. To overcome the challenge, we…

Computer Vision and Pattern Recognition · Computer Science 2023-03-14 Hao Chen, Jiaze Wang, Kun Shao, Furui Liu, Jianye Hao, Chenyong Guan, Guangyong Chen, Pheng-Ann Heng

Understanding and Enhancing Mask-Based Pretraining towards Universal Representations

Mask-based pretraining has become a cornerstone of modern large-scale models across language, vision, and recently biology. Despite its empirical success, its role and limits in learning data representations have been unclear. In this work,…

Machine Learning · Computer Science 2025-09-29 Mingze Dong, Leda Wang, Yuval Kluger