Related papers: Mixed Autoencoder for Self-supervised Visual Repre…

Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

Efficient Masked Autoencoders with Self-Consistency

Inspired by the masked language modeling (MLM) in natural language processing tasks, the masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. However, the high random mask ratio…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Rui Zhao, Ming Tang, Jinqiao Wang

Improving Visual Representation Learning through Perceptual Understanding

We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Samyakh Tukra, Frederick Hoffman, Ken Chatfield

Self-Guided Masked Autoencoder

Masked Autoencoder (MAE) is a self-supervised approach for representation learning, widely applicable to a variety of downstream tasks in computer vision. In spite of its success, it is still not fully uncovered what and how MAE exactly…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Jeongwoo Shin, Inseo Lee, Junho Lee, Joonseok Lee

SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners

Recently, self-supervised Masked Autoencoders (MAE) have attracted unprecedented attention for their impressive representation learning ability. However, the pretext task, Masked Image Modeling (MIM), reconstructs the missing local patches,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Feng Liang, Yangguang Li, Diana Marculescu

Masked Autoencoders Are Scalable Vision Learners

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning

Masked autoencoders (MAE) have recently succeeded in self-supervised vision representation learning. Previous work mainly applied custom-designed (e.g., random, block-wise) masking or teacher (e.g., CLIP)-guided masking and targets.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Shentong Mo

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

Masked Autoencoders As Spatiotemporal Learners

This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to spatiotemporal representation learning from videos. We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels.…

Computer Vision and Pattern Recognition · Computer Science 2022-10-24 Christoph Feichtenhofer, Haoqi Fan, Yanghao Li, Kaiming He

Context Autoencoder for Self-Supervised Representation Learning

We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. The pretraining tasks…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang

i-MAE: Are Latent Representations in Masked Autoencoders Linearly Separable?

Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training approach in the vision domain. However, the mechanism and properties of the learned representations by such a scheme, as well as how to further enhance…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 Kevin Zhang, Zhiqiang Shen

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders

Masked autoencoders (MAEs) have emerged recently as state-of-the-art self-supervised spatiotemporal representation learners. Inheriting from the image counterparts, however, existing video MAEs still focus largely on static appearance learning whilst…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Hongxun Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan

CoMA: Complementary Masking and Hierarchical Dynamic Multi-Window Self-Attention in a Unified Pre-training Framework

Masked Autoencoders (MAE) achieve self-supervised learning of image representations by randomly removing a portion of visual tokens and reconstructing the original image as a pretext task, thereby significantly enhancing pretraining…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Jiaxuan Li, Qing Xu, Xiangjian He, Ziyu Liu, Chang Xing, Zhen Chen, Daokun Zhang, Rong Qu, Chang Wen Chen

Task-customized Masked AutoEncoder via Mixture of Cluster-conditional Experts

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-09 Zhili Liu, Kai Chen, Jianhua Han, Lanqing Hong, Hang Xu, Zhenguo Li, James T. Kwok

An information theoretic approach to the autoencoder

We present a variation of the Autoencoder (AE) that explicitly maximizes the mutual information between the input data and the hidden representation. The proposed model, the InfoMax Autoencoder (IMAE), by construction is able to learn a…

Machine Learning · Computer Science 2019-01-24 Vincenzo Crescimanna, Bruce Graham

Contrastive Masked Autoencoders are Stronger Vision Learners

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representations shows there is still plenty of room for building a stronger vision learner. Towards this…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng

ViC-MAE: Self-Supervised Representation Learning from Images and Video with Contrastive Masked Autoencoders

We propose ViC-MAE, a model that combines both Masked AutoEncoders (MAE) and contrastive learning. ViC-MAE is trained using a global feature obtained by pooling the local representations learned under an MAE reconstruction loss and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Jefferson Hernandez, Ruben Villegas, Vicente Ordonez

MixMAE: Mixed and Masked Autoencoder for Efficient Pretraining of Hierarchical Vision Transformers

In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers. Existing masked image modeling (MIM) methods for hierarchical Vision…

Computer Vision and Pattern Recognition · Computer Science 2023-04-03 Jihao Liu, Xin Huang, Jinliang Zheng, Yu Liu, Hongsheng Li

Pseudo Labelling for Enhanced Masked Autoencoders

Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating…

Computer Vision and Pattern Recognition · Computer Science 2024-06-26 Srinivasa Rao Nandam, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais

Masked Autoencoder Pretraining on Strong-Lensing Images for Joint Dark-Matter Model Classification and Super-Resolution

Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In this work, we propose a masked autoencoder (MAE)…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Achmad Ardani Prasha, Clavino Ourizqi Rachmadi, Muhamad Fauzan Ibnu Syahlan, Naufal Rahfi Anugerah, Nanda Garin Raditya, Putri Amelia, Sabrina Laila Mutiara, Hilman Syachr Ramadhan