Related papers: Mixed Autoencoder for Self-supervised Visual Repre…

200 papers

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Han Guo , Ramtin Hosseini , Ruiyi Zhang , Sai Ashish Somayajula , Ranak Roy Chowdhury , Rajesh K. Gupta , Pengtao Xie

Inspired by the masked language modeling (MLM) in natural language processing tasks, the masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. However, the high random mask ratio…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Zhaowen Li , Yousong Zhu , Zhiyang Chen , Wei Li , Chaoyang Zhao , Rui Zhao , Ming Tang , Jinqiao Wang

We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Samyakh Tukra , Frederick Hoffman , Ken Chatfield

Masked Autoencoder (MAE) is a self-supervised approach for representation learning, widely applicable to a variety of downstream tasks in computer vision. In spite of its success, it is still not fully uncovered what and how MAE exactly…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Jeongwoo Shin , Inseo Lee , Junho Lee , Joonseok Lee

Recently, self-supervised Masked Autoencoders (MAE) have attracted unprecedented attention for their impressive representation learning ability. However, the pretext task, Masked Image Modeling (MIM), reconstructs the missing local patches,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Feng Liang , Yangguang Li , Diana Marculescu

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Kaiming He , Xinlei Chen , Saining Xie , Yanghao Li , Piotr Dollár , Ross Girshick

Masked autoencoders (MAE) have recently succeeded in self-supervised vision representation learning. Previous work mainly applied custom-designed (e.g., random, block-wise) masking or teacher (e.g., CLIP)-guided masking and targets.…

Computer Vision and Pattern Recognition · Computer Science 2024-12-24 Shentong Mo

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Peng Gao , Renrui Zhang , Rongyao Fang , Ziyi Lin , Hongyang Li , Hongsheng Li , Qiao Yu

This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to spatiotemporal representation learning from videos. We randomly mask out spacetime patches in videos and learn an autoencoder to reconstruct them in pixels.…

Computer Vision and Pattern Recognition · Computer Science 2022-10-24 Christoph Feichtenhofer , Haoqi Fan , Yanghao Li , Kaiming He

We present a novel masked image modeling (MIM) approach, context autoencoder (CAE), for self-supervised representation pretraining. We pretrain an encoder by making predictions in the encoded representation space. The pretraining tasks…

Computer Vision and Pattern Recognition · Computer Science 2023-08-11 Xiaokang Chen , Mingyu Ding , Xiaodi Wang , Ying Xin , Shentong Mo , Yunhao Wang , Shumin Han , Ping Luo , Gang Zeng , Jingdong Wang

Masked image modeling (MIM) has been recognized as a strong self-supervised pre-training approach in the vision domain. However, the mechanism and properties of the learned representations by such a scheme, as well as how to further enhance…

Computer Vision and Pattern Recognition · Computer Science 2024-04-10 Kevin Zhang , Zhiqiang Shen

Masked autoencoders (MAEs) have emerged recently as state-of-the-art self-supervised spatiotemporal representation learners. Inheriting from the image counterparts, however, existing video MAEs still focus largely on static appearance learning whilst…

Computer Vision and Pattern Recognition · Computer Science 2022-10-11 Haosen Yang , Deng Huang , Bin Wen , Jiannan Wu , Hongxun Yao , Yi Jiang , Xiatian Zhu , Zehuan Yuan

Masked Autoencoders (MAE) achieve self-supervised learning of image representations by randomly removing a portion of visual tokens and reconstructing the original image as a pretext task, thereby significantly enhancing pretraining…

Computer Vision and Pattern Recognition · Computer Science 2025-11-11 Jiaxuan Li , Qing Xu , Xiangjian He , Ziyu Liu , Chang Xing , Zhen Chen , Daokun Zhang , Rong Qu , Chang Wen Chen

Masked Autoencoder (MAE) is a prevailing self-supervised learning method that achieves promising results in model pre-training. However, when the various downstream tasks have data distributions different from the pre-training data, the…

Computer Vision and Pattern Recognition · Computer Science 2024-02-09 Zhili Liu , Kai Chen , Jianhua Han , Lanqing Hong , Hang Xu , Zhenguo Li , James T. Kwok

We present a variation of the Autoencoder (AE) that explicitly maximizes the mutual information between the input data and the hidden representation. The proposed model, the InfoMax Autoencoder (IMAE), by construction is able to learn a…

Machine Learning · Computer Science 2019-01-24 Vincenzo Crescimanna , Bruce Graham

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of the learned representations shows that there is still considerable room for building a stronger vision learner. Towards this…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Zhicheng Huang , Xiaojie Jin , Chengze Lu , Qibin Hou , Ming-Ming Cheng , Dongmei Fu , Xiaohui Shen , Jiashi Feng

We propose ViC-MAE, a model that combines both Masked AutoEncoders (MAE) and contrastive learning. ViC-MAE is trained using a global feature obtained by pooling the local representations learned under an MAE reconstruction loss and…

Computer Vision and Pattern Recognition · Computer Science 2024-10-04 Jefferson Hernandez , Ruben Villegas , Vicente Ordonez

In this paper, we propose Mixed and Masked AutoEncoder (MixMAE), a simple but efficient pretraining method that is applicable to various hierarchical Vision Transformers. Existing masked image modeling (MIM) methods for hierarchical Vision…

Computer Vision and Pattern Recognition · Computer Science 2023-04-03 Jihao Liu , Xin Huang , Jinliang Zheng , Yu Liu , Hongsheng Li

Masked Image Modeling (MIM)-based models, such as SdAE, CAE, GreenMIM, and MixAE, have explored different strategies to enhance the performance of Masked Autoencoders (MAE) by modifying prediction, loss functions, or incorporating…

Computer Vision and Pattern Recognition · Computer Science 2024-06-26 Srinivasa Rao Nandam , Sara Atito , Zhenhua Feng , Josef Kittler , Muhammad Awais

Strong gravitational lensing can reveal the influence of dark-matter substructure in galaxies, but analyzing these effects from noisy, low-resolution images poses a significant challenge. In this work, we propose a masked autoencoder (MAE)…
