Related papers: Context Autoencoder for Self-Supervised Representa…

Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation

Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual…

Image and Video Processing · Electrical Eng. & Systems 2023-04-24 Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna

Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning

Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for both 2D and 3D computer vision. Nevertheless, existing MAE-based methods still have certain drawbacks. Firstly, the functional decoupling…

Computer Vision and Pattern Recognition · Computer Science 2023-10-06 Yang Liu, Chen Chen, Can Wang, Xulin King, Mengyuan Liu

Enhancing Representation Learning of EEG Data with Masked Autoencoders

Self-supervised learning has been a powerful training paradigm to facilitate representation learning. In this study, we design a masked autoencoder (MAE) to guide deep learning models to learn electroencephalography (EEG) signal…

Human-Computer Interaction · Computer Science 2024-09-04 Yifei Zhou, Sitong Liu

Self-Guided Masked Autoencoder

Masked Autoencoder (MAE) is a self-supervised approach for representation learning, widely applicable to a variety of downstream tasks in computer vision. In spite of its success, it is still not fully uncovered what and how MAE exactly…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Jeongwoo Shin, Inseo Lee, Junho Lee, Joonseok Lee

Contrastive Masked Autoencoders are Stronger Vision Learners

Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of learned representation manifests there is still plenty to go for making a stronger vision learner. Towards this…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Zhicheng Huang, Xiaojie Jin, Chengze Lu, Qibin Hou, Ming-Ming Cheng, Dongmei Fu, Xiaohui Shen, Jiashi Feng

CL-MAE: Curriculum-Learned Masked Autoencoders

Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

Downstream Task Guided Masking Learning in Masked Autoencoders Using Multi-Level Optimization

Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Han Guo, Ramtin Hosseini, Ruiyi Zhang, Sai Ashish Somayajula, Ranak Roy Chowdhury, Rajesh K. Gupta, Pengtao Xie

Efficient Masked Autoencoders with Self-Consistency

Inspired by the masked language modeling (MLM) in natural language processing tasks, the masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. However, the high random mask ratio…

Computer Vision and Pattern Recognition · Computer Science 2024-06-04 Zhaowen Li, Yousong Zhu, Zhiyang Chen, Wei Li, Chaoyang Zhao, Rui Zhao, Ming Tang, Jinqiao Wang

Masked Autoencoders Are Scalable Vision Learners

This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross Girshick

Mixed Autoencoder for Self-supervised Visual Representation Learning

Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction. However, effective data augmentation strategies for MAE still remain open questions, different…

Computer Vision and Pattern Recognition · Computer Science 2024-02-08 Kai Chen, Zhili Liu, Lanqing Hong, Hang Xu, Zhenguo Li, Dit-Yan Yeung

Rethinking Patch Dependence for Masked Autoencoders

In this work, we examine the impact of inter-patch dependencies in the decoder of masked autoencoders (MAE) on representation learning. We decompose the decoding mechanism for masked reconstruction into self-attention between mask tokens…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Letian Fu, Long Lian, Renhao Wang, Baifeng Shi, Xudong Wang, Adam Yala, Trevor Darrell, Alexei A. Efros, Ken Goldberg

Learning with Unmasked Tokens Drives Stronger Vision Learners

Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Taekyung Kim, Sanghyuk Chun, Byeongho Heo, Dongyoon Han

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

Medical vision-and-language pre-training provides a feasible solution to extract effective vision-and-language representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical…

Computer Vision and Pattern Recognition · Computer Science 2022-09-16 Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

CAE v2: Context Autoencoder with CLIP Target

Masked image modeling (MIM) learns visual representation by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been proven effective for MIM. However, it is still under-explored…

Computer Vision and Pattern Recognition · Computer Science 2022-11-18 Xinyu Zhang, Jiahui Chen, Junkun Yuan, Qiang Chen, Jian Wang, Xiaodi Wang, Shumin Han, Xiaokang Chen, Jimin Pi, Kun Yao, Junyu Han, Errui Ding, Jingdong Wang

Masked Autoencoders that Listen

This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio…

Sound · Computer Science 2023-01-13 Po-Yao Huang, Hu Xu, Juncheng Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer

R-MAE: Regions Meet Masked Autoencoders

In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to…

Computer Vision and Pattern Recognition · Computer Science 2024-01-08 Duy-Kien Nguyen, Vaibhav Aggarwal, Yanghao Li, Martin R. Oswald, Alexander Kirillov, Cees G. M. Snoek, Xinlei Chen

Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking

Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-10 Peng Gao, Renrui Zhang, Rongyao Fang, Ziyi Lin, Hongyang Li, Hongsheng Li, Qiao Yu

Improvements to Self-Supervised Representation Learning for Masked Image Modeling

This paper explores improvements to the masked image modeling (MIM) paradigm. The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part by the unmasked part.…

Computer Vision and Pattern Recognition · Computer Science 2022-05-24 Jiawei Mao, Xuesong Yin, Yuanqi Chang, Honggu Zhou

CoT-MAE v2: Contextual Masked Auto-Encoder with Multi-view Modeling for Passage Retrieval

Growing techniques have been emerging to improve the performance of passage retrieval. As an effective representation bottleneck pretraining technique, the contextual masked auto-encoder utilizes contextual embedding to assist in the…

Computation and Language · Computer Science 2023-04-07 Xing Wu, Guangyuan Ma, Peng Wang, Meng Lin, Zijia Lin, Fuzheng Zhang, Songlin Hu

Masked Capsule Autoencoders

We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a modern self-supervised paradigm, specifically the masked image modelling framework. Capsule Networks have emerged as a powerful…

Computer Vision and Pattern Recognition · Computer Science 2025-04-21 Miles Everett, Mingjun Zhong, Georgios Leontidis