Related papers: Context Autoencoder for Self-Supervised Representa…
Masked Autoencoder (MAE) has recently been shown to be effective in pre-training Vision Transformers (ViT) for natural image analysis. By reconstructing full images from partially masked inputs, a ViT encoder aggregates contextual…
Masked Autoencoders (MAE) have demonstrated promising performance in self-supervised learning for both 2D and 3D computer vision. Nevertheless, existing MAE-based methods still have certain drawbacks. Firstly, the functional decoupling…
Self-supervised learning has been a powerful training paradigm to facilitate representation learning. In this study, we design a masked autoencoder (MAE) to guide deep learning models to learn electroencephalography (EEG) signal…
Masked Autoencoder (MAE) is a self-supervised approach for representation learning, widely applicable to a variety of downstream tasks in computer vision. In spite of its success, it is still not fully uncovered what and how MAE exactly…
Masked image modeling (MIM) has achieved promising results on various vision tasks. However, the limited discriminability of learned representation manifests there is still plenty to go for making a stronger vision learner. Towards this…
Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches…
Masked Autoencoder (MAE) is a notable method for self-supervised pretraining in visual representation learning. It operates by randomly masking image patches and reconstructing these masked patches using the unmasked ones. A key limitation…
Inspired by the masked language modeling (MLM) in natural language processing tasks, the masked image modeling (MIM) has been recognized as a strong self-supervised pre-training method in computer vision. However, the high random mask ratio…
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners for computer vision. Our MAE approach is simple: we mask random patches of the input image and reconstruct the missing pixels. It is based on two core…
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks via randomly masking image patches and reconstruction. However, effective data augmentation strategies for MAE still remain open questions, different…
In this work, we examine the impact of inter-patch dependencies in the decoder of masked autoencoders (MAE) on representation learning. We decompose the decoding mechanism for masked reconstruction into self-attention between mask tokens…
Masked image modeling (MIM) has become a leading self-supervised learning strategy. MIMs such as Masked Autoencoder (MAE) learn strong representations by randomly masking input tokens for the encoder to process, with the decoder…
Medical vision-and-language pre-training provides a feasible solution to extract effective vision-and-language representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical…
Masked image modeling (MIM) learns visual representation by masking and reconstructing image patches. Applying the reconstruction supervision on the CLIP representation has been proven effective for MIM. However, it is still under-explored…
This paper studies a simple extension of image-based Masked Autoencoders (MAE) to self-supervised representation learning from audio spectrograms. Following the Transformer encoder-decoder design in MAE, our Audio-MAE first encodes audio…
In this work, we explore regions as a potential visual analogue of words for self-supervised image representation learning. Inspired by Masked Autoencoding (MAE), a generative pre-training baseline, we propose masked region autoencoding to…
Masked Autoencoders (MAE) have been popular paradigms for large-scale vision representation pre-training. However, MAE solely reconstructs the low-level RGB signals after the decoder and lacks supervision upon high-level semantics for the…
This paper explores improvements to the masked image modeling (MIM) paradigm. The MIM paradigm enables the model to learn the main object features of the image by masking the input image and predicting the masked part by the unmasked part.…
Growing techniques have been emerging to improve the performance of passage retrieval. As an effective representation bottleneck pretraining technique, the contextual masked auto-encoder utilizes contextual embedding to assist in the…
We propose Masked Capsule Autoencoders (MCAE), the first Capsule Network that utilises pretraining in a modern self-supervised paradigm, specifically the masked image modelling framework. Capsule Networks have emerged as a powerful…