Related papers: DocMAE: Document Image Rectification via Self-supe…

Domain Invariant Masked Autoencoders for Self-supervised Learning from Multi-domains

Generalizing learned representations across significantly different visual domains is a fundamental yet crucial ability of the human visual system. While recent self-supervised learning methods have achieved good performances with…

Computer Vision and Pattern Recognition · Computer Science 2022-06-07 Haiyang Yang, Meilin Chen, Yizhou Wang, Shixiang Tang, Feng Zhu, Lei Bai, Rui Zhao, Wanli Ouyang

Deep Unrestricted Document Image Rectification

In recent years, tremendous efforts have been made on document image rectification, but existing advanced algorithms are limited to processing restricted document images, i.e., the input images must incorporate a complete document. Once the…

Computer Vision and Pattern Recognition · Computer Science 2023-12-19 Hao Feng, Shaokai Liu, Jiajun Deng, Wengang Zhou, Houqiang Li

Improving Masked Autoencoders by Learning Where to Mask

Masked image modeling is a promising self-supervised learning method for visual data. It is typically built upon image patches with random masks, which largely ignores the variation of information density between them. The question is: Is…

Computer Vision and Pattern Recognition · Computer Science 2024-01-09 Haijian Chen, Wendong Zhang, Yunbo Wang, Xiaokang Yang

Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders

Self-supervised pre-training of image encoders is omnipresent in the literature, particularly following the introduction of Masked autoencoders (MAE). Current efforts attempt to learn object-centric representations from motion in videos. In…

Computer Vision and Pattern Recognition · Computer Science 2025-02-18 Alexandre Eymaël, Renaud Vandeghen, Anthony Cioppa, Silvio Giancola, Bernard Ghanem, Marc Van Droogenbroeck

DocScanner: Robust Document Image Rectification with Progressive Learning

Compared with flatbed scanners, portable smartphones provide more convenience for physical document digitization. However, such digitized documents are often distorted due to uncontrolled physical deformations, camera positions, and…

Computer Vision and Pattern Recognition · Computer Science 2022-12-27 Hao Feng, Wengang Zhou, Jiajun Deng, Qi Tian, Houqiang Li

Siamese Masked Autoencoders

Establishing correspondence between images or scenes is a significant challenge in computer vision, especially given occlusions, viewpoint changes, and varying object appearances. In this paper, we present Siamese Masked Autoencoders…

Computer Vision and Pattern Recognition · Computer Science 2023-05-24 Agrim Gupta, Jiajun Wu, Jia Deng, Li Fei-Fei

Geometric Representation Learning for Document Image Rectification

In document image rectification, there exist rich geometric constraints between the distorted image and the ground truth one. However, such geometric constraints are largely ignored in existing advanced solutions, which limits the…

Computer Vision and Pattern Recognition · Computer Science 2022-10-18 Hao Feng, Wengang Zhou, Jiajun Deng, Yuechen Wang, Houqiang Li

SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners

Recently, self-supervised Masked Autoencoders (MAE) have attracted unprecedented attention for their impressive representation learning ability. However, the pretext task, Masked Image Modeling (MIM), reconstructs the missing local patches,…

Computer Vision and Pattern Recognition · Computer Science 2024-01-23 Feng Liang, Yangguang Li, Diana Marculescu

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement

In this paper, we propose a Text-Degradation Invariant Auto Encoder (Text-DIAE), a self-supervised model designed to tackle two tasks, text recognition (handwritten or scene-text) and document image enhancement. We start by employing a…

Computer Vision and Pattern Recognition · Computer Science 2022-08-19 Mohamed Ali Souibgui, Sanket Biswas, Andres Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluis Gomez, Dimosthenis Karatzas

Document Image Rectification Bases on Self-Adaptive Multitask Fusion

Deformed document image rectification is essential for real-world document understanding tasks, such as layout analysis and text recognition. However, current multi-task methods -- such as background removal, 3D coordinate prediction, and…

Computer Vision and Pattern Recognition · Computer Science 2025-05-12 Heng Li, Xiangping Wu, Qingcai Chen

CL-MAE: Curriculum-Learned Masked Autoencoders

Masked image modeling has been demonstrated as a powerful pretext task for generating robust representations that can be effectively generalized across multiple downstream tasks. Typically, this approach involves randomly masking patches…

Computer Vision and Pattern Recognition · Computer Science 2024-02-29 Neelu Madan, Nicolae-Catalin Ristea, Kamal Nasrollahi, Thomas B. Moeslund, Radu Tudor Ionescu

Structure is Supervision: Multiview Masked Autoencoders for Radiology

Building robust medical machine learning systems requires pretraining strategies that exploit the intrinsic structure present in clinical data. We introduce Multiview Masked Autoencoder (MVMAE), a self-supervised framework that leverages…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Sonia Laguna, Andrea Agostini, Alain Ryser, Samuel Ruiperez-Campillo, Irene Cannistraci, Moritz Vandenhirtz, Stephan Mandt, Nicolas Deperrois, Farhad Nooralahzadeh, Michael Krauthammer, Thomas M. Sutter, Julia E. Vogt

Unsupervised Representation Learning from Pre-trained Diffusion Probabilistic Models

Diffusion Probabilistic Models (DPMs) have shown a powerful capacity of generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding. Their…

Computer Vision and Pattern Recognition · Computer Science 2023-03-02 Zijian Zhang, Zhou Zhao, Zhijie Lin

Deep Learning-based Forgery Attack on Document Images

With the ongoing popularization of online services, the digital document images have been used in various applications. Meanwhile, there have emerged some deep learning-based text editing algorithms which alter the textual information of an…

Multimedia · Computer Science 2021-09-13 Lin Zhao, Changsheng Chen, Jiwu Huang

Denoising Masked AutoEncoders Help Robust Classification

In this paper, we propose a new self-supervised method, which is called Denoising Masked AutoEncoders (DMAE), for learning certified robust classifiers of images. In DMAE, we corrupt each image by adding Gaussian noises to each pixel value…

Computer Vision and Pattern Recognition · Computer Science 2023-03-08 Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, Liwei Wang, Di He

ColorMAE: Exploring data-independent masking strategies in Masked AutoEncoders

Masked AutoEncoders (MAE) have emerged as a robust self-supervised framework, offering remarkable performance across a wide range of downstream tasks. To increase the difficulty of the pretext task and learn richer visual representations,…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Carlos Hinojosa, Shuming Liu, Bernard Ghanem

Concatenated Masked Autoencoders as Spatial-Temporal Learner

Learning representations from videos requires understanding continuous motion and visual correspondences between frames. In this paper, we introduce the Concatenated Masked Autoencoders (CatMAE) as a spatial-temporal learner for…

Computer Vision and Pattern Recognition · Computer Science 2023-12-15 Zhouqiang Jiang, Bowen Wang, Tong Xiang, Zhaofeng Niu, Hong Tang, Guangshun Li, Liangzhi Li

Self-Supervised Image Representation Learning: Transcending Masking with Paired Image Overlay

Self-supervised learning has become a popular approach in recent years for its ability to learn meaningful representations without the need for data annotation. This paper proposes a novel image augmentation technique, overlaying images,…

Computer Vision and Pattern Recognition · Computer Science 2023-01-25 Yinheng Li, Han Ding, Shaofei Wang

Multi-Modal Masked Autoencoders for Medical Vision-and-Language Pre-Training

Medical vision-and-language pre-training provides a feasible solution to extract effective vision-and-language representations from medical images and texts. However, few studies have been dedicated to this field to facilitate medical…

Computer Vision and Pattern Recognition · Computer Science 2022-09-16 Zhihong Chen, Yuhao Du, Jinpeng Hu, Yang Liu, Guanbin Li, Xiang Wan, Tsung-Hui Chang

Improving Visual Representation Learning through Perceptual Understanding

We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual…

Computer Vision and Pattern Recognition · Computer Science 2023-03-29 Samyakh Tukra, Frederick Hoffman, Ken Chatfield