Related papers: Pre-Trained Image Processing Transformer

BEiT: BERT Pre-Training of Image Transformers

We introduce a self-supervised vision representation model BEiT, which stands for Bidirectional Encoder representation from Image Transformers. Following BERT developed in the natural language processing area, we propose a masked image…

Computer Vision and Pattern Recognition · Computer Science 2022-09-07 Hangbo Bao, Li Dong, Songhao Piao, Furu Wei

Instruct-IPT: All-in-One Image Processing Transformer via Weight Modulation

Due to the unaffordable size and intensive computation costs of low-level vision models, All-in-One models that are designed to address a handful of low-level vision tasks simultaneously have been popular. However, existing All-in-One…

Computer Vision and Pattern Recognition · Computer Science 2024-12-17 Yuchuan Tian, Jianhong Han, Hanting Chen, Yuanyuan Xi, Ning Ding, Jie Hu, Chao Xu, Yunhe Wang

GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task

The upsurge in pre-trained large models started by ChatGPT has swept across the entire deep learning community. Such powerful models demonstrate advanced generative ability and multimodal understanding capability, which quickly set new…

Computer Vision and Pattern Recognition · Computer Science 2025-02-28 Ning Ding, Yehui Tang, Zhongqian Fu, Chao Xu, Kai Han, Yunhe Wang

Magnetic Resonance Image Processing Transformer for General Accelerated Image Reconstruction

Recent advancements in deep learning have enabled the development of generalizable models that achieve state-of-the-art performance across various imaging tasks. Vision Transformer (ViT)-based architectures, in particular, have demonstrated…

Image and Video Processing · Electrical Eng. & Systems 2025-02-11 Guoyao Shen, Mengyu Li, Stephan Anderson, Chad W. Farris, Xin Zhang

SiT: Self-supervised vIsion Transformer

Self-supervised learning methods are gaining increasing traction in computer vision due to their recent success in reducing the gap with supervised learning. In natural language processing (NLP) self-supervised learning and transformers are…

Computer Vision and Pattern Recognition · Computer Science 2022-12-29 Sara Atito, Muhammad Awais, Josef Kittler

Rejuvenating image-GPT as Strong Visual Representation Learners

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning. Two simple yet essential changes are made. First, we shift the…

Computer Vision and Pattern Recognition · Computer Science 2024-07-08 Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie

On Efficient Transformer-Based Image Pre-training for Low-Level Vision

Pre-training has marked numerous state of the arts in high-level computer vision, while few attempts have ever been made to investigate how pre-training acts in image processing systems. In this paper, we tailor transformer-based…

Computer Vision and Pattern Recognition · Computer Science 2022-03-22 Wenbo Li, Xin Lu, Shengju Qian, Jiangbo Lu, Xiangyu Zhang, Jiaya Jia

ImageBERT: Cross-modal Pre-training with Large-scale Weak-supervised Image-Text Data

In this paper, we introduce a new vision-language pre-trained model -- ImageBERT -- for image-text joint embedding. Our model is a Transformer-based model, which takes different modalities as input and models the relationship between them.…

Computer Vision and Pattern Recognition · Computer Science 2020-01-24 Di Qi, Lin Su, Jia Song, Edward Cui, Taroon Bharti, Arun Sacheti

Identity Preserve Transform: Understand What Activity Classification Models Have Learnt

Activity classification has observed great success recently. The performance on small dataset is almost saturated and people are moving towards larger datasets. What leads to the performance gain on the model and what the model has learnt?…

Computer Vision and Pattern Recognition · Computer Science 2019-12-16 Jialing Lyu, Weichao Qiu, Xinyue Wei, Yi Zhang, Alan Yuille, Zheng-Jun Zha

Developmental Pretraining (DPT) for Image Classification Networks

In the backdrop of increasing data requirements of Deep Neural Networks for object recognition that is growing more untenable by the day, we present Developmental PreTraining (DPT) as a possible solution. DPT is designed as a…

Machine Learning · Computer Science 2023-12-04 Niranjan Rajesh, Debayan Gupta

Image Deblurring by Exploring In-depth Properties of Transformer

Image deblurring continues to achieve impressive performance with the development of generative models. Nonetheless, there still remains a displeasing problem if one wants to improve perceptual quality and quantitative scores of recovered…

Computer Vision and Pattern Recognition · Computer Science 2024-01-30 Pengwei Liang, Junjun Jiang, Xianming Liu, Jiayi Ma

Going deeper with Image Transformers

Transformers have been recently adapted for large scale image classification, achieving high scores shaking up the long supremacy of convolutional neural networks. However the optimization of image transformers has been little studied so…

Computer Vision and Pattern Recognition · Computer Science 2021-04-08 Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou

An Educated Warm Start For Deep Image Prior-Based Micro CT Reconstruction

Deep image prior (DIP) was recently introduced as an effective unsupervised approach for image restoration tasks. DIP represents the image to be recovered as the output of a deep convolutional neural network, and learns the network's…

Image and Video Processing · Electrical Eng. & Systems 2023-02-10 Riccardo Barbano, Johannes Leuschner, Maximilian Schmidt, Alexander Denker, Andreas Hauptmann, Peter Maaß, Bangti Jin

Universal Image Restoration Pre-training via Degradation Classification

This paper proposes the Degradation Classification Pre-Training (DCPT), which enables models to learn how to classify the degradation type of input images for universal image restoration pre-training. Unlike the existing self-supervised…

Computer Vision and Pattern Recognition · Computer Science 2025-01-28 JiaKui Hu, Lujia Jin, Zhengjian Yao, Yanye Lu

Towards a Unified Foundation Model: Jointly Pre-Training Transformers on Unpaired Images and Text

In this paper, we explore the possibility of building a unified foundation model that can be adapted to both vision-only and text-only tasks. Starting from BERT and ViT, we design a unified transformer consisting of modality-specific…

Computer Vision and Pattern Recognition · Computer Science 2021-12-15 Qing Li, Boqing Gong, Yin Cui, Dan Kondratyuk, Xianzhi Du, Ming-Hsuan Yang, Matthew Brown

Pre-Trained Language Transformers are Universal Image Classifiers

Facial images disclose many hidden personal traits such as age, gender, race, health, emotion, and psychology. Understanding these traits will help to classify the people in different attributes. In this paper, we have presented a novel…

Computer Vision and Pattern Recognition · Computer Science 2022-01-26 Rahul Goel, Modar Sulaiman, Kimia Noorbakhsh, Mahdi Sharifi, Rajesh Sharma, Pooyan Jamshidi, Kallol Roy

Deep transfer learning for image classification: a survey

Deep neural networks such as convolutional neural networks (CNNs) and transformers have achieved many successes in image classification in recent years. It has been consistently demonstrated that best practice for image classification is…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Jo Plested, Musa Phiri, Tom Gedeon

Perceptual Image Quality Assessment with Transformers

In this paper, we propose an image quality transformer (IQT) that successfully applies a transformer architecture to a perceptual full-reference image quality assessment (IQA) task. Perceptual representation becomes more important in image…

Computer Vision and Pattern Recognition · Computer Science 2021-05-06 Manri Cheon, Sung-Jun Yoon, Byungyeon Kang, Junwoo Lee

ITTR: Unpaired Image-to-Image Translation with Transformers

Unpaired image-to-image translation is to translate an image from a source domain to a target domain without paired training data. By utilizing CNN in extracting local semantics, various techniques have been developed to improve the…

Computer Vision and Pattern Recognition · Computer Science 2022-03-31 Wanfeng Zheng, Qiang Li, Guoxin Zhang, Pengfei Wan, Zhongyuan Wang

Combating Digitally Altered Images: Deepfake Detection

The rise of Deepfake technology to generate hyper-realistic manipulated images and videos poses a significant challenge to the public and relevant authorities. This study presents a robust Deepfake detection based on a modified Vision…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Saksham Kumar, Rhythm Narang