Related papers: Learning Imbalanced Data with Vision Transformers

Rethink Long-tailed Recognition with Vision Transformers

In the real world, data tends to follow long-tailed distributions w.r.t. class or attribution, motivating the challenging Long-Tailed Recognition (LTR) problem. In this paper, we revisit recent LTR methods with promising Vision Transformers…

Computer Vision and Pattern Recognition · Computer Science 2023-04-18 Zhengzhuo Xu, Shuo Yang, Xingjun Wang, Chun Yuan

VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-20 Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations

Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures. Existing works empower the models by massive data, such as large-scale pre-training…

Computer Vision and Pattern Recognition · Computer Science 2022-03-15 Xiangning Chen, Cho-Jui Hsieh, Boqing Gong

Minority-Oriented Vicinity Expansion with Attentive Aggregation for Video Long-Tailed Recognition

A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of their categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In…

Computer Vision and Pattern Recognition · Computer Science 2022-11-28 WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders

Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training…

Computer Vision and Pattern Recognition · Computer Science 2023-12-29 Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael S. Ryoo

On The Relationship Between Continual Learning and Long-Tailed Recognition

Real-world datasets often exhibit long-tailed distributions, where a few dominant "Head" classes have abundant samples while most "Tail" classes are severely underrepresented, leading to biased learning and poor generalization for the Tail.…

Machine Learning · Computer Science 2026-02-02 Mahdiyar Molahasani, Michael Greenspan, Ali Etemad

Long-Tailed Recognition via Weight Balancing

In the real open world, data tends to follow long-tailed class distributions, motivating the well-studied long-tailed recognition (LTR) problem. Naive training produces models that are biased toward common classes in terms of higher…

Computer Vision and Pattern Recognition · Computer Science 2022-03-29 Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong

Adjusting Logit in Gaussian Form for Long-Tailed Visual Recognition

It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes correctly. In the literature, several existing…

Computer Vision and Pattern Recognition · Computer Science 2024-07-19 Mengke Li, Yiu-ming Cheung, Yang Lu, Zhikai Hu, Weichao Lan, Hui Huang

Balanced Contrastive Learning for Long-Tailed Visual Recognition

Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while most minority categories contain a limited number of samples. Classification models minimizing cross-entropy struggle…

Computer Vision and Pattern Recognition · Computer Science 2022-09-13 Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang

A dual-branch model with inter- and intra-branch contrastive loss for long-tailed recognition

Real-world data often exhibits a long-tailed distribution, in which head classes occupy most of the data, while tail classes only have very few samples. Models trained on long-tailed datasets have poor adaptability to tail classes and the…

Computer Vision and Pattern Recognition · Computer Science 2023-09-29 Qiong Chen, Tianlin Huang, Geren Zhu, Enlu Lin

LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers

Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks. This success typically follows a two-stage strategy involving pre-training on large-scale…

Computer Vision and Pattern Recognition · Computer Science 2024-02-07 Zijun Long, Zaiqiao Meng, Gerardo Aragon Camarasa, Richard McCreadie

Multi-Attribute Vision Transformers are Efficient and Robust Learners

Since their inception, Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) across a wide spectrum of tasks. ViTs exhibit notable characteristics, including global attention, resilience…

Computer Vision and Pattern Recognition · Computer Science 2024-07-22 Hanan Gani, Nada Saadi, Noor Hussein, Karthik Nandakumar

From Head to Tail: Towards Balanced Representation in Large Vision-Language Models through Adaptive Data Calibration

Large Vision-Language Models (LVLMs) have achieved significant progress in combining visual comprehension with language generation. Despite this success, the training data of LVLMs still suffers from Long-Tail (LT) problems, where the data…

Computer Vision and Pattern Recognition · Computer Science 2025-05-30 Mingyang Song, Xiaoye Qu, Jiawei Zhou, Yu Cheng

How to Train Vision Transformer on Small-scale Datasets?

Vision Transformer (ViT), a radically different architecture than convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Hanan Gani, Muzammal Naseer, Mohammad Yaqub

LPT++: Efficient Training on Mixture of Long-tailed Experts

We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble. LPT++ enhances frozen Vision Transformers (ViTs) through the integration of…

Computer Vision and Pattern Recognition · Computer Science 2024-09-18 Bowen Dong, Pan Zhou, Wangmeng Zuo

Efficient Long-Tail Learning in Latent Space by sampling Synthetic Data

Imbalanced classification datasets pose significant challenges in machine learning, often leading to biased models that perform poorly on underrepresented classes. With the rise of foundation models, recent research has focused on the full,…

Machine Learning · Computer Science 2025-09-22 Nakul Sharma

Language-Unlocked ViT (LUViT): Empowering Self-Supervised Vision Transformers with LLMs

The integration of Large Language Model (LLMs) blocks with Vision Transformers (ViTs) holds immense promise for vision-only tasks by leveraging the rich semantic knowledge and reasoning capabilities of LLMs. However, a fundamental challenge…

Computer Vision and Pattern Recognition · Computer Science 2025-07-10 Selim Kuzucu, Muhammad Ferjad Naeem, Anna Kukleva, Federico Tombari, Bernt Schiele

Training Vision Transformers with Only 2040 Images

Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition. They achieve competitive results with CNNs but the lack of the typical convolutional inductive bias makes them more…

Computer Vision and Pattern Recognition · Computer Science 2022-01-27 Yun-Hao Cao, Hao Yu, Jianxin Wu

Learning to Instruct for Visual Instruction Tuning

We propose L2T, an advancement of visual instruction tuning (VIT). While VIT equips Multimodal LLMs (MLLMs) with promising multimodal capabilities, the current design choices for VIT often result in overfitting and shortcut learning,…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Zhihan Zhou, Feng Hong, Jiaan Luo, Jiangchao Yao, Dongsheng Li, Bo Han, Ya Zhang, Yanfeng Wang

Long-Tailed Classification with Gradual Balanced Loss and Adaptive Feature Generation

The real-world data distribution is essentially long-tailed, which poses great challenge to the deep model. In this work, we propose a new method, Gradual Balanced Loss and Adaptive Feature Generator (GLAG) to alleviate imbalance. GLAG…

Computer Vision and Pattern Recognition · Computer Science 2022-03-02 Zihan Zhang, Xiang Xiang