Related papers: Learning Imbalanced Data with Vision Transformers

200 papers

In the real world, data tends to follow long-tailed distributions w.r.t. class or attribution, motivating the challenging Long-Tailed Recognition (LTR) problem. In this paper, we revisit recent LTR methods with promising Vision Transformers…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-18 · Zhengzhuo Xu, Shuo Yang, Xingjun Wang, Chun Yuan

Deep learning-based models encounter challenges when processing long-tailed data in the real world. Existing solutions usually employ some balancing strategies or transfer learning to deal with the class imbalance problem, based on the…

Computer Vision and Pattern Recognition · Computer Science · 2022-07-20 · Changyao Tian, Wenhai Wang, Xizhou Zhu, Jifeng Dai, Yu Qiao

Vision Transformers (ViTs) and MLPs signal further efforts on replacing hand-wired features or inductive biases with general-purpose neural architectures. Existing works empower the models with massive data, such as large-scale pre-training…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-15 · Xiangning Chen, Cho-Jui Hsieh, Boqing Gong

A dramatic increase in real-world video volume with extremely diverse and emerging topics naturally forms a long-tailed video distribution in terms of their categories, and it spotlights the need for Video Long-Tailed Recognition (VLTR). In…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-28 · WonJun Moon, Hyun Seok Seong, Jae-Pil Heo

Vision Transformers (ViTs) have become ubiquitous in computer vision. Despite their success, ViTs lack inductive biases, which can make it difficult to train them with limited data. To address this challenge, prior studies suggest training…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-29 · Srijan Das, Tanmay Jain, Dominick Reilly, Pranav Balaji, Soumyajit Karmakar, Shyam Marjit, Xiang Li, Abhijit Das, Michael S. Ryoo

Real-world datasets often exhibit long-tailed distributions, where a few dominant "Head" classes have abundant samples while most "Tail" classes are severely underrepresented, leading to biased learning and poor generalization for the Tail.…

Machine Learning · Computer Science · 2026-02-02 · Mahdiyar Molahasani, Michael Greenspan, Ali Etemad

In the real open world, data tends to follow long-tailed class distributions, motivating the well-studied long-tailed recognition (LTR) problem. Naive training produces models that are biased toward common classes in terms of higher…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-29 · Shaden Alshammari, Yu-Xiong Wang, Deva Ramanan, Shu Kong

It is not uncommon that real-world data are distributed with a long tail. For such data, the learning of deep neural networks becomes challenging because it is hard to classify tail classes correctly. In the literature, several existing…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Mengke Li, Yiu-ming Cheung, Yang Lu, Zhikai Hu, Weichao Lan, Hui Huang

Real-world data typically follow a long-tailed distribution, where a few majority categories occupy most of the data while most minority categories contain a limited number of samples. Classification models minimizing cross-entropy struggle…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-13 · Jianggang Zhu, Zheng Wang, Jingjing Chen, Yi-Ping Phoebe Chen, Yu-Gang Jiang

Real-world data often exhibits a long-tailed distribution, in which head classes occupy most of the data, while tail classes only have very few samples. Models trained on long-tailed datasets have poor adaptability to tail classes and the…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-29 · Qiong Chen, Tianlin Huang, Geren Zhu, Enlu Lin

Vision Transformers (ViTs) have emerged as popular models in computer vision, demonstrating state-of-the-art performance across various tasks. This success typically follows a two-stage strategy involving pre-training on large-scale…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-07 · Zijun Long, Zaiqiao Meng, Gerardo Aragon Camarasa, Richard McCreadie

Since their inception, Vision Transformers (ViTs) have emerged as a compelling alternative to Convolutional Neural Networks (CNNs) across a wide spectrum of tasks. ViTs exhibit notable characteristics, including global attention, resilience…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-22 · Hanan Gani, Nada Saadi, Noor Hussein, Karthik Nandakumar

Large Vision-Language Models (LVLMs) have achieved significant progress in combining visual comprehension with language generation. Despite this success, the training data of LVLMs still suffers from Long-Tail (LT) problems, where the data…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-30 · Mingyang Song, Xiaoye Qu, Jiawei Zhou, Yu Cheng

Vision Transformer (ViT), a radically different architecture than convolutional neural networks, offers multiple advantages including design simplicity, robustness and state-of-the-art performance on many vision tasks. However, in contrast…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-14 · Hanan Gani, Muzammal Naseer, Mohammad Yaqub

We introduce LPT++, a comprehensive framework for long-tailed classification that combines parameter-efficient fine-tuning (PEFT) with a learnable model ensemble. LPT++ enhances frozen Vision Transformers (ViTs) through the integration of…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-18 · Bowen Dong, Pan Zhou, Wangmeng Zuo

Imbalanced classification datasets pose significant challenges in machine learning, often leading to biased models that perform poorly on underrepresented classes. With the rise of foundation models, recent research has focused on the full,…

Machine Learning · Computer Science · 2025-09-22 · Nakul Sharma

The integration of Large Language Model (LLM) blocks with Vision Transformers (ViTs) holds immense promise for vision-only tasks by leveraging the rich semantic knowledge and reasoning capabilities of LLMs. However, a fundamental challenge…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-10 · Selim Kuzucu, Muhammad Ferjad Naeem, Anna Kukleva, Federico Tombari, Bernt Schiele

Vision Transformers (ViTs) are emerging as an alternative to convolutional neural networks (CNNs) for visual recognition. They achieve competitive results with CNNs, but the lack of the typical convolutional inductive bias makes them more…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-27 · Yun-Hao Cao, Hao Yu, Jianxin Wu

We propose L2T, an advancement of visual instruction tuning (VIT). While VIT equips Multimodal LLMs (MLLMs) with promising multimodal capabilities, the current design choices for VIT often result in overfitting and shortcut learning,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-14 · Zhihan Zhou, Feng Hong, Jiaan Luo, Jiangchao Yao, Dongsheng Li, Bo Han, Ya Zhang, Yanfeng Wang

The real-world data distribution is essentially long-tailed, which poses a great challenge to deep models. In this work, we propose a new method, Gradual Balanced Loss and Adaptive Feature Generator (GLAG), to alleviate imbalance. GLAG…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-02 · Zihan Zhang, Xiang Xiang