Related papers: Contrastive Visual Data Augmentation

CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding

Data augmentation has been demonstrated as an effective strategy for improving model generalization and data efficiency. However, due to the discrete nature of natural language, designing label-preserving transformations for text data tends…

Computation and Language · Computer Science · 2020-10-20 · Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen

Img-Diff: Contrastive Data Synthesis for Multimodal Large Language Models

High-performance Multimodal Large Language Models (MLLMs) are heavily dependent on data quality. To advance fine-grained image recognition within MLLMs, we introduce a novel data synthesis method inspired by contrastive learning and image…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-20 · Qirui Jiao, Daoyuan Chen, Yilun Huang, Bolin Ding, Yaliang Li, Ying Shen

ConDA: Contrastive Domain Adaptation for AI-generated Text Detection

Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at…

Computation and Language · Computer Science · 2023-09-22 · Amrita Bhattacharjee, Tharindu Kumarage, Raha Moraffah, Huan Liu

Adaptive Data Augmentation for Contrastive Learning

In computer vision, contrastive learning is the most advanced unsupervised learning framework. Yet most previous methods simply apply fixed composition of data augmentations to improve data efficiency, which ignores the changes in their…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-20 · Yuhan Zhang, He Zhu, Shan Yu

Multimodal Contrastive Training for Visual Representation Learning

We develop an approach to learning visual representations that embraces multimodal data, driven by a combination of intra- and inter-modal similarity preservation objectives. Unlike existing visual pre-training methods, which solve a proxy…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-28 · Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta

Enhancing Conceptual Understanding in Multimodal Contrastive Learning through Hard Negative Samples

Current multimodal models leveraging contrastive learning often face limitations in developing fine-grained conceptual understanding. This is due to random negative samples during pretraining, causing almost exclusively very dissimilar…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-06 · Philipp J. Rösch, Norbert Oswald, Michaela Geierhos, Jindřich Libovický

ARMADA: Attribute-Based Multimodal Data Augmentation

In Multimodal Language Models (MLMs), the cost of manually annotating high-quality image-text pair data for fine-tuning and alignment is extremely high. While existing multimodal data augmentation frameworks propose ways to augment…

Artificial Intelligence · Computer Science · 2024-08-20 · Xiaomeng Jin, Jeonghwan Kim, Yu Zhou, Kuan-Hao Huang, Te-Lin Wu, Nanyun Peng, Heng Ji

CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection

Recent LiDAR-based 3D Object Detection (3DOD) methods show promising results, but they often do not generalize well to target domains outside the source (or training) data distribution. To reduce such domain gaps and thus to make 3DOD…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-08 · Gyusam Chang, Wonseok Roh, Sujin Jang, Dongwook Lee, Daehyun Ji, Gyeongrok Oh, Jinsun Park, Jinkyu Kim, Sangpil Kim

Contrastive Learning with Stronger Augmentations

Representation learning has significantly been developed with the advance of contrastive learning methods. Most of those methods have benefited from various data augmentations that are carefully designated to maintain their identities so…

Computer Vision and Pattern Recognition · Computer Science · 2022-01-24 · Xiao Wang, Guo-Jun Qi

Comparison Visual Instruction Tuning

Comparing two images in terms of Commonalities and Differences (CaD) is a fundamental human capability that forms the basis of advanced visual reasoning and interpretation. It is essential for the generation of detailed and contextually…

Computer Vision and Pattern Recognition · Computer Science · 2024-06-14 · Wei Lin, Muhammad Jehanzeb Mirza, Sivan Doveh, Rogerio Feris, Raja Giryes, Sepp Hochreiter, Leonid Karlinsky

CoDa: Constrained Generation based Data Augmentation for Low-Resource NLP

We present CoDa (Constrained Generation based Data Augmentation), a controllable, effective, and training-free data augmentation technique for low-resource (data-scarce) NLP. Our approach is based on prompting off-the-shelf…

Computation and Language · Computer Science · 2024-04-02 · Chandra Kiran Reddy Evuru, Sreyan Ghosh, Sonal Kumar, Ramaneswaran S, Utkarsh Tyagi, Dinesh Manocha

Learning Multimodal Data Augmentation in Feature Space

The ability to jointly learn from multiple modalities, such as text, audio, and visual data, is a defining feature of intelligent systems. While there have been promising advances in designing neural networks to harness multimodal data, the…

Machine Learning · Computer Science · 2023-04-25 · Zichang Liu, Zhiqiang Tang, Xingjian Shi, Aston Zhang, Mu Li, Anshumali Shrivastava, Andrew Gordon Wilson

CoDA: Towards Effective Cross-domain Knowledge Transfer via CoT-guided Domain Adaptation

Large language models (LLMs) have achieved substantial advances in logical reasoning, yet they continue to lag behind human-level performance. In-context learning provides a viable solution that boosts the model's performance via prompting…

Artificial Intelligence · Computer Science · 2026-04-22 · Jianzhi Yan, Le Liu, Buzhou Tang, Yang Xiang, Dongning Sun, Zhiming Li

Curriculum-style Data Augmentation for LLM-based Metaphor Detection

Recently, utilizing large language models (LLMs) for metaphor detection has achieved promising results. However, these methods heavily rely on the capabilities of closed-source LLMs, which come with relatively high inference costs and…

Computation and Language · Computer Science · 2025-03-04 · Kaidi Jia, Yanxia Wu, Ming Liu, Rongsheng Li

GRR-CoCa: Leveraging LLM Mechanisms in Multimodal Model Architectures

State-of-the-art (SOTA) image and text generation models are multimodal models that have many similarities to large language models (LLMs). Despite achieving strong performances, leading foundational multimodal model architectures…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-25 · Jake R. Patock, Nicole Catherine Lewis, Kevin McCoy, Christina Gomez, Canling Chen, Lorenzo Luzi

Unsupervised Document Embedding via Contrastive Augmentation

We present a contrasting learning approach with data augmentation techniques to learn document representations in an unsupervised manner. Inspired by recent contrastive self-supervised learning algorithms used for image and NLP pretraining,…

Computation and Language · Computer Science · 2021-03-29 · Dongsheng Luo, Wei Cheng, Jingchao Ni, Wenchao Yu, Xuchao Zhang, Bo Zong, Yanchi Liu, Zhengzhang Chen, Dongjin Song, Haifeng Chen, Xiang Zhang

Counterfactual Data Augmentation using Locally Factored Dynamics

Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the…

Machine Learning · Computer Science · 2020-12-07 · Silviu Pitis, Elliot Creager, Animesh Garg

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation

Knowledge distillation (KD) is an efficient framework for compressing large-scale pre-trained language models. Recent years have seen a surge of research aiming to improve KD by leveraging Contrastive Learning, Intermediate Layer…

Computation and Language · Computer Science · 2022-04-19 · Md Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart

CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning

Unsupervised Domain Adaptation (UDA) aims to adapt models from labeled source domains to unlabeled target domains. When adapting to adverse scenes, existing UDA methods fail to perform well due to the lack of instructions, leading their…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-16 · Ziyang Gong, Fuhao Li, Yupeng Deng, Deblina Bhattacharjee, Xianzheng Ma, Xiangwei Zhu, Zhenming Ji

Adapting Multimodal Foundation Models for Few-Shot Learning: A Comprehensive Study on Contrastive Captioners

Large-scale multimodal foundation models, particularly Contrastive Captioners (CoCa), have achieved state-of-the-art results by unifying contrastive alignment with generative captioning. While zero-shot transfer capabilities are…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-16 · N. K. B. M. P. K. B. Narasinghe, Uthayasanker Thayasivam