Computer Vision and Pattern Recognition · Computer Science
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
Xiaoze Jiang, Jing Yu, Zengchang Qin, Yingying Zhuang +3
2019-11-19
Computer Vision and Pattern Recognition · Computer Science
Weaving Context Across Images: Improving Vision-Language Models through Focus-Centric Visual Chains
Juntian Zhang, Chuanqi Cheng, Yuhan Liu, Wei Liu +2
2025-04-30
Computer Vision and Pattern Recognition · Computer Science
Veagle: Advancements in Multimodal Representation Learning
Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha +5
2024-10-29
Computer Vision and Pattern Recognition · Computer Science
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong, Wenyi Wu, Qi Li, Rob Barton +5
2024-06-06
Machine Learning · Computer Science
Multi-View Clustering from the Perspective of Mutual Information
Fu Lele, Zhang Lei, Wang Tong, Chen Chuan +2
2023-05-31
Computation and Language · Computer Science
Semi-supervised Visual Feature Integration for Pre-trained Language Models
Lisai Zhang, Qingcai Chen, Dongfang Li, Buzhou Tang
2020-08-14
Computer Vision and Pattern Recognition · Computer Science
Towards More Unified In-context Visual Understanding
Dianmo Sheng, Dongdong Chen, Zhentao Tan, Qiankun Liu +6
2024-03-19
Computer Vision and Pattern Recognition · Computer Science
More Images, More Problems? A Controlled Analysis of VLM Failure Modes
Anurag Das, Adrian Bulat, Alberto Baldrati, Ioannis Maniadis Metaxas +3
2026-01-13
Computer Vision and Pattern Recognition · Computer Science
Parts of Speech-Grounded Subspaces in Vision-Language Models
James Oldfield, Christos Tzelepis, Yannis Panagakis, Mihalis A. Nicolaou +1
2023-11-14
Computer Vision and Pattern Recognition · Computer Science
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
Kyle Buettner, Jacob T. Emmerson, Adriana Kovashka
2025-11-12
Artificial Intelligence · Computer Science
Hierarchical Mutual Information Analysis: Towards Multi-view Clustering in The Wild
Jiatai Wang, Zhiwei Xu, Xuewen Yang, Xin Wang
2023-10-31
Computer Vision and Pattern Recognition · Computer Science
Understanding ME? Multimodal Evaluation for Fine-grained Visual Commonsense
Zhecan Wang, Haoxuan You, Yicheng He, Wenhao Li +2
2023-10-24
Computer Vision and Pattern Recognition · Computer Science
A Survey on Efficient Vision-Language Models
Gaurav Shinde, Anuradha Ravi, Emon Dey, Shadman Sakib +2
2025-07-03
Computer Vision and Pattern Recognition · Computer Science
MVPTR: Multi-Level Semantic Alignment for Vision-Language Pre-Training via Multi-Stage Learning
Zejun Li, Zhihao Fan, Huaixiao Tou, Jingjing Chen +2
2022-09-15
Computation and Language · Computer Science
MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning
Haozhe Zhao, Zefan Cai, Shuzheng Si, Xiaojian Ma +6
2024-03-21
Computer Vision and Pattern Recognition · Computer Science
Advancing Visual Large Language Model for Multi-granular Versatile Perception
Wentao Xiang, Haoxian Tan, Cong Wei, Yujie Zhong +2
2025-07-23
Computation and Language · Computer Science
Learning Multi-Modal Word Representation Grounded in Visual Context
Éloi Zablocki, Benjamin Piwowarski, Laure Soulier, Patrick Gallinari
2017-11-10
Computer Vision and Pattern Recognition · Computer Science
EVLM: An Efficient Vision-Language Model for Visual Understanding
Kaibing Chen, Dong Shen, Hanwen Zhong, Huasong Zhong +13
2024-07-22
Computer Vision and Pattern Recognition · Computer Science
VinVL: Revisiting Visual Representations in Vision-Language Models
Pengchuan Zhang, Xiujun Li, Xiaowei Hu, Jianwei Yang +4
2021-03-11
Computer Vision and Pattern Recognition · Computer Science
Towards Multimodal In-Context Learning for Vision & Language Models
Sivan Doveh, Shaked Perek, M. Jehanzeb Mirza, Wei Lin +4
2024-07-18
Computer Vision and Pattern Recognition · Computer Science
Enhancing Visual In-Context Learning by Multi-Faceted Fusion
Wenwen Liao, Jianbo Yu, Yuansong Wang, Qingchao Jiang +1
2026-01-16
Artificial Intelligence · Computer Science
Multi-View Attention Network for Visual Dialog
Sungjin Park, Taesun Whang, Yeochan Yoon, Heuiseok Lim
2020-10-08
Computer Vision and Pattern Recognition · Computer Science
Using Multiple Instance Learning to Build Multimodal Representations
Peiqi Wang, William M. Wells, Seth Berkowitz, Steven Horng +1
2023-06-14
Computer Vision and Pattern Recognition · Computer Science
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie, Wei He, Kai Han, Yehui Tang +3
2023-12-04