Computation and Language · Computer Science
Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision
Yixin Cao, Lei Hou, Juanzi Li, Zhiyuan Liu +3
2018-11-28
Computer Vision and Pattern Recognition · Computer Science
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song, Jianqiang Wan, Zhibo Yang, Jun Tang +3
2022-05-02
Computer Vision and Pattern Recognition · Computer Science
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin, Kun Xu, Kun Xu, Liwei Chen +12
2024-03-25
Computer Vision and Pattern Recognition · Computer Science
TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Michael S. Ryoo, AJ Piergiovanni, Anurag Arnab, Mostafa Dehghani +1
2022-04-05
Computation and Language · Computer Science
Universal Multimodal Representation for Language Understanding
Zhuosheng Zhang, Kehai Chen, Rui Wang, Masao Utiyama +3
2023-01-10
Computer Vision and Pattern Recognition · Computer Science
See, Hear, and Read: Deep Aligned Representations
Yusuf Aytar, Carl Vondrick, Antonio Torralba
2017-06-06
Computer Vision and Pattern Recognition · Computer Science
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas +2
2023-03-16
Computer Vision and Pattern Recognition · Computer Science
Vision as a Dialect: Unifying Visual Understanding and Generation via Text-Aligned Representations
Jiaming Han, Hao Chen, Yang Zhao, Hanyu Wang +5
2025-06-24
Computer Vision and Pattern Recognition · Computer Science
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille
2024-01-31
Computer Vision and Pattern Recognition · Computer Science
Multi-modal Alignment using Representation Codebook
Jiali Duan, Liqun Chen, Son Tran, Jinyu Yang +3
2022-03-29
Image and Video Processing · Electrical Eng. & Systems
Learning Token-based Representation for Image Retrieval
Hui Wu, Min Wang, Wengang Zhou, Yang Hu +1
2021-12-14
Computer Vision and Pattern Recognition · Computer Science
Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
Tanmay Gupta, Kevin Shih, Saurabh Singh, Derek Hoiem
2017-10-17
Computer Vision and Pattern Recognition · Computer Science
DualToken: Towards Unifying Visual Understanding and Generation with Dual Visual Vocabularies
Wei Song, Yuran Wang, Zijia Song, Yadong Li +5
2026-04-21
Computer Vision and Pattern Recognition · Computer Science
Augmenting Vision Language Pretraining by Learning Codebook with Visual Semantics
Xiaoyuan Guo, Jiali Duan, C.-C. Jay Kuo, Judy Wawira Gichoya +1
2022-08-02
Robotics · Computer Science
The Surprising Effectiveness of Representation Learning for Visual Imitation
Jyothish Pari, Nur Muhammad Shafiullah, Sridhar Pandian Arunachalam, Lerrel Pinto
2021-12-07
Computer Vision and Pattern Recognition · Computer Science
Unified Multimodal Understanding via Byte-Pair Visual Encoding
Wanpeng Zhang, Yicheng Feng, Hao Luo, Yijiang Li +3
2025-07-01
Computer Vision and Pattern Recognition · Computer Science
Multi-mapping Image-to-Image Translation via Learning Disentanglement
Xiaoming Yu, Yuanqi Chen, Thomas Li, Shan Liu +1
2019-12-30
Computer Vision and Pattern Recognition · Computer Science
VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification
Souhail Bakkali, Zuheng Ming, Mickael Coustaty, Marçal Rusiñol +1
2023-05-12
Computer Vision and Pattern Recognition · Computer Science
Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
Peng Jin, Ryuichi Takanobu, Wancai Zhang, Xiaochun Cao +1
2024-04-08
Computer Vision and Pattern Recognition · Computer Science
Representation Calibration and Uncertainty Guidance for Class-Incremental Learning based on Vision Language Model
Jiantao Tan, Peixian Ma, Tong Yu, Wentao Zhang +1
2025-12-11