Computation and Language · Computer Science
Multimodal Large Language Models and Tunings: Vision, Language, Sensors, Audio, and Beyond
Soyeon Caren Han, Feiqi Cao, Josiah Poon, Roberto Navigli
2024-10-10
Computer Vision and Pattern Recognition · Computer Science
GPT4Video: A Unified Multimodal Large Language Model for Instruction-Followed Understanding and Safety-Aware Generation
Zhanyu Wang, Longyue Wang, Zhen Zhao, Minghao Wu +6
2024-10-29
Computer Vision and Pattern Recognition · Computer Science
GPT4Tools: Teaching Large Language Model to Use Tools via Self-instruction
Rui Yang, Lin Song, Yanwei Li, Sijie Zhao +3
2023-05-31
Computer Vision and Pattern Recognition · Computer Science
Visual Instruction Tuning
Haotian Liu, Chunyuan Li, Qingyang Wu, Yong Jae Lee
2023-12-14
Computer Vision and Pattern Recognition · Computer Science
MultiModal-GPT: A Vision and Language Model for Dialogue with Humans
Tao Gong, Chengqi Lyu, Shilong Zhang, Yudong Wang +6
2023-06-14
Computer Vision and Pattern Recognition · Computer Science
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Xiao Wang, Guangyao Chen, Guangwu Qian, Pengcheng Gao +4
2024-04-11
Computer Vision and Pattern Recognition · Computer Science
The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)
Zhengyuan Yang, Linjie Li, Kevin Lin, Jianfeng Wang +3
2023-10-12
Computer Vision and Pattern Recognition · Computer Science
MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman +3
2024-04-05
Computation and Language · Computer Science
Lost in Translation: When GPT-4V(ision) Can't See Eye to Eye with Text. A Vision-Language-Consistency Analysis of VLLMs and Beyond
Xiang Zhang, Senyu Li, Zijun Wu, Ning Shi
2023-10-20
Computer Vision and Pattern Recognition · Computer Science
What Matters in Training a GPT4-Style Language Model with Multimodal Inputs?
Yan Zeng, Hanbo Zhang, Jiani Zheng, Jiangnan Xia +4
2023-08-01
Computer Vision and Pattern Recognition · Computer Science
Multimodal Foundation Models: From Specialists to General-Purpose Assistants
Chunyuan Li, Zhe Gan, Zhengyuan Yang, Jianwei Yang +3
2023-09-20
Robotics · Computer Science
Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Jiaqi Wang, Zihao Wu, Yiwei Li, Hanqi Jiang +16
2024-01-10
Computer Vision and Pattern Recognition · Computer Science
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li +1
2023-10-03
Computer Vision and Pattern Recognition · Computer Science
A Survey on Multimodal Large Language Models
Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li +3
2024-12-02
Computer Vision and Pattern Recognition · Computer Science
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni, Federico Cocchi, Luca Barsellotti, Nicholas Moratelli +5
2024-06-07
Computer Vision and Pattern Recognition · Computer Science
Multimodal ChatGPT for Medical Applications: an Experimental Study of GPT-4V
Zhiling Yan, Kai Zhang, Rong Zhou, Lifang He +2
2023-10-31
Information Retrieval · Computer Science
Exploring Recommendation Capabilities of GPT-4V(ision): A Preliminary Case Study
Peilin Zhou, Meng Cao, You-Liang Huang, Qichen Ye +5
2023-11-08
Computation and Language · Computer Science
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti, Ian Stewart, Sameera Horawalavithana, Henry Kvinge +3
2024-06-11
Computation and Language · Computer Science
Summary of ChatGPT-Related Research and Perspective Towards the Future of Large Language Models
Yiheng Liu, Tianle Han, Siyuan Ma, Jiayue Zhang +14
2023-08-25
Computation and Language · Computer Science
X-LLaVA: Optimizing Bilingual Large Vision-Language Alignment
Dongjae Shin, Hyeonseok Lim, Inho Won, Changsu Choi +5
2024-08-05
Machine Learning · Computer Science
MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets
Lai Wei, Xiaozhe Li, Zihao Jiang, Weiran Huang +1
2026-04-14