Graphics · Computer Science
Volumetric Procedural Models for Shape Representation
Andrew Willis, Prashant Ganesh, Kyle Volle, Jincheng Zhang +1
2021-03-23
Computation and Language · Computer Science
Visually-Augmented Language Modeling
Weizhi Wang, Li Dong, Hao Cheng, Haoyu Song +4
2023-02-28
Machine Learning · Computer Science
An Introduction to Vision-Language Modeling
Florian Bordes, Richard Yuanzhe Pang, Anurag Ajay, Alexander C. Li +37
2024-05-28
Computer Vision and Pattern Recognition · Computer Science
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li, Zhixin Lai, Wentao Bao, Zhen Tan +6
2025-01-07
Computer Vision and Pattern Recognition · Computer Science
Vision language models are unreliable at trivial spatial cognition
Sangeet Khemlani, Tyler Tran, Nathaniel Gyory, Anthony M. Harrison +5
2025-04-23
Software Engineering · Computer Science
Towards a Formalization of the Unified Modeling Language
Ruth Breu, Ursula Hinkel, Christoph Hofmann, Cornel Klein +3
2014-09-26
Robotics · Computer Science
A3VLM: Actionable Articulation-Aware Vision Language Model
Siyuan Huang, Haonan Chang, Yuhan Liu, Yimeng Zhu +4
2024-06-14
Computer Vision and Pattern Recognition · Computer Science
A Multimodal Recaptioning Framework to Account for Perceptual Diversity Across Languages in Vision-Language Modeling
Kyle Buettner, Jacob T. Emmerson, Adriana Kovashka
2025-11-12
Computer Vision and Pattern Recognition · Computer Science
LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models
Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim +5
2025-03-12
Computer Vision and Pattern Recognition · Computer Science
How Can Objects Help Video-Language Understanding?
Zitian Tang, Shijie Wang, Junho Cho, Jaewook Yoo +1
2025-08-06
Robotics · Computer Science
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang, Chen Wang, Ruohan Zhang, Yunzhu Li +2
2023-11-03
Audio and Speech Processing · Electrical Engineering and Systems Science
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiushi Zhu, Long Zhou, Ziqiang Zhang, Shujie Liu +6
2023-05-22
Computer Vision and Pattern Recognition · Computer Science
VividMed: Vision Language Model with Versatile Visual Grounding for Medicine
Lingxiao Luo, Bingda Tang, Xuanzhong Chen, Rong Han +1
2025-02-19
Artificial Intelligence · Computer Science
Grounding Visual Illusions in Language: Do Vision-Language Models Perceive Illusions Like Humans?
Yichi Zhang, Jiayi Pan, Yuchen Zhou, Rui Pan +1
2023-11-02
Computation and Language · Computer Science
Do Vision-Language Models Really Understand Visual Language?
Yifan Hou, Buse Giledereli, Yilei Tu, Mrinmaya Sachan
2025-05-27
Robotics · Computer Science
Task-oriented Robotic Manipulation with Vision Language Models
Nurhan Bulus Guran, Hanchi Ren, Jingjing Deng, Xianghua Xie
2025-05-21
Computer Vision and Pattern Recognition · Computer Science
Do Pre-trained Vision-Language Models Encode Object States?
Kaleb Newman, Shijie Wang, Yuan Zang, David Heffren +1
2024-09-17
Computer Vision and Pattern Recognition · Computer Science
VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors
Haz Sameen Shahgir, Xiaofu Chen, Yu Fu, Erfan Shayegani +3
2026-04-16
Computer Vision and Pattern Recognition · Computer Science
VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation
Maximilian Rokuss, Moritz Langenberg, Yannick Kirchhoff, Fabian Isensee +7
2025-11-17