Related papers: VoxML: A Visualization Modeling Language

200 papers

VoxML is a modeling language used to map natural language expressions into real-time visualizations using commonsense semantic knowledge of objects and events. Its utility has been demonstrated in embodied simulation environments and in…

Computation and Language · Computer Science 2023-05-23 Kiyong Lee , Nikhil Krishnaswamy , James Pustejovsky

Comprehending 3D environments is vital for intelligent systems in domains like robotics and autonomous navigation. Voxel grids offer a structured representation of 3D space, but extracting high-level semantic meaning remains challenging.…

Computer Vision and Pattern Recognition · Computer Science 2025-12-03 Alan Dao , Norapat Buppodom

Vision language models (VLMs) are AI systems paired with both language and vision encoders to process multimodal input. They are capable of performing complex semantic tasks such as automatic captioning, but it remains an open question…

Computer Vision and Pattern Recognition · Computer Science 2025-05-16 Tyler Tran , Sangeet Khemlani , J. G. Trafton

This article describes a volumetric approach for procedural shape modeling and a new Procedural Shape Modeling Language (PSML) that facilitates the specification of these models. PSML provides programmers the ability to describe shapes in…

Graphics · Computer Science 2021-03-23 Andrew Willis , Prashant Ganesh , Kyle Volle , Jincheng Zhang , Kevin Brink

Human language is grounded in multimodal knowledge including visual knowledge like colors, sizes, and shapes. However, current large-scale pre-trained language models rely on text-only self-supervised training with massive text data, which…

Computation and Language · Computer Science 2023-02-28 Weizhi Wang , Li Dong , Hao Cheng , Haoyu Song , Xiaodong Liu , Xifeng Yan , Jianfeng Gao , Furu Wei

Following the recent popularity of Large Language Models (LLMs), several attempts have been made to extend them to the visual domain. From having a visual assistant that could guide us through unfamiliar environments to generative models…

Visual-language models (VLM) have emerged as a powerful tool for learning a unified embedding space for vision and language. Inspired by large language models, which have demonstrated strong reasoning and multi-task capabilities, visual…

Computer Vision and Pattern Recognition · Computer Science 2025-01-07 Yifan Li , Zhixin Lai , Wentao Bao , Zhen Tan , Anh Dao , Kewei Sui , Jiayi Shen , Dong Liu , Huan Liu , Yu Kong

Visual grounding refers to the ability of a model to identify a region within some visual input that matches a textual description. Consequently, a model equipped with visual grounding capabilities can target a wide range of applications in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-16 Georgios Pantazopoulos , Eda B. Özyiğit

Vision language models (VLMs) are designed to extract relevant visuospatial information from images. Some research suggests that VLMs can exhibit humanlike scene understanding, while other investigations reveal difficulties in their ability…

Computer Vision and Pattern Recognition · Computer Science 2025-04-23 Sangeet Khemlani , Tyler Tran , Nathaniel Gyory , Anthony M. Harrison , Wallace E. Lawson , Ravenna Thielstrom , Hunter Thompson , Taaren Singh , J. Gregory Trafton

The Unified Modeling Language (UML) is a language for specifying, visualizing, and documenting object-oriented systems. UML combines the concepts of OOA/OOD, OMT and OOSE and is intended as a standard in the domain of object-oriented analysis and…

Software Engineering · Computer Science 2014-09-26 Ruth Breu , Ursula Hinkel , Christoph Hofmann , Cornel Klein , Barbara Paech , Bernhard Rumpe , V. Thurner

Vision Language Models (VLMs) have received significant attention in recent years in the robotics community. VLMs are shown to be able to perform complex visual reasoning and scene understanding tasks, which makes them regarded as a…

Robotics · Computer Science 2024-06-14 Siyuan Huang , Haonan Chang , Yuhan Liu , Yimeng Zhu , Hao Dong , Peng Gao , Abdeslam Boularias , Hongsheng Li

In this paper, we describe a system for generating three-dimensional visual simulations of natural language motion expressions. We use a rich formal model of events and their participants to generate simulations that satisfy the minimal…

Computation and Language · Computer Science 2016-10-04 Nikhil Krishnaswamy , James Pustejovsky

The ability to construct mental models of the world is a central aspect of understanding. Similarly, visual understanding can be viewed as the ability to construct a representative model of the system depicted in an image. This work…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Sagi Eppel

When captioning an image, people describe objects in diverse ways, such as by using different terms and/or including details that are perceptually noteworthy to them. Descriptions can be especially unique across languages and cultures.…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Kyle Buettner , Jacob T. Emmerson , Adriana Kovashka

Spatial reasoning is a fundamental aspect of human cognition, enabling intuitive understanding and manipulation of objects in three-dimensional space. While foundation models demonstrate remarkable performance on some benchmarks, they still…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Fan-Yun Sun , Weiyu Liu , Siyi Gu , Dylan Lim , Goutam Bhat , Federico Tombari , Manling Li , Nick Haber , Jiajun Wu

Do we still need to represent objects explicitly in multimodal large language models (MLLMs)? At one extreme, pre-trained encoders convert images into visual tokens, with which objects and spatiotemporal relationships may be implicitly…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Zitian Tang , Shijie Wang , Junho Cho , Jaewook Yoo , Chen Sun

Large language models (LLMs) are shown to possess a wealth of actionable knowledge that can be extracted for robot manipulation in the form of reasoning and planning. Despite the progress, most still rely on pre-defined motion primitives to…

Robotics · Computer Science 2023-11-03 Wenlong Huang , Chen Wang , Ruohan Zhang , Yunzhu Li , Jiajun Wu , Li Fei-Fei

Although speech is a simple and effective way for humans to communicate with the outside world, a more realistic speech interaction contains multimodal information, e.g., vision, text. How to design a unified framework to integrate…

Audio and Speech Processing · Electrical Eng. & Systems 2023-05-22 Qiushi Zhu , Long Zhou , Ziqiang Zhang , Shujie Liu , Binxing Jiao , Jie Zhang , Lirong Dai , Daxin Jiang , Jinyu Li , Furu Wei

Recent advancements in Vision Language Models (VLMs) have demonstrated remarkable promise in generating visually grounded responses. However, their application in the medical domain is hindered by unique challenges. For instance, most VLMs…

Computer Vision and Pattern Recognition · Computer Science 2025-02-19 Lingxiao Luo , Bingda Tang , Xuanzhong Chen , Rong Han , Ting Chen

Vision-Language Models (VLMs) are trained on vast amounts of data captured by humans emulating our understanding of the world. However, human perception of reality isn't always faithful to the physical world, a phenomenon known as visual illusions.…

Artificial Intelligence · Computer Science 2023-11-02 Yichi Zhang , Jiayi Pan , Yuchen Zhou , Rui Pan , Joyce Chai