Related papers: SnapNCode: An Integrated Development Environment f…

Descrip3D: Enhancing Large Language Model-based 3D Scene Understanding with Object-Level Text Descriptions

Understanding 3D scenes goes beyond simply recognizing objects; it requires reasoning about the spatial and semantic relationships between them. Current 3D scene-language models often struggle with this relational understanding,…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Jintang Xue, Ganning Zhao, Jie-En Yao, Hong-En Chen, Yue Hu, Meida Chen, Suya You, C.-C. Jay Kuo

SceneCode: Executable World Programs for Editable Indoor Scenes with Articulated Objects

Indoor scene synthesis underpins embodied AI, robotic manipulation, and simulation-based policy evaluation, where a useful scene must specify not only what the environment looks like, but also how its objects are structured. Existing…

Artificial Intelligence · Computer Science 2026-05-20 Puyi Wang, Yuhao Wang, Linjie Li, Zhengyuan Yang, Kevin Qinghong Lin, Yangguang Li, Yu Cheng

SpatialTouch: Exploring Spatial Data Visualizations in Cross-reality

We propose and study a novel cross-reality environment that seamlessly integrates a monoscopic 2D surface (an interactive screen with touch and pen input) with a stereoscopic 3D space (an augmented reality HMD) to jointly host spatial data…

Human-Computer Interaction · Computer Science 2024-09-25 Lixiang Zhao, Tobias Isenberg, Fuqi Xie, Hai-Ning Liang, Lingyun Yu

Multimodal 3D Fusion and In-Situ Learning for Spatially Aware AI

Seamless integration of virtual and physical worlds in augmented reality benefits from the system semantically "understanding" the physical environment. AR research has long focused on the potential of context awareness, demonstrating novel…

Human-Computer Interaction · Computer Science 2024-10-08 Chengyuan Xu, Radha Kumaran, Noah Stier, Kangyou Yu, Tobias Höllerer

CAPE: Camera View Position Embedding for Multi-View 3D Object Detection

In this paper, we address the problem of detecting 3D objects from multi-view images. Current query-based methods rely on global 3D position embeddings (PE) to learn the geometric correspondence between images and 3D space. We claim that…

Computer Vision and Pattern Recognition · Computer Science 2023-03-21 Kaixin Xiong, Shi Gong, Xiaoqing Ye, Xiao Tan, Ji Wan, Errui Ding, Jingdong Wang, Xiang Bai

Motion-Appearance Interactive Encoding for Object Segmentation in Unconstrained Videos

We present a novel method of integrating motion and appearance cues for foreground object segmentation in unconstrained videos. Unlike conventional methods encoding motion and appearance patterns individually, our method puts particular…

Computer Vision and Pattern Recognition · Computer Science 2019-04-17 Chunchao Guo, Jianhuang Lai, Xiaohua Xie

GeoCode: Interpretable Shape Programs

The task of crafting procedural programs capable of generating structurally valid 3D shapes easily and intuitively remains an elusive goal in computer vision and graphics. Within the graphics community, generating procedural 3D models has…

Graphics · Computer Science 2025-03-21 Ofek Pearl, Itai Lang, Yuhua Hu, Raymond A. Yeh, Rana Hanocka

Playable Environments: Video Manipulation in Space and Time

We present Playable Environments - a new representation for interactive video generation and manipulation in space and time. With a single image at inference time, our novel framework allows the user to move objects in 3D while generating a…

Computer Vision and Pattern Recognition · Computer Science 2022-03-17 Willi Menapace, Stéphane Lathuilière, Aliaksandr Siarohin, Christian Theobalt, Sergey Tulyakov, Vladislav Golyanik, Elisa Ricci

AniCode: Authoring Coded Artifacts for Network-Free Personalized Animations

Time-based media (videos, synthetic animations, and virtual reality experiences) are used for communication, in applications such as manufacturers explaining the operation of a new appliance to consumers and scientists illustrating the…

Graphics · Computer Science 2019-05-17 Zeyu Wang, Shiyu Qiu, Qingyang Chen, Alexander Ringlein, Julie Dorsey, Holly Rushmeier

Thinking with Spatial Code for Physical-World Video Reasoning

We introduce Thinking with Spatial Code, a framework that transforms RGB video into explicit, temporally coherent 3D representations for physical-world visual question answering. We highlight the empirical finding that our proposed spatial…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Jieneng Chen, Wenxin Ma, Ruisheng Yuan, Yunzhi Zhang, Jiajun Wu, Alan Yuille

Spatial Computing and Intuitive Interaction: Bringing Mixed Reality and Robotics Together

Spatial computing -- the ability of devices to be aware of their surroundings and to represent this digitally -- offers novel capabilities in human-robot interaction. In particular, the combination of spatial computing and egocentric…

Robotics · Computer Science 2022-02-04 Jeffrey Delmerico, Roi Poranne, Federica Bogo, Helen Oleynikova, Eric Vollenweider, Stelian Coros, Juan Nieto, Marc Pollefeys

Exploiting Spatial Invariance for Scalable Unsupervised Object Tracking

The ability to detect and track objects in the visual world is a crucial skill for any intelligent agent, as it is a necessary precursor to any object-level reasoning process. Moreover, it is important that agents learn to track objects…

Machine Learning · Computer Science 2019-11-21 Eric Crawford, Joelle Pineau

Spatial Understanding from Videos: Structured Prompts Meet Simulation Data

Visual-spatial understanding, the ability to infer object relationships and layouts from visual input, is fundamental to downstream tasks such as robotic navigation and embodied interaction. However, existing methods face spatial…

Computer Vision and Pattern Recognition · Computer Science 2025-09-22 Haoyu Zhang, Meng Liu, Zaijing Li, Haokun Wen, Weili Guan, Yaowei Wang, Liqiang Nie

SemanticPaint: A Framework for the Interactive Segmentation of 3D Scenes

We present an open-source, real-time implementation of SemanticPaint, a system for geometric reconstruction, object-class segmentation and learning of 3D scenes. Using our system, a user can walk into a room wearing a depth camera and a…

Computer Vision and Pattern Recognition · Computer Science 2017-09-05 Stuart Golodetz, Michael Sapienza, Julien P. C. Valentin, Vibhav Vineet, Ming-Ming Cheng, Anurag Arnab, Victor A. Prisacariu, Olaf Kähler, Carl Yuheng Ren, David W. Murray, Shahram Izadi, Philip H. S. Torr

Embedding Spatial Software Visualization in the IDE: an Exploratory Study

Software visualization can be of great use for understanding and exploring a software system in an intuitive manner. Spatial representation of software is a promising approach of increasing interest. However, little is known about how…

Software Engineering · Computer Science 2010-07-27 Adrian Kuhn, David Erni, Oscar Nierstrasz

Diverse Semantic Image Editing with Style Codes

Semantic image editing requires inpainting pixels following a semantic map. It is a challenging task since this inpainting requires both harmony with the context and strict compliance with the semantic maps. The majority of the previous…

Computer Vision and Pattern Recognition · Computer Science 2023-09-26 Hakan Sivuk, Aysegul Dundar

MIMO: Controllable Character Video Synthesis with Spatial Decomposed Modeling

Character video synthesis aims to produce realistic videos of animatable characters within lifelike scenes. As a fundamental problem in the computer vision and graphics community, 3D works typically require multi-view captures for per-case…

Computer Vision and Pattern Recognition · Computer Science 2025-06-12 Yifang Men, Yuan Yao, Miaomiao Cui, Liefeng Bo

Synthesizing Diverse Human Motions in 3D Indoor Scenes

We present a novel method for populating 3D indoor scenes with virtual humans that can navigate in the environment and interact with objects in a realistic manner. Existing approaches rely on training sequences that contain captured human…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Kaifeng Zhao, Yan Zhang, Shaofei Wang, Thabo Beeler, Siyu Tang

Enabling Tangible Interaction through Detection and Augmentation of Everyday Objects

Digital interaction with everyday objects has become popular since the proliferation of camera-based systems that detect and augment objects "just-in-time". Common systems use a vision-based approach to detect objects and display their…

Human-Computer Interaction · Computer Science 2020-12-22 Thomas Kosch, Albrecht Schmidt

Spatial Computing: Concept, Applications, Challenges and Future Directions

Spatial computing is a technological advancement that facilitates the seamless integration of devices into the physical environment, resulting in a more natural and intuitive digital world user experience. Spatial computing has the…

Human-Computer Interaction · Computer Science 2024-02-14 Gokul Yenduri, Ramalingam M, Praveen Kumar Reddy Maddikunta, Thippa Reddy Gadekallu, Rutvij H Jhaveri, Ajay Bandi, Junxin Chen, Wei Wang, Adarsh Arunkumar Shirawalmath, Raghav Ravishankar, Weizheng Wang