Related papers: SceneMotifCoder: Example-driven Visual Program Lea…

200 papers

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-21 · Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

3D multi object generative models allow us to synthesize a large range of novel 3D multi object scenes and also identify objects, shapes, layouts and their positions. But multi object scenes are difficult to create because of the dataset…

Computer Vision and Pattern Recognition · Computer Science · 2019-03-11 · Vedant Singh, Manan Oza, Himanshu Vaghela, Pratik Kanani

Modern machine learning models for scene understanding, such as depth estimation and object tracking, rely on large, high-quality datasets that mimic real-world deployment scenarios. To address data scarcity, we propose an end-to-end system…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Sonia Laguna, Alberto Garcia-Garcia, Marie-Julie Rakotosaona, Stylianos Moschoglou, Leonhard Helminger, Sergio Orts-Escolano

This position paper argues for the use of structured generative models (SGMs) for the understanding of static scenes. This requires the reconstruction of a 3D scene from an input image (or a set of multi-view images), whereby the…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-16 · Christopher K. I. Williams

A proper scene representation is central to the pursuit of spatial intelligence where agents can robustly reconstruct and efficiently understand 3D scenes. A scene representation is either metric, such as landmark maps in 3D reconstruction,…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-21 · Juexiao Zhang, Gao Zhu, Sihang Li, Xinhao Liu, Haorui Song, Xinran Tang, Chen Feng

Synthesizing interactive 3D scenes from text is essential for gaming, virtual reality, and embodied AI. However, existing methods face several challenges. Learning-based approaches depend on small-scale indoor datasets, limiting the scene…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-06 · Lu Ling, Chen-Hsuan Lin, Tsung-Yi Lin, Yifan Ding, Yu Zeng, Yichen Sheng, Yunhao Ge, Ming-Yu Liu, Aniket Bera, Zhaoshuo Li

In this work, we study the problem of generating novel images from complex multimodal prompt sequences. While existing methods achieve promising results for text-to-image generation, they often struggle to capture fine-grained details from…

Computer Vision and Pattern Recognition · Computer Science · 2024-05-29 · Amandeep Kumar, Muzammal Naseer, Sanath Narayan, Rao Muhammad Anwer, Salman Khan, Hisham Cholakkal

Recent Multi-Modal Large Language Models (MLLMs) have demonstrated strong capabilities in learning joint representations from text and images. However, their spatial reasoning remains limited. We introduce 3DFroMLLM, a novel framework that…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-13 · Noor Ahmed, Cameron Braunstein, Steffen Eger, Eddy Ilg

We are witnessing significant breakthroughs in the technology for generating 3D objects from text. Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-15 · Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee

3D visual grounding aims to localize the unique target described by natural languages in 3D scenes. The significant gap between 3D and language modalities makes it a notable challenge to distinguish multiple similar objects through the…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-18 · Feng Xiao, Hongbin Xu, Guocan Zhao, Wenxiong Kang

We present MMCORE, a unified framework designed for multimodal image generation and editing. MMCORE leverages a pre-trained Vision-Language Model (VLM) to predict semantic visual embeddings via learnable query tokens, which subsequently…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-23 · Zijie Li, Yichun Shi, Jingxiang Sun, Ye Wang, Yixuan Huang, Zhiyao Guo, Xiaochen Lian, Peihao Zhu, Yu Tian, Zhonghua Zhai, Peng Wang

We introduce SceneLinker, a novel framework that generates compositional 3D scenes via semantic scene graph from RGB sequences. To adaptively experience Mixed Reality (MR) content based on each user's space, it is essential to generate a 3D…

Computer Vision and Pattern Recognition · Computer Science · 2026-02-04 · Seok-Young Kim, Dooyoung Kim, Woojin Cho, Hail Song, Suji Kang, Woontack Woo

Scene generation with 3D assets presents a complex challenge, requiring both high-level semantic understanding and low-level geometric reasoning. While Multimodal Large Language Models (MLLMs) excel at semantic tasks, their application to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-10 · Ian Huang, Yanan Bao, Karen Truong, Howard Zhou, Cordelia Schmid, Leonidas Guibas, Alireza Fathi

We present a system for generating indoor scenes in response to text prompts. The prompts are not limited to a fixed vocabulary of scene descriptions, and the objects in generated scenes are not restricted to a fixed set of object…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-18 · Rio Aguina-Kang, Maxim Gumin, Do Heon Han, Stewart Morris, Seung Jean Yoo, Aditya Ganeshan, R. Kenny Jones, Qiuhong Anna Wei, Kailiang Fu, Daniel Ritchie

Recent advancements in object-centric text-to-3D generation have shown impressive results. However, generating complex 3D scenes remains an open challenge due to the intricate relations between objects. Moreover, existing methods are…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-31 · Yu-Hsiang Huang, Wei Wang, Sheng-Yu Huang, Yu-Chiang Frank Wang

Recent advances in text-to-image (T2I) generation have enabled visually coherent image synthesis from descriptions, but generating images containing multiple given subjects remains challenging. As the number of reference identities…

Machine Learning · Computer Science · 2026-04-10 · Yucheng Zhou, Dubing Chen, Huan Zheng, Jianbing Shen

Multimodal large language models (MLLMs) have significantly advanced the integration of visual and textual understanding. However, their ability to generate code from multimodal inputs remains limited. In this work, we introduce VisCodex, a…

Computation and Language · Computer Science · 2025-08-14 · Lingjie Jiang, Shaohan Huang, Xun Wu, Yixia Li, Dongdong Zhang, Furu Wei

Recent text-to-image models have revolutionized image generation, but they still struggle with maintaining concept consistency across generated images. While existing works focus on character consistency, they often overlook the crucial…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-28 · Quanjian Song, Donghao Zhou, Jingyu Lin, Fei Shen, Jiaze Wang, Xiaowei Hu, Cunjian Chen, Pheng-Ann Heng

The rapid advancement of Large Language Models (LLMs) has significantly improved code generation, yet most models remain text-only, neglecting crucial visual aids like diagrams and flowcharts used in real-world software development. To…

Computation and Language · Computer Science · 2025-07-14 · Linzheng Chai, Jian Yang, Shukai Liu, Wei Zhang, Liran Wang, Ke Jin, Tao Sun, Congnan Liu, Chenchen Zhang, Hualei Zhu, Jiaheng Liu, Xianjie Wu, Ge Zhang, Tianyu Liu, Zhoujun Li

In this work, we introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts, trained using single-view images. Different from most existing 3D GANs that limit their…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-12 · Sherwin Bahmani, Jeong Joon Park, Despoina Paschalidou, Xingguang Yan, Gordon Wetzstein, Leonidas Guibas, Andrea Tagliasacchi