Related papers: FloorplanVLM: A Vision-Language Model for Floorpla…

Unified Vector Floorplan Generation via Markup Representation

Automatic residential floorplan generation has long been a central challenge bridging architecture and computer graphics, aiming to make spatial design more efficient and accessible. While early methods based on constraint satisfaction or…

Computer Vision and Pattern Recognition · Computer Science 2026-04-07 Kaede Shiohara, Toshihiko Yamasaki

TLC-Plan: A Two-Level Codebook Based Network for End-to-End Vector Floorplan Generation

Automated floorplan generation aims to improve design quality, architectural efficiency, and sustainability by jointly modeling global spatial organization and precise geometric detail. However, existing approaches operate in raster space…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Biao Xiong, Zhen Peng, Ping Wang, Qiegen Liu, Xian Zhong

Vision Language Models Can Parse Floor Plan Maps

Vision language models (VLMs) can simultaneously reason about images and texts to tackle many tasks, from visual question answering to image captioning. This paper focuses on map parsing, a novel task that is unexplored within the VLM…

Robotics · Computer Science 2025-11-26 David DeFazio, Hrudayangam Mehta, Meng Wang, Ping Yang, Jeremy Blackburn, Shiqi Zhang

Enhanced Object Detection in Floor-plan through Super Resolution

Building Information Modelling (BIM) software uses scalable vector formats to enable flexible design of floor plans in the industry. Floor plans in the architectural domain can come from many sources that may or may not be in scalable…

Computer Vision and Pattern Recognition · Computer Science 2021-12-21 Dev Khare, N S Kamal, Barathi Ganesh HB, V Sowmya, V V Sajith Variyar

End-to-end Graph-constrained Vectorized Floorplan Generation with Panoptic Refinement

The automatic generation of floorplans given user inputs has great potential in architectural design and has recently been explored in the computer vision community. However, the majority of existing methods synthesize floorplans in the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-28 Jiachen Liu, Yuan Xue, Jose Duarte, Krishnendra Shekhawat, Zihan Zhou, Xiaolei Huang

HypergraphFormer: Learning Hypergraphs from LLMs for Editable Floor Plan Generation

In this work, we propose HypergraphFormer, a novel and efficient approach to floor plan generation based on learning hypergraph representations with a large language model (LLM). The model is trained via supervised fine-tuning to generate a…

Machine Learning · Computer Science 2026-05-26 Nikita Klimenko, Hesam Salehipour, Parham Eftekhar, Amir Khasahmadi, Ramon Elias Weber

VectorArk: Learning Practical Image Vectorization with Rounded Polygon Representation

Recent vision-language model (VLM)-based approaches have achieved impressive results on image vectorization tasks. However, they are typically evaluated on synthetic benchmarks, where clean SVGs are rasterized at high resolution and then…

Computer Vision and Pattern Recognition · Computer Science 2026-05-26 Tarun Gehlaut, Difan Liu, Charu Bansal, Krutik Malani, Souymodip Chakraborty, Ankit Phogat, Matthew Fisher, Vineet Batra

FloorPlan-DeepSeek (FPDS): A multimodal approach to floorplan generation using vector-based next room prediction

In the architectural design process, floor plan generation is inherently progressive and iterative. However, existing generative models for floor plans are predominantly end-to-end generators that produce an entire pixel-based layout in a…

Computation and Language · Computer Science 2025-08-05 Jun Yin, Pengyu Zeng, Jing Zhong, Peilin Li, Miao Zhang, Ran Luo, Shuai Lu

Reading Images Like Texts: Sequential Image Understanding in Vision-Language Models

Vision-Language Models (VLMs) have demonstrated remarkable performance across a variety of real-world tasks. However, existing VLMs typically process visual information by serializing images, a method that diverges significantly from the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Yueyan Li, Chenggong Zhao, Zeyuan Zang, Caixia Yuan, Xiaojie Wang

PolyRoom: Room-aware Transformer for Floorplan Reconstruction

Reconstructing geometry and topology structures from raw unstructured data has always been an important research topic in indoor mapping research. In this paper, we aim to reconstruct the floorplan with a vectorized representation from…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Yuzhou Liu, Lingjie Zhu, Xiaodong Ma, Hanqiao Ye, Xiang Gao, Xianwei Zheng, Shuhan Shen

DStruct2Design: Data and Benchmarks for Data Structure Driven Generative Floor Plan Design

Text-conditioned generative models for images have yielded impressive results. Text-conditioned floorplan generation, as a special type of raster image generation task, has also received particular attention. However, there are many use cases in…

Computation and Language · Computer Science 2024-07-23 Zhi Hao Luo, Luis Lara, Ge Ya Luo, Florian Golemo, Christopher Beckham, Christopher Pal

FloorPlan-VLN: A New Paradigm for Floor Plan Guided Vision-Language Navigation

The existing Vision-Language Navigation (VLN) task requires agents to follow verbose instructions while ignoring potentially useful global spatial priors, which limits their capability to reason about spatial structures. Although human-readable…

Robotics · Computer Science 2026-03-19 Kehan Chen, Yan Huang, Dong An, Jiawei He, Yifei Su, Jing Liu, Nianfeng Liu, Liang Wang

VectorLLM: Human-like Extraction of Structured Building Contours vis Multimodal LLMs

Automatically extracting vectorized building contours from remote sensing imagery is crucial for urban planning, population estimation, and disaster assessment. Current state-of-the-art methods rely on complex multi-stage pipelines…

Computer Vision and Pattern Recognition · Computer Science 2025-07-08 Tao Zhang, Shiqing Wei, Shihao Chen, Wenling Yu, Muying Luo, Shunping Ji

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Spatial reasoning is a fundamental aspect of human cognition, enabling intuitive understanding and manipulation of objects in three-dimensional space. While foundation models demonstrate remarkable performance on some benchmarks, they still…

Computer Vision and Pattern Recognition · Computer Science 2025-03-12 Fan-Yun Sun, Weiyu Liu, Siyi Gu, Dylan Lim, Goutam Bhat, Federico Tombari, Manling Li, Nick Haber, Jiajun Wu

Perceiving, Reasoning, Adapting: A Dual-Layer Framework for VLM-Guided Precision Robotic Manipulation

Vision-Language Models (VLMs) demonstrate remarkable potential in robotic manipulation, yet challenges persist in executing complex fine manipulation tasks with high speed and precision. While excelling at high-level planning, existing VLM…

Robotics · Computer Science 2025-03-10 Qingxuan Jia, Guoqin Tang, Zeyuan Huang, Zixuan Hao, Ning Ji, Shihang Yin, Gang Chen

RePLan: Robotic Replanning with Perception and Language Models

Advancements in large language models (LLMs) have demonstrated their potential in facilitating high-level reasoning, logical reasoning and robotics planning. Recently, LLMs have also been able to generate reward functions for low-level…

Robotics · Computer Science 2024-02-21 Marta Skreta, Zihan Zhou, Jia Lin Yuan, Kourosh Darvish, Alán Aspuru-Guzik, Animesh Garg

ViPlan: A Benchmark for Visual Planning with Symbolic Predicates and Vision-Language Models

Integrating Large Language Models with symbolic planners is a promising direction for obtaining verifiable and grounded plans, with recent work extending this idea to visual domains using Vision-Language Models (VLMs). However, a rigorous…

Artificial Intelligence · Computer Science 2026-03-04 Matteo Merler, Nicola Dainese, Minttu Alakuijala, Giovanni Bonetta, Pietro Ferrazzi, Yu Tian, Bernardo Magnini, Pekka Marttinen

Video Language Planning

We are interested in enabling visual planning for complex long-horizon tasks in the space of generated videos and language, leveraging recent advances in large generative models pretrained on Internet-scale data. To this end, we present…

Computer Vision and Pattern Recognition · Computer Science 2023-10-17 Yilun Du, Mengjiao Yang, Pete Florence, Fei Xia, Ayzaan Wahid, Brian Ichter, Pierre Sermanet, Tianhe Yu, Pieter Abbeel, Joshua B. Tenenbaum, Leslie Kaelbling, Andy Zeng, Jonathan Tompson

VLMPlanner: Integrating Visual Language Models with Motion Planning

Integrating large language models (LLMs) into autonomous driving motion planning has recently emerged as a promising direction, offering enhanced interpretability, better controllability, and improved generalization in rare and long-tail…

Artificial Intelligence · Computer Science 2025-07-29 Zhipeng Tang, Sha Zhang, Jiajun Deng, Chenjie Wang, Guoliang You, Yuting Huang, Xinrui Lin, Yanyong Zhang

Guiding Long-Horizon Task and Motion Planning with Vision Language Models

Vision-Language Models (VLMs) can generate plausible high-level plans when prompted with a goal, the context, an image of the scene, and any planning constraints. However, there is no guarantee that the predicted actions are geometrically…

Robotics · Computer Science 2024-10-04 Zhutian Yang, Caelan Garrett, Dieter Fox, Tomás Lozano-Pérez, Leslie Pack Kaelbling