GMOS: Grounding Moving Object Segmentation in 3D Space and Time
Computer Vision · 2026-05 · arXiv:2605.30352
Junyu Xie, Tengda Han, Weidi Xie, Andrew Zisserman
VideoMLA: Low-Rank Latent KV Cache for Minute-Scale Autoregressive Video Diffusion
Computer Vision · 2026-05 · arXiv:2605.30351
Hidir Yesiltepe, Jiazhen Hu, Tuna Han Salih Meral, Adil Kaan Akan +3
NeuROK: Generative 4D Neural Object Kinematics
Computer Vision · 2026-05 · arXiv:2605.30347
Chen Geng, Guangzhao He, Yue Gao, Yunzhi Zhang +2
YoCausal: How Far is Video Generation from World Model? A Causality Perspective
Computer Vision · 2026-05 · arXiv:2605.30346
You-Zhe Xie, Yu-Hsuan Li, Jie-Ying Lee, Kaipeng Zhang +2
Uncertainty-driven 3D Gaussian Splatting Active Mapping via Anisotropic Visibility Field
Computer Vision · 2026-05 · arXiv:2605.30342
Shangjie Xue, Jesse Dill, Dhruv Ahuja, Frank Dellaert +2
GPIC: A Giant Permissive Image Corpus for Visual Generation
Computer Vision · 2026-05 · arXiv:2605.30341
Keshigeyan Chandrasegaran, Kyle Sargent, Suchir Agarwal, Michael Jang +5
Benchmarking Single-Factor Physical Video-to-Audio Generation
Computer Vision · 2026-05 · arXiv:2605.30339
Tingle Li, Siddharth Gururani, Kevin J. Shih, Gantavya Bhatt +5
REST3D: Reconstructing Physically Stable 3D Scenes from a Single Image
Computer Vision · 2026-05 · arXiv:2605.30338
Xiaoxuan Ma, Jiashun Wang, Nicolas Ugrinovic, Yehonathan Litman +1
Supercharging Thermal Gaussian Splatting with Depth Estimation
Computer Vision · 2026-05 · arXiv:2605.30328
Manoj Biswanath, Chenxin Cai, Hannah Schieber, Daniel Roth +1
Veda: Scalable Video Diffusion via Distilled Sparse Attention
Computer Vision · 2026-05 · arXiv:2605.30325
Shihao Han, Hao Yang, Xinting Hu, Xiaofeng Mei +2
MonoPhysics: Estimating Geometry, Appearance, and Physical Parameters from Monocular Videos
Computer Vision · 2026-05 · arXiv:2605.30320
Daniel Rho, Jun Myeong Choi, Matthew Thornton, Biswadip Dey +1
VPG: Visual Prefix Guidance for Autoregressive Image and Video Generation
Computer Vision · 2026-05 · arXiv:2605.30317
Xinyao Liao, Qiyuan He, Yicong Li, Jiayin Zhu +3
Archon: A Unified Multimodal Model for Holistic Digital Human Generation
Computer Vision · 2026-05 · arXiv:2605.30311
Chong Bao, Shichen Liu, Lijun Yu, David Futschik +8
City-Mesh3R: Simulation-Ready City-Scale 3D Mesh Reconstruction from Multi-View Images
Computer Vision · 2026-05 · arXiv:2605.30310
Sayan Paul, Sourav Ghosh, Siddharth Katageri, Soumyadip Maity +2
Grounded 3D-Aware Spatial Vision-Language Modeling
Computer Vision · 2026-05 · arXiv:2605.30307
An-Chieh Cheng, Yang Fu, Yatai Ji, Ligeng Zhu +11
Boosting Image Quality Assessment Performance: Unsupervised Score Fusion by Deep Maximum a Posteriori Estimation
Computer Vision · 2026-05 · arXiv:2605.30269
Zhongling Wang, Raymond Zhou, Shahrukh Athar, Wenbo Yang +1
LoMo: Local Modality Substitution for Deeper Vision-Language Fusion
Computer Vision · 2026-05 · arXiv:2605.30265
Feng Han, Zhixiong Zhang, Zheming Liang, Yibin Wang +1
minWM: A Full-Stack Open-Source Framework for Real-Time Interactive Video World Models
Computer Vision · 2026-05 · arXiv:2605.30263
Min Zhao, Hongzhou Zhu, Bokai Yan, Zihan Zhou +8
Stable-Layers: Fine-Tuning Image Layer Decomposition Models with VLM-Scored Reinforcement Learning
Computer Vision · 2026-05 · arXiv:2605.30257
Ciara Rowles, Reshinth Adithyan, Nikhil Pinnaparaju, Vikram Voleti +1
VideoFDB: Evaluating Full-Duplex Vision-Speech Capabilities in Conversational Agents
Computer Vision · 2026-05 · arXiv:2605.30256
Amrita Mazumdar, Seonwook Park, Rajarshi Roy, Nikhil Srihari +5
Ambient-robust Inverse Rendering using Active RGB-NIR Imaging
Computer Vision · 2026-05 · arXiv:2605.30250
Hoon-Gyu Chung, Jinnyeong Kim, Hyunwoo Kang, Seung-Hwan Baek
GenClaw: Code-Driven Agentic Image Generation
Computer Vision · 2026-05 · arXiv:2605.30248
Junyan Ye, Jun He, Zilong Huang, Dongzhi Jiang +3
Reinforcement Learning with Robust Rubric Rewards
Computer Vision · 2026-05 · arXiv:2605.30244
Ya-Qi Yu, Hao Wang, Fangyu Hong, Xiangyang Qu +14
SAM3D-Phys: Towards Multi-Object Interactive Simulation in Real World
Computer Vision · 2026-05 · arXiv:2605.30239
Xin Dong, Weijian Deng, Lihan Zhang, Tianru Dai +2
BullingerDB: A Dataset for Handwritten Text Recognition and Writer Retrieval
Computer Vision · 2026-05 · arXiv:2605.30235
Marco Peer, Anna Scius-Bertrand, Patricia Scheurer, Andreas Fischer
Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning
Computer Vision · 2026-05 · arXiv:2605.30231
Chun-Hsiao Yeh, Shengyi Qian, Manchen Wang, Yi Ma +2
IP-Adapter Is All You Need: Towards Fine-Tuning-Free Diffusion-Based Talking Face Generation
Computer Vision · 2026-05 · arXiv:2605.30230
Hao Wu, Xiangyang Luo, Hao Wang, Jiawei Zhang +2
Déjà View: Looping Transformers for Multi-View 3D Reconstruction
Computer Vision · 2026-05 · arXiv:2605.30215
Alessandro Burzio, Tobias Fischer, Sven Elflein, Qunjie Zhou +8
Cycle Consistency in Video Object-Centric Learning
Computer Vision · 2026-05 · arXiv:2605.30211
Rongzhen Zhao, Zhiyuan Li, Ruonan Wei, Juho Kannala +1
LiveSVG: Zero-Shot SVG Animation via Video Generation
Computer Vision · 2026-05 · arXiv:2605.30174
Matan Levy, Ran Margolin, Bar Cavia, Dvir Samuel +5
Why Far Looks Up: Probing Spatial Representation in Vision-Language Models
Computer Vision · 2026-05 · arXiv:2605.30161
Cheolhong Min, Jaeyun Jung, Daeun Lee, Hyeonseong Jeon +4
AnomalyAgent: Training-Free Agentic Models for Zero-/Few-Shot Anomaly Detection
Computer Vision · 2026-05 · arXiv:2605.30140
Yi Zhang, Jiawen Zhu, Lele Fu, Guansong Pang
PARCEL: Pool-Anchored Resampling with Conditioned Elastic Queries for Efficient Vision-Language Understanding
Computer Vision · 2026-05 · arXiv:2605.30126
Selim Kuzucu, Alessio Tonioni, Vasile Lup, Bernt Schiele +2
SGMD: Score Gradient Matching Distillation for Few-Step Video Diffusion Distillation
Computer Vision · 2026-05 · arXiv:2605.30116
Zhuguanyu Wu, Ruihao Gong, Yang Yong, Yushi Huang +4
Large Depth Completion Model from Sparse Observations
Computer Vision · 2026-05 · arXiv:2605.30115
Zhu Yu, Zhengyi Zhao, Runmin Zhang, Lingteng Qiu +6