Siming Yan — Scifaro

DriveTok: 3D Driving Scene Tokenization for Unified Multi-View Reconstruction and Understanding

With the growing adoption of vision-language-action models and world models in autonomous driving systems, scalable image tokenization becomes crucial as the interface for the visual modality. However, most existing tokenizers are designed…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Dong Zhuo, Wenzhao Zheng, Sicheng Zuo, Siming Yan, Lu Hou, Jie Zhou, Jiwen Lu

ViGoR: Improving Visual Grounding of Large Vision Language Models with Fine-Grained Reward Modeling

By combining natural language understanding, generation capabilities, and breadth of knowledge of large language models with image perception, recent large vision language models (LVLMs) have shown unprecedented visual reasoning…

Computer Vision and Pattern Recognition · Computer Science 2025-12-23 Siming Yan, Min Bai, Weifeng Chen, Xiong Zhou, Qixing Huang, Li Erran Li

DrivePI: Spatial-aware 4D MLLM for Unified Autonomous Driving Understanding, Perception, Prediction and Planning

Although multi-modal large language models (MLLMs) have shown strong capabilities across diverse domains, their application in generating fine-grained 3D perception and prediction outputs in autonomous driving remains underexplored. In this…

Computer Vision and Pattern Recognition · Computer Science 2025-12-16 Zhe Liu, Runhui Huang, Rui Yang, Siming Yan, Zining Wang, Lu Hou, Di Lin, Xiang Bai, Hengshuang Zhao

Representation Learning for Point Cloud Understanding

With the rapid advancement of technology, 3D data acquisition and utilization have become increasingly prevalent across various fields, including computer vision, robotics, and geospatial analysis. 3D data, captured through methods such as…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Siming Yan

Wavelet-based Decoupling Framework for low-light Stereo Image Enhancement

Low-light images suffer from complex degradation, and existing enhancement methods often encode all degradation factors within a single latent space. This leads to highly entangled features and strong black-box characteristics, making the…

Computer Vision and Pattern Recognition · Computer Science 2025-07-17 Shuangli Du, Siming Yan, Zhenghao Shi, Zhenzhen You, Lu Sun

Multi-View Representation is What You Need for Point-Cloud Pre-Training

A promising direction for pre-training 3D point clouds is to leverage the massive amount of data in 2D, whereas the domain gap between 2D and 3D creates a fundamental challenge. This paper proposes a novel approach to point-cloud…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Siming Yan, Chen Song, Youkang Kong, Qixing Huang

3D Feature Prediction for Masked-AutoEncoder-Based Point Cloud Pretraining

Masked autoencoders (MAE) have recently been introduced to 3D self-supervised pretraining for point clouds due to their great success in NLP and computer vision. Unlike MAEs used in the image domain, where the pretext task is to restore…

Computer Vision and Pattern Recognition · Computer Science 2024-04-30 Siming Yan, Yuqi Yang, Yuxiao Guo, Hao Pan, Peng-shuai Wang, Xin Tong, Yang Liu, Qixing Huang

Implicit Autoencoder for Point-Cloud Self-Supervised Representation Learning

This paper advocates the use of implicit surface representation in autoencoder-based self-supervised 3D representation learning. The most popular and accessible 3D representation, i.e., point clouds, involves discrete samples of the…

Computer Vision and Pattern Recognition · Computer Science 2023-08-29 Siming Yan, Zhenpei Yang, Haoxiang Li, Chen Song, Li Guan, Hao Kang, Gang Hua, Qixing Huang

HPNet: Deep Primitive Segmentation Using Hybrid Representations

This paper introduces HPNet, a novel deep-learning approach for segmenting a 3D shape represented as a point cloud into primitive patches. The key to deep primitive segmentation is learning a feature representation that can separate points…

Computer Vision and Pattern Recognition · Computer Science 2021-10-22 Siming Yan, Zhenpei Yang, Chongyang Ma, Haibin Huang, Etienne Vouga, Qixing Huang

Scene Synthesis via Uncertainty-Driven Attribute Synchronization

Developing deep neural networks to generate 3D scenes is a fundamental problem in neural synthesis with immediate applications in architectural CAD, computer graphics, as well as in generating virtual robot training environments. This task…

Computer Vision and Pattern Recognition · Computer Science 2021-09-02 Haitao Yang, Zaiwei Zhang, Siming Yan, Haibin Huang, Chongyang Ma, Yi Zheng, Chandrajit Bajaj, Qixing Huang

Extreme Relative Pose Network under Hybrid Representations

In this paper, we introduce a novel RGB-D based relative pose estimation approach that is suitable for small-overlapping or non-overlapping scans and can output multiple relative poses. Our method performs scene completion and matches the…

Computer Vision and Pattern Recognition · Computer Science 2020-04-07 Zhenpei Yang, Siming Yan, Qixing Huang

Recurrent Feedback Improves Feedforward Representations in Deep Neural Networks

The abundant recurrent horizontal and feedback connections in the primate visual cortex are thought to play an important role in bringing global and semantic contextual information to early visual areas during perceptual inference, helping…

Neurons and Cognition · Quantitative Biology 2019-12-24 Siming Yan, Xuyang Fang, Bowen Xiao, Harold Rockwell, Yimeng Zhang, Tai Sing Lee

Calcium Removal From Cardiac CT Images Using Deep Convolutional Neural Network

Coronary calcium causes beam hardening and blooming artifacts on cardiac computed tomography angiography (CTA) images, which lead to overestimation of lumen stenosis and reduction of diagnostic specificity. To properly remove coronary…

Computer Vision and Pattern Recognition · Computer Science 2018-03-02 Siming Yan, Feng Shi, Yuhua Chen, Damini Dey, Sang-Eun Lee, Hyuk-Jae Chang, Debiao Li, Yibin Xie