Yebin Liu — Scifaro

GAF: Gaussian Action Field as a 4D Representation for Dynamic World Modeling in Robotic Manipulation

Accurate scene perception is critical for vision-based robotic manipulation. Existing approaches typically follow either a Vision-to-Action (V-A) paradigm, predicting actions directly from visual inputs, or a Vision-to-3D-to-Action (V-3D-A)…

Robotics · Computer Science 2026-05-25 Ying Chai, Litao Deng, Ruizhi Shao, Jiajun Zhang, Kangchen Lv, Liangjun Xing, Xiang Li, Hongwen Zhang, Yebin Liu

SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild

3D animal reconstruction in the wild remains challenging due to large species variation, frequent occlusions, and the prevalence of multi-animal scenes, while existing methods predominantly focus on single-animal settings. We present SAM 3D…

Computer Vision and Pattern Recognition · Computer Science 2026-05-11 Xuyi Hu, Jin Lyu, Jiuming Liu, Yebin Liu, Silvia Zuffi, Liang An, Stefan Goetz

Mix3R: Mixing Feed-forward Reconstruction and Generative 3D Priors for Joint Multi-view Aligned 3D Reconstruction and Pose Estimation

Recent trends in sparse-view 3D reconstruction have taken two different paths: feed-forward reconstruction that predicts pixel-aligned point maps without a complete geometry, and generative 3D reconstruction that generates complete geometry…

Computer Vision and Pattern Recognition · Computer Science 2026-05-06 Siyou Lin, Zhou Xue, Hongwen Zhang, Liang An, Dongping Li, Shaohui Jiao, Yebin Liu

SynAgent: Generalizable Cooperative Humanoid Manipulation via Solo-to-Cooperative Agent Synergy

Controllable cooperative humanoid manipulation is a fundamental yet challenging problem for embodied intelligence, due to severe data scarcity, complexities in multi-agent coordination, and limited generalization across objects. In this…

Computer Vision and Pattern Recognition · Computer Science 2026-04-22 Wei Yao, Haohan Ma, Hongwen Zhang, Yunlian Sun, Liangjun Xing, Zhile Yang, Yuanjun Guo, Yebin Liu, Jinhui Tang

OmniHands: Towards Robust 4D Hand Mesh Recovery via A Versatile Transformer

In this paper, we introduce OmniHands, a universal approach to recovering interactive hand meshes and their relative movement from monocular or multi-view inputs. Our approach addresses two major limitations of previous methods: lacking a…

Computer Vision and Pattern Recognition · Computer Science 2026-04-15 Dixuan Lin, Yuxiang Zhang, Mengcheng Li, Wei Jing, Qi Yan, Qianying Wang, Yebin Liu, Hongwen Zhang

GeoDiff4D: Geometry-Aware Diffusion for 4D Head Avatar Reconstruction

Reconstructing photorealistic and animatable 4D head avatars from a single portrait image remains a fundamental challenge in computer vision. While diffusion models have enabled remarkable progress in image and video generation for avatar…

Computer Vision and Pattern Recognition · Computer Science 2026-03-13 Chao Xu, Xiaochen Zhao, Xiang Deng, Jingxiang Sun, Donglin Di, Zhuo Su, Yebin Liu

4DEquine: Disentangling Motion and Appearance for 4D Equine Reconstruction from Monocular Video

4D reconstruction of the equine family (e.g., horses) from monocular video is important for animal welfare. Previous mainstream 4D animal reconstruction methods require joint optimization of motion and appearance over a whole video, which is…

Computer Vision and Pattern Recognition · Computer Science 2026-03-12 Jin Lyu, Liang An, Pujin Cheng, Yebin Liu, Xiaoying Tang

SEGA: Drivable 3D Gaussian Head Avatar from a Single Image

Creating photorealistic 3D head avatars from limited input has become increasingly important for applications in virtual reality, telepresence, and digital entertainment. While recent advances like neural rendering and 3D Gaussian splatting…

Graphics · Computer Science 2026-03-12 Chen Guo, Zhuo Su, Liao Wang, Jian Wang, Shuang Li, Xu Chang, Zhaohu Li, Yang Zhao, Guidong Wang, Yebin Liu, Ruqi Huang

MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation

We propose MAViD, a novel Multimodal framework for Audio-Visual Dialogue understanding and generation. Existing approaches primarily focus on non-interactive systems and are limited to producing constrained and unnatural human speech. The…

Computer Vision and Pattern Recognition · Computer Science 2026-03-10 Youxin Pang, Jiajun Liu, Lingfeng Tan, Yong Zhang, Feng Gao, Xiang Deng, Zhuoliang Kang, Xiaoming Wei, Yebin Liu

SyncMV4D: Synchronized Multi-view Joint Diffusion of Appearance and Motion for Hand-Object Interaction Synthesis

Hand-Object Interaction (HOI) generation plays a critical role in advancing applications across animation and robotics. Current video-based methods are predominantly single-view, which impedes comprehensive 3D geometry perception and often…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Lingwei Dang, Zonghan Li, Juntong Li, Hongwen Zhang, Liang An, Yebin Liu, Qingyao Wu

U-Mind: A Unified Framework for Real-Time Multimodal Interaction with Audiovisual Generation

Full-stack multimodal interaction in real-time is a central goal in building intelligent embodied agents capable of natural, dynamic communication. However, existing systems are either limited to unimodal generation or suffer from degraded…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Xiang Deng, Feng Gao, Yong Zhang, Youxin Pang, Xu Xiaoming, Zhuoliang Kang, Xiaoming Wei, Yebin Liu

Stereo-Talker: Audio-driven 3D Human Synthesis with Prior-Guided Mixture-of-Experts

This paper introduces Stereo-Talker, a novel one-shot audio-driven human video synthesis system that generates 3D talking videos with precise lip synchronization, expressive body gestures, temporally consistent photo-realistic quality, and…

Computer Vision and Pattern Recognition · Computer Science 2026-03-02 Xiang Deng, Youxin Pang, Xiaochen Zhao, Chao Xu, Lizhen Wang, Hongjiang Xiao, Shi Yan, Hongwen Zhang, Yebin Liu

Monocular Mesh Recovery and Body Measurement of Female Saanen Goats

The lactation performance of Saanen dairy goats, renowned for their high milk yield, is intrinsically linked to their body size, making accurate 3D body measurement essential for assessing milk production potential, yet existing…

Computer Vision and Pattern Recognition · Computer Science 2026-02-24 Bo Jin, Shichao Zhao, Jin Lyu, Bin Zhang, Tao Yu, Liang An, Yebin Liu, Meili Wang

SharpTimeGS: Sharp and Stable Dynamic Gaussian Splatting via Lifespan Modulation

Novel view synthesis of dynamic scenes is fundamental to achieving photorealistic 4D reconstruction and immersive visual experiences. Recent progress in Gaussian-based representations has significantly improved real-time rendering quality,…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Zhanfeng Liao, Jiajun Zhang, Hanzhang Tu, Zhixi Wang, Yunqi Gao, Hongwen Zhang, Yebin Liu

CloSET: Modeling Clothed Humans on Continuous Surface with Explicit Template Decomposition

Creating animatable avatars from static scans requires the modeling of clothing deformations in different poses. Existing learning-based methods typically add pose-dependent deformations upon a minimally-clothed mesh template or a learned…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Hongwen Zhang, Siyou Lin, Ruizhi Shao, Yuxiang Zhang, Zerong Zheng, Han Huang, Yandong Guo, Yebin Liu

PyMAF-X: Towards Well-aligned Full-body Model Regression from Monocular Images

We present PyMAF-X, a regression-based approach to recovering parametric full-body models from monocular images. This task is very challenging since minor parametric deviation may lead to noticeable misalignment between the estimated mesh…

Computer Vision and Pattern Recognition · Computer Science 2026-01-27 Hongwen Zhang, Yating Tian, Yuxiang Zhang, Mengcheng Li, Liang An, Zhenan Sun, Yebin Liu

FOF-X: Towards Real-time Detailed Human Reconstruction from a Single Image

We introduce FOF-X for real-time reconstruction of detailed human geometry from a single image. Balancing real-time speed against high-quality results is a persistent challenge, mainly due to the high computational demands of existing 3D…

Computer Vision and Pattern Recognition · Computer Science 2026-01-19 Qiao Feng, Yuanwang Yang, Yebin Liu, Yu-Kun Lai, Jingyu Yang, Kun Li

FlexAvatar: Flexible Large Reconstruction Model for Animatable Gaussian Head Avatars with Detailed Deformation

We present FlexAvatar, a flexible large reconstruction model for high-fidelity 3D head avatars with detailed dynamic deformation from single or sparse images, without requiring camera poses or expression labels. It leverages a…

Computer Vision and Pattern Recognition · Computer Science 2025-12-22 Cheng Peng, Zhuo Su, Liao Wang, Chen Guo, Zhaohu Li, Chengjiang Long, Zheng Lv, Jingxiang Sun, Chenyangguang Zhang, Yebin Liu

Tessellation GS: Neural Mesh Gaussians for Robust Monocular Reconstruction of Dynamic Objects

3D Gaussian Splatting (GS) enables highly photorealistic scene reconstruction from posed image sequences but struggles with viewpoint extrapolation due to its anisotropic nature, leading to overfitting and poor generalization, particularly…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Shuohan Tao, Boyao Zhou, Hanzhang Tu, Yuwang Wang, Yebin Liu

UniMo: Unifying 2D Video and 3D Human Motion with an Autoregressive Framework

We propose UniMo, an innovative autoregressive model for joint modeling of 2D human videos and 3D human motions within a unified framework, enabling simultaneous generation and understanding of these two modalities for the first time.…

Computer Vision and Pattern Recognition · Computer Science 2025-12-04 Youxin Pang, Yong Zhang, Ruizhi Shao, Xiang Deng, Feng Gao, Xu Xiaoming, Xiaoming Wei, Yebin Liu