Related papers: StackFLOW: Monocular Human-Object Reconstruction b…

Monocular Human-Object Reconstruction in the Wild

Learning the prior knowledge of the 3D human-object spatial relation is crucial for reconstructing human-object interaction from images and understanding how humans interact with objects in 3D space. Previous works learn this prior from…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-01 · Chaofan Huo, Ye Shi, Jingya Wang

Single-image coherent reconstruction of objects and humans

Existing methods for reconstructing objects and humans from a monocular image suffer from severe mesh collisions and performance limitations for interacting occluding objects. This paper introduces a method to obtain a globally consistent…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-16 · Sarthak Batra, Partha P. Chakrabarti, Simon Hadfield, Armin Mustafa

Human-Aware Object Placement for Visual Environment Reconstruction

Humans are in constant contact with the world as they move through it and interact with it. This contact is a vital source of information for understanding 3D humans, 3D scenes, and the interactions between them. In fact, we demonstrate…

Computer Vision and Pattern Recognition · Computer Science · 2022-03-30 · Hongwei Yi, Chun-Hao P. Huang, Dimitrios Tzionas, Muhammed Kocabas, Mohamed Hassan, Siyu Tang, Justus Thies, Michael J. Black

Learning Object Arrangements in 3D Scenes using Human Context

We consider the problem of learning object arrangements in a 3D scene. The key idea here is to learn how objects relate to human poses based on their affordances, ease of use and reachability. In contrast to modeling object-object…

Machine Learning · Computer Science · 2012-07-03 · Yun Jiang, Marcus Lim, Ashutosh Saxena

MoCo-Flow: Neural Motion Consensus Flow for Dynamic Humans in Stationary Monocular Cameras

Synthesizing novel views of dynamic humans from stationary monocular cameras is a specialized but desirable setup. This is particularly attractive as it does not require static scenes, controlled environments, or specialized capture…

Computer Vision and Pattern Recognition · Computer Science · 2022-02-08 · Xuelin Chen, Weiyu Li, Daniel Cohen-Or, Niloy J. Mitra, Baoquan Chen

Perceiving 3D Human-Object Spatial Arrangements from a Single Image in the Wild

We present a method that infers spatial arrangements and shapes of humans and objects in a globally consistent 3D scene, all from a single image in-the-wild captured in an uncontrolled environment. Notably, our method runs on datasets…

Computer Vision and Pattern Recognition · Computer Science · 2020-08-21 · Jason Y. Zhang, Sam Pepose, Hanbyul Joo, Deva Ramanan, Jitendra Malik, Angjoo Kanazawa

Optical Flow-based 3D Human Motion Estimation from Monocular Video

We present a generative method to estimate 3D human motion and body shape from monocular video. Under the assumption that starting from an initial pose optical flow constrains subsequent human motion, we exploit flow to find temporally…

Computer Vision and Pattern Recognition · Computer Science · 2017-03-22 · Thiemo Alldieck, Marc Kassubeck, Marcus Magnor

Bootstrapping Human Optical Flow and Pose

We propose a bootstrapping framework to enhance human optical flow and pose. We show that, for videos involving humans in scenes, we can improve both the optical flow and the pose estimation quality of humans by considering the two tasks at…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-31 · Aritro Roy Arko, James J. Little, Kwang Moo Yi

Probabilistic Monocular 3D Human Pose Estimation with Normalizing Flows

3D human pose estimation from monocular images is a highly ill-posed problem due to depth ambiguities and occlusions. Nonetheless, most existing works ignore these ambiguities and only estimate a single solution. In contrast, we generate a…

Computer Vision and Pattern Recognition · Computer Science · 2021-08-03 · Tom Wehrbein, Marco Rudolph, Bodo Rosenhahn, Bastian Wandt

Estimating 3D Motion and Forces of Person-Object Interactions from Monocular Video

In this paper, we introduce a method to automatically reconstruct the 3D motion of a person interacting with an object from a single RGB video. Our method estimates the 3D poses of the person and the object, contact positions, and forces…

Computer Vision and Pattern Recognition · Computer Science · 2019-06-18 · Zongmian Li, Jiri Sedlar, Justin Carpentier, Ivan Laptev, Nicolas Mansard, Josef Sivic

Realistic Clothed Human and Object Joint Reconstruction from a Single Image

Recent approaches to jointly reconstruct 3D humans and objects from a single RGB image represent 3D shapes with template-based or coarse models, which fail to capture details of loose clothing on human bodies. In this paper, we introduce a…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Ayushi Dutta, Marco Pesavento, Marco Volino, Adrian Hilton, Armin Mustafa

iMapper: Interaction-guided Joint Scene and Human Motion Mapping from Monocular Videos

A long-standing challenge in scene analysis is the recovery of scene arrangements under moderate to heavy occlusion, directly from monocular video. While the problem remains a subject of active research, concurrent advances have been made…

Graphics · Computer Science · 2019-07-19 · Aron Monszpart, Paul Guerrero, Duygu Ceylan, Ersin Yumer, Niloy J. Mitra

Visibility Aware Human-Object Interaction Tracking from Single RGB Camera

Capturing the interactions between humans and their environment in 3D is important for many applications in robotics, graphics, and vision. Recent works to reconstruct the 3D human and object from a single RGB image do not have consistent…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-01 · Xianghui Xie, Bharat Lal Bhatnagar, Gerard Pons-Moll

Object-Scene-Camera Decomposition and Recomposition for Data-Efficient Monocular 3D Object Detection

Monocular 3D object detection (M3OD) is intrinsically ill-posed, hence training a high-performance deep learning based M3OD model requires a humongous amount of labeled data with complicated visual variation from diverse scenes, variety of…

Computer Vision and Pattern Recognition · Computer Science · 2026-03-10 · Zhaonian Kuang, Rui Ding, Meng Yang, Xinhu Zheng, Gang Hua

STAF: 3D Human Mesh Recovery from Video with Spatio-Temporal Alignment Fusion

The recovery of 3D human mesh from monocular images has significantly been developed in recent years. However, existing models usually ignore spatial and temporal information, which might lead to mesh and image misalignment and temporal…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-04 · Wei Yao, Hongwen Zhang, Yunlian Sun, Jinhui Tang

Stacked Hourglass Networks for Human Pose Estimation

This work introduces a novel convolutional network architecture for the task of human pose estimation. Features are processed across all scales and consolidated to best capture the various spatial relationships associated with the body. We…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-27 · Alejandro Newell, Kaiyu Yang, Jia Deng

HOI-M3: Capture Multiple Humans and Objects Interaction within Contextual Environment

Humans naturally interact with both others and the surrounding multiple objects, engaging in various social activities. However, recent advances in modeling human-object interactions mostly focus on perceiving isolated individuals and…

Computer Vision and Pattern Recognition · Computer Science · 2024-04-03 · Juze Zhang, Jingyan Zhang, Zining Song, Zhanhe Shi, Chengfeng Zhao, Ye Shi, Jingyi Yu, Lan Xu, Jingya Wang

D³-Human: Dynamic Disentangled Digital Human from Monocular Video

We introduce D³-Human, a method for reconstructing Dynamic Disentangled Digital Human geometry from monocular videos. Past monocular video human reconstruction primarily focuses on reconstructing undecoupled clothed human bodies or only…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-06 · Honghu Chen, Bo Peng, Yunfan Tao, Juyong Zhang

NormalFlow: Fast, Robust, and Accurate Contact-based Object 6DoF Pose Tracking with Vision-based Tactile Sensors

Tactile sensing is crucial for robots aiming to achieve human-level dexterity. Among tactile-dependent skills, tactile-based object tracking serves as the cornerstone for many tasks, including manipulation, in-hand manipulation, and 3D…

Robotics · Computer Science · 2025-03-19 · Hung-Jui Huang, Michael Kaess, Wenzhen Yuan

MonoPerfCap: Human Performance Capture from Monocular Video

We present the first marker-less approach for temporally coherent 3D performance capture of a human with general clothing from monocular video. Our approach reconstructs articulated human skeleton motion as well as medium-scale non-rigid…

Computer Vision and Pattern Recognition · Computer Science · 2018-02-26 · Weipeng Xu, Avishek Chatterjee, Michael Zollhöfer, Helge Rhodin, Dushyant Mehta, Hans-Peter Seidel, Christian Theobalt