Related papers: SIMstack: A Generative Shape and Instance Model fo…

Volumetric Instance-Aware Semantic Mapping and 3D Object Discovery

To autonomously navigate and plan interactions in real-world environments, robots require the ability to robustly perceive and map complex, unstructured surrounding scenes. Besides building an internal representation of the observed scene…

Robotics · Computer Science 2021-05-18 Margarita Grinvald, Fadri Furrer, Tonci Novkovic, Jen Jen Chung, Cesar Cadena, Roland Siegwart, Juan Nieto

LAVAE: Disentangling Location and Appearance

We propose a probabilistic generative model for unsupervised learning of structured, interpretable, object-based representations of visual scenes. We use amortized variational inference to train the generative model end-to-end. The learned…

Machine Learning · Computer Science 2019-09-30 Andrea Dittadi, Ole Winther

Instance Segmentation of Visible and Occluded Regions for Finding and Picking Target from a Pile of Objects

We present a robotic system for picking a target from a pile of objects that is capable of finding and grasping the target object by removing obstacles in the appropriate order. The fundamental idea is to segment instances with both visible…

Robotics · Computer Science 2020-01-22 Kentaro Wada, Shingo Kitagawa, Kei Okada, Masayuki Inaba

SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attention and Decomposition

The ability to decompose complex multi-object scenes into meaningful abstractions like objects is fundamental to achieving higher-level cognition. Previous approaches for unsupervised object-oriented scene representation learning are either…

Machine Learning · Computer Science 2020-03-17 Zhixuan Lin, Yi-Fu Wu, Skand Vishwanath Peri, Weihao Sun, Gautam Singh, Fei Deng, Jindong Jiang, Sungjin Ahn

SIM: Semantic-aware Instance Mask Generation for Box-Supervised Instance Segmentation

Weakly supervised instance segmentation using only bounding box annotations has recently attracted much research attention. Most of the current efforts leverage low-level image features as extra supervision without explicitly exploiting the…

Computer Vision and Pattern Recognition · Computer Science 2023-03-16 Ruihuang Li, Chenhang He, Yabin Zhang, Shuai Li, Liyi Chen, Lei Zhang

Instance Segmentation of Biomedical Images with an Object-aware Embedding Learned with Local Constraints

Automatic instance segmentation is a problem that occurs in many biomedical applications. State-of-the-art approaches either perform semantic segmentation or refine object bounding boxes obtained from detection methods. Both suffer from…

Computer Vision and Pattern Recognition · Computer Science 2020-04-22 Long Chen, Martin Strauch, Dorit Merhof

InstaScene: Towards Complete 3D Instance Decomposition and Reconstruction from Cluttered Scenes

Humans can naturally identify and mentally complete occluded objects in cluttered environments. However, imparting similar cognitive ability to robotics remains challenging even with advanced reconstruction techniques, which model scenes…

Computer Vision and Pattern Recognition · Computer Science 2025-07-22 Zesong Yang, Bangbang Yang, Wenqi Dong, Chenxuan Cao, Liyuan Cui, Yuewen Ma, Zhaopeng Cui, Hujun Bao

Stacked Capsule Autoencoders

Objects are composed of a set of geometrically organized parts. We introduce an unsupervised capsule autoencoder (SCAE), which explicitly uses geometric relationships between parts to reason about objects. Since these relationships do not…

Machine Learning · Statistics 2019-12-03 Adam R. Kosiorek, Sara Sabour, Yee Whye Teh, Geoffrey E. Hinton

Counting Stacked Objects

Visual object counting is a fundamental computer vision task underpinning numerous real-world applications, from cell counting in biomedicine to traffic and wildlife monitoring. However, existing methods struggle to handle the challenge of…

Computer Vision and Pattern Recognition · Computer Science 2025-07-31 Corentin Dumery, Noa Etté, Aoxiang Fan, Ren Li, Jingyi Xu, Hieu Le, Pascal Fua

Object Instance Retrieval in Assistive Robotics: Leveraging Fine-Tuned SimSiam with Multi-View Images Based on 3D Semantic Map

Robots that assist humans in their daily lives should be able to locate specific instances of objects in an environment that match a user's desired objects. This task is known as instance-specific image goal navigation (InstanceImageNav),…

Robotics · Computer Science 2025-09-08 Taichi Sakaguchi, Akira Taniguchi, Yoshinobu Hagiwara, Lotfi El Hafi, Shoichi Hasegawa, Tadahiro Taniguchi

UnScene3D: Unsupervised 3D Instance Segmentation for Indoor Scenes

3D instance segmentation is fundamental to geometric understanding of the world around us. Existing methods for instance segmentation of 3D scenes rely on supervision from expensive, manual 3D annotations. We propose UnScene3D, the first…

Computer Vision and Pattern Recognition · Computer Science 2024-05-01 David Rozenberszki, Or Litany, Angela Dai

SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition

To help agents reason about scenes in terms of their building blocks, we wish to extract the compositional structure of any given scene (in particular, the configuration and characteristics of objects comprising the scene). This problem is…

Computer Vision and Pattern Recognition · Computer Science 2021-12-07 Rishabh Kabra, Daniel Zoran, Goker Erdogan, Loic Matthey, Antonia Creswell, Matthew Botvinick, Alexander Lerchner, Christopher P. Burgess

3D Instance Segmentation via Multi-Task Metric Learning

We propose a novel method for instance label segmentation of dense 3D voxel grids. We target volumetric scene representations, which have been acquired with depth sensors or multi-view stereo methods and which have been processed with…

Computer Vision and Pattern Recognition · Computer Science 2019-11-04 Jean Lahoud, Bernard Ghanem, Marc Pollefeys, Martin R. Oswald

Holistic 3D Scene Understanding from a Single Image with Implicit Representation

We present a new pipeline for holistic 3D scene understanding from a single image, which could predict object shapes, object poses, and scene layout. As it is a highly ill-posed problem, existing methods usually suffer from inaccurate…

Computer Vision and Pattern Recognition · Computer Science 2021-08-24 Cheng Zhang, Zhaopeng Cui, Yinda Zhang, Bing Zeng, Marc Pollefeys, Shuaicheng Liu

Sequential Amodal Segmentation via Cumulative Occlusion Learning

To fully understand the 3D context of a single image, a visual system must be able to segment both the visible and occluded regions of objects, while discerning their occlusion order. Ideally, the system should be able to handle any object…

Computer Vision and Pattern Recognition · Computer Science 2024-05-10 Jiayang Ao, Qiuhong Ke, Krista A. Ehinger

Disentangled 3D Scene Generation with Layout Learning

We introduce a method to generate 3D scenes that are disentangled into their component objects. This disentanglement is unsupervised, relying only on the knowledge of a large pretrained text-to-image model. Our key insight is that objects…

Computer Vision and Pattern Recognition · Computer Science 2024-02-28 Dave Epstein, Ben Poole, Ben Mildenhall, Alexei A. Efros, Aleksander Holynski

iPose: Instance-Aware 6D Pose Estimation of Partly Occluded Objects

We address the task of 6D pose estimation of known rigid objects from single input images in scenarios where the objects are partly occluded. Recent RGB-D-based methods are robust to moderate degrees of occlusion. For RGB inputs, no…

Computer Vision and Pattern Recognition · Computer Science 2018-06-19 Omid Hosseini Jafari, Siva Karthik Mustikovela, Karl Pertsch, Eric Brachmann, Carsten Rother

Towards Scene Understanding with Detailed 3D Object Representations

Current approaches to semantic image and scene understanding typically employ rather simple object representations such as 2D or 3D bounding boxes. While such coarse models are robust and allow for reliable object detection, they discard…

Computer Vision and Pattern Recognition · Computer Science 2014-11-24 M. Zeeshan Zia, Michael Stark, Konrad Schindler

STEm-Seg: Spatio-temporal Embeddings for Instance Segmentation in Videos

Existing methods for instance segmentation in videos typically involve multi-stage pipelines that follow the tracking-by-detection paradigm and model a video clip as a sequence of images. Multiple networks are used to detect objects in…

Computer Vision and Pattern Recognition · Computer Science 2023-09-04 Ali Athar, Sabarinath Mahadevan, Aljoša Ošep, Laura Leal-Taixé, Bastian Leibe

Local Occupancy-Enhanced Object Grasping with Multiple Triplanar Projection

This paper addresses the challenge of robotic grasping of general objects. Similar to prior research, the task reads a single-view 3D observation (i.e., point clouds) captured by a depth camera as input. Crucially, the success of object…

Robotics · Computer Science 2024-07-23 Kangqi Ma, Hao Dong, Yadong Mu