Related papers: Unsupervised Keypoint Learning for Guiding Class-C…

Object Concepts Emerge from Motion

Object concepts play a foundational role in human visual cognition, enabling perception, memory, and interaction in the physical world. Inspired by findings in developmental neuroscience, where infants are shown to acquire object…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-29 · Haoqian Liang, Xiaohui Wang, Zhichao Li, Ya Yang, Naiyan Wang

Hand-Object Contact Prediction via Motion-Based Pseudo-Labeling and Guided Progressive Label Correction

Every hand-object interaction begins with contact. Although predicting the contact state between hands and objects is useful for understanding hand-object interactions, prior methods on hand-object analysis have assumed that the interacting…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-22 · Takuma Yagi, Md Tasnimul Hasan, Yoichi Sato

Unsupervised 3D Keypoint Discovery with Multi-View Geometry

Analyzing and training 3D body posture models depend heavily on the availability of joint labels that are commonly acquired through laborious manual annotation of body joints or via marker-based joint localization using carefully curated…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-09 · Sina Honari, Chen Zhao, Mathieu Salzmann, Pascal Fua

Active Learning for Deep Detection Neural Networks

The cost of drawing object bounding boxes (i.e. labeling) for millions of images is prohibitively high. For instance, labeling pedestrians in a regular urban image could take 35 seconds on average. Active learning aims to reduce the cost of…

Computer Vision and Pattern Recognition · Computer Science · 2019-11-22 · Hamed H. Aghdam, Abel Gonzalez-Garcia, Joost van de Weijer, Antonio M. López

Training Object Detectors from Few Weakly-Labeled and Many Unlabeled Images

Weakly-supervised object detection attempts to limit the amount of supervision by dispensing with the need for bounding boxes, but still assumes image-level labels on the entire training set. In this work, we study the problem of training an…

Computer Vision and Pattern Recognition · Computer Science · 2021-07-22 · Zhaohui Yang, Miaojing Shi, Chao Xu, Vittorio Ferrari, Yannis Avrithis

Prediction and Description of Near-Future Activities in Video

Most of the existing works on human activity analysis focus on recognition or early recognition of the activity labels from complete or partial observations. Similarly, almost all of the existing video captioning approaches focus on the…

Computer Vision and Pattern Recognition · Computer Science · 2021-05-28 · Tahmida Mahmud, Mohammad Billah, Mahmudul Hasan, Amit K. Roy-Chowdhury

On the Benefits of Instance Decomposition in Video Prediction Models

Video prediction is a crucial task for intelligent agents such as robots and autonomous vehicles, since it enables them to anticipate and act early on time-critical incidents. State-of-the-art video prediction methods typically model the…

Computer Vision and Pattern Recognition · Computer Science · 2025-01-22 · Eliyas Suleyman, Paul Henderson, Nicolas Pugeault

Learning Robot Activities from First-Person Human Videos Using Convolutional Future Regression

We design a new approach that allows robot learning of new activities from unlabeled human example videos. Given videos of humans executing the same activity from a human's viewpoint (i.e., first-person videos), our objective is to make the…

Robotics · Computer Science · 2017-07-25 · Jangwon Lee, Michael S. Ryoo

Visual Representation Learning with Stochastic Frame Prediction

Self-supervised learning of image representations by predicting future frames is a promising direction but remains a challenge. This is because of the under-determined nature of frame prediction; multiple potential futures can arise…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-12 · Huiwon Jang, Dongyoung Kim, Junsu Kim, Jinwoo Shin, Pieter Abbeel, Younggyo Seo

Self-Supervised Pretraining of 3D Features on any Point-Cloud

Pretraining on large labeled datasets is a prerequisite to achieving good performance in many computer vision tasks, such as 2D object recognition and video classification. However, pretraining is not widely used for 3D recognition tasks where…

Computer Vision and Pattern Recognition · Computer Science · 2021-01-08 · Zaiwei Zhang, Rohit Girdhar, Armand Joulin, Ishan Misra

Unsupervised Discovery of Actions in Instructional Videos

In this paper, we address the problem of automatically discovering atomic actions in an unsupervised manner from instructional videos. Instructional videos contain complex activities and are a rich source of information for intelligent agents,…

Computer Vision and Pattern Recognition · Computer Science · 2021-06-29 · AJ Piergiovanni, Anelia Angelova, Michael S. Ryoo, Irfan Essa

Video Interpolation and Prediction with Unsupervised Landmarks

Prediction and interpolation for long-range video data involves the complex task of modeling motion trajectories for each visible object, occlusions and dis-occlusions, as well as appearance changes due to viewpoint and lighting. Optical…

Computer Vision and Pattern Recognition · Computer Science · 2019-09-09 · Kevin J. Shih, Aysegul Dundar, Animesh Garg, Robert Pottorf, Andrew Tao, Bryan Catanzaro

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

Given the vast amounts of video available online, and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial…

Computer Vision and Pattern Recognition · Computer Science · 2016-07-20 · Subarna Tripathi, Zachary C. Lipton, Serge Belongie, Truong Nguyen

Self-supervised Motion Learning from Static Images

Motions are reflected in videos as the movement of pixels, and actions are essentially patterns of inconsistent motions between the foreground and the background. To distinguish actions well, especially those with complicated…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-02 · Ziyuan Huang, Shiwei Zhang, Jianwen Jiang, Mingqian Tang, Rong Jin, Marcelo Ang

DyStaB: Unsupervised Object Segmentation via Dynamic-Static Bootstrapping

We describe an unsupervised method to detect and segment portions of images of live scenes that, at some point in time, are seen moving as a coherent whole, which we refer to as objects. Our method first partitions the motion field by…

Computer Vision and Pattern Recognition · Computer Science · 2021-04-06 · Yanchao Yang, Brian Lai, Stefano Soatto

Unsupervised Learning of Effective Actions in Robotics

Learning actions that are relevant to decision-making and can be executed effectively is a key problem in autonomous robotics. Current state-of-the-art action representations in robotics lack proper effect-driven learning of the robot's…

Robotics · Computer Science · 2024-04-04 · Marko Zaric, Jakob Hollenstein, Justus Piater, Erwan Renaudo

Self-Supervised Relative Depth Learning for Urban Scene Understanding

As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-03 · Huaizu Jiang, Erik Learned-Miller, Gustav Larsson, Michael Maire, Greg Shakhnarovich

A Unified Model for Continuous Conditional Video Prediction

Different conditional video prediction tasks, like video future frame prediction and video frame interpolation, are normally solved by task-related models even though they share many common underlying characteristics. Furthermore, almost…

Computer Vision and Pattern Recognition · Computer Science · 2023-04-10 · Xi Ye, Guillaume-Alexandre Bilodeau

Unsupervised Learning of Long-Term Motion Dynamics for Videos

We present an unsupervised representation learning approach that compactly encodes the motion dependencies in videos. Given a pair of images from a video clip, our framework learns to predict the long-term 3D motions. To reduce the…

Computer Vision and Pattern Recognition · Computer Science · 2017-04-13 · Zelun Luo, Boya Peng, De-An Huang, Alexandre Alahi, Li Fei-Fei

Fast Keypoint Detection in Video Sequences

A number of computer vision tasks exploit a succinct representation of the visual content in the form of sets of local features. Given an input image, feature extraction algorithms identify a set of keypoints and assign to each of them a…

Computer Vision and Pattern Recognition · Computer Science · 2016-11-18 · Luca Baroffio, Matteo Cesana, Alessandro Redondi, Marco Tagliasacchi