Related papers: Object-Centric Multi-Task Learning for Human Insta…

You Only Learn One Query: Learning Unified Human Query for Single-Stage Multi-Person Multi-Task Human-Centric Perception

Human-centric perception (e.g. detection, segmentation, pose estimation, and attribute analysis) is a long-standing problem for computer vision. This paper introduces a unified and versatile framework (HQNet) for single-stage multi-person…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Sheng Jin, Shuhuai Li, Tong Li, Wentao Liu, Chen Qian, Ping Luo

Improvement of Human-Object Interaction Action Recognition Using Scene Information and Multi-Task Learning Approach

Recent graph convolutional neural networks (GCNs) have shown high performance in the field of human action recognition by using human skeleton poses. However, it fails to detect human-object interaction cases successfully due to the lack of…

Computer Vision and Pattern Recognition · Computer Science 2025-09-18 Hesham M. Shehata, Mohammad Abdolrahmani

Leveraging Multi-View Weak Supervision for Occlusion-Aware Multi-Human Parsing

Multi-human parsing is the task of segmenting human body parts while associating each part to the person it belongs to, combining instance-level and part-level information for fine-grained human understanding. In this work, we demonstrate…

Computer Vision and Pattern Recognition · Computer Science 2025-09-15 Laura Bragagnolo, Matteo Terreran, Leonardo Barcellona, Stefano Ghidoni

Multiple Object Recognition with Visual Attention

We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show…

Machine Learning · Computer Science 2015-04-24 Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu

UniHCP: A Unified Model for Human-Centric Perceptions

Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection, person re-identification, etc.) play a key role in industrial applications of visual models. While specific human-centric tasks have their own relevant…

Computer Vision and Pattern Recognition · Computer Science 2023-06-23 Yuanzheng Ci, Yizhou Wang, Meilin Chen, Shixiang Tang, Lei Bai, Feng Zhu, Rui Zhao, Fengwei Yu, Donglian Qi, Wanli Ouyang

MultiTask-CenterNet (MCN): Efficient and Diverse Multitask Learning using an Anchor Free Approach

Multitask learning is a common approach in machine learning, which allows to train multiple objectives with a shared architecture. It has been shown that by training multiple tasks together inference time and compute resources can be saved,…

Computer Vision and Pattern Recognition · Computer Science 2021-09-13 Falk Heuer, Sven Mantowsky, Syed Saqib Bukhari, Georg Schneider

Disjoint Multi-task Learning between Heterogeneous Human-centric Tasks

Human behavior understanding is arguably one of the most important mid-level components in artificial intelligence. In order to efficiently make use of data, multi-task learning has been studied in diverse computer vision tasks including…

Computer Vision and Pattern Recognition · Computer Science 2018-02-15 Dong-Jin Kim, Jinsoo Choi, Tae-Hyun Oh, Youngjin Yoon, In So Kweon

Category Query Learning for Human-Object Interaction Classification

Unlike most previous HOI methods that focus on learning better human-object features, we propose a novel and complementary approach called category query learning. Such queries are explicitly associated to interaction categories, converted…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Chi Xie, Fangao Zeng, Yue Hu, Shuang Liang, Yichen Wei

iCAN: Instance-Centric Attention Network for Human-Object Interaction Detection

Recent years have witnessed rapid progress in detecting and recognizing individual object instances. To understand the situation in a scene, however, computers need to recognize how humans interact with surrounding objects. In this paper,…

Computer Vision and Pattern Recognition · Computer Science 2018-08-31 Chen Gao, Yuliang Zou, Jia-Bin Huang

Learning Fixation Point Strategy for Object Detection and Classification

We propose a novel recurrent attentional structure to localize and recognize objects jointly. The network can learn to extract a sequence of local observations with detailed appearance and rough context, instead of sliding windows or…

Computer Vision and Pattern Recognition · Computer Science 2017-12-20 Jie Lyu, Zejian Yuan, Dapeng Chen

Deep Contextual Attention for Human-Object Interaction Detection

Human-object interaction detection is an important and relatively new class of visual relationship detection tasks, essential for deeper scene understanding. Most existing approaches decompose the problem into object localization and…

Computer Vision and Pattern Recognition · Computer Science 2019-10-18 Tiancai Wang, Rao Muhammad Anwer, Muhammad Haris Khan, Fahad Shahbaz Khan, Yanwei Pang, Ling Shao, Jorma Laaksonen

Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a…

Computer Vision and Pattern Recognition · Computer Science 2021-03-09 Tianfei Zhou, Wenguan Wang, Si Liu, Yi Yang, Luc Van Gool

Pose-aware Multi-level Feature Network for Human Object Interaction Detection

Reasoning human object interactions is a core problem in human-centric scene understanding and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring…

Computer Vision and Pattern Recognition · Computer Science 2019-09-19 Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He

Holistic, Instance-Level Human Parsing

Object parsing -- the task of decomposing an object into its semantic parts -- has traditionally been formulated as a category-level segmentation problem. Consequently, when there are multiple objects in an image, current methods cannot…

Computer Vision and Pattern Recognition · Computer Science 2017-09-13 Qizhu Li, Anurag Arnab, Philip H. S. Torr

Learning Human-Object Interaction Detection using Interaction Points

Understanding interactions between humans and objects is one of the fundamental problems in visual classification and an essential step towards detailed scene understanding. Human-object interaction (HOI) detection strives to localize both…

Computer Vision and Pattern Recognition · Computer Science 2020-04-01 Tiancai Wang, Tong Yang, Martin Danelljan, Fahad Shahbaz Khan, Xiangyu Zhang, Jian Sun

A Deeper Look into DeepCap

Human performance capture is a highly important computer vision problem with many applications in movie production and virtual/augmented reality. Many previous performance capture approaches either required expensive multi-view setups or…

Computer Vision and Pattern Recognition · Computer Science 2021-11-23 Marc Habermann, Weipeng Xu, Michael Zollhoefer, Gerard Pons-Moll, Christian Theobalt

Human-Object Interaction Detection: A Quick Survey and Examination of Methods

Human-object interaction detection is a relatively new task in the world of computer vision and visual semantic information extraction. With the goal of machines identifying interactions that humans perform on objects, there are many…

Computer Vision and Pattern Recognition · Computer Science 2020-09-29 Trevor Bergstrom, Humphrey Shi

Object and Text-guided Semantics for CNN-based Activity Recognition

Many previous methods have demonstrated the importance of considering semantically relevant objects for carrying out video-based human activity recognition, yet none of the methods have harvested the power of large text corpora to relate…

Computer Vision and Pattern Recognition · Computer Science 2018-05-07 Sungmin Eum, Christopher Reale, Heesung Kwon, Claire Bonial, Clare Voss

Visual Person Understanding through Multi-Task and Multi-Dataset Learning

We address the problem of learning a single model for person re-identification, attribute classification, body part segmentation, and pose estimation. With predictions for these tasks we gain a more holistic understanding of persons, which…

Computer Vision and Pattern Recognition · Computer Science 2020-11-10 Kilian Pfeiffer, Alexander Hermans, István Sárándi, Mark Weber, Bastian Leibe

Semantic Segmentation Enhanced Transformer Model for Human Attention Prediction

Saliency Prediction aims to predict the attention distribution of human eyes given an RGB image. Most of the recent state-of-the-art methods are based on deep image feature representations from traditional CNNs. However, the traditional…

Computer Vision and Pattern Recognition · Computer Science 2023-01-27 Shuo Zhang