Related papers: Efficient Multi-Object Pose Estimation using Multi…

Enhanced 3D Human Pose Estimation from Videos by using Attention-Based Neural Network with Dilated Convolutions

The attention mechanism provides a sequential prediction framework for learning spatial models with enhanced implicit temporal consistency. In this work, we show a systematic design (from 2D to 3D) for how conventional networks and other…

Computer Vision and Pattern Recognition · Computer Science 2021-03-05 Ruixu Liu, Ju Shen, He Wang, Chen Chen, Sen-ching Cheung, Vijayan K. Asari

Vision Transformer with Deformable Attention

Transformers have recently shown superior performances on various vision tasks. The large, sometimes even global, receptive field endows Transformer models with higher representation power over their CNN counterparts. Nevertheless, simply…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Zhuofan Xia, Xuran Pan, Shiji Song, Li Erran Li, Gao Huang

Multiple Object Recognition with Visual Attention

We present an attention-based model for recognizing multiple objects in images. The proposed model is a deep recurrent neural network trained with reinforcement learning to attend to the most relevant regions of the input image. We show…

Machine Learning · Computer Science 2015-04-24 Jimmy Ba, Volodymyr Mnih, Koray Kavukcuoglu

Video based Object 6D Pose Estimation using Transformers

We introduce a Transformer based 6D Object Pose Estimation framework VideoPose, comprising an end-to-end attention based modelling architecture, that attends to previous frames in order to estimate accurate 6D Object Poses in videos. Our…

Computer Vision and Pattern Recognition · Computer Science 2023-09-06 Apoorva Beedu, Huda Alamri, Irfan Essa

Efficient Inter-Task Attention for Multitask Transformer Models

In both Computer Vision and the wider Deep Learning field, the Transformer architecture is well-established as state-of-the-art for many applications. For Multitask Learning, however, where there may be many more queries necessary compared…

Computer Vision and Pattern Recognition · Computer Science 2025-08-07 Christian Bohn, Thomas Kurbiel, Klaus Friedrichs, Hasan Tercan, Tobias Meisen

Deep Models for Multi-View 3D Object Recognition: A Review

Human decision-making often relies on visual information from multiple perspectives or views. In contrast, machine learning-based object recognition utilizes information from a single image of the object. However, the information conveyed…

Computer Vision and Pattern Recognition · Computer Science 2025-10-01 Mona Alzahrani, Muhammad Usman, Salma Kammoun, Saeed Anwar, Tarek Helmy

Interacting Hand-Object Pose Estimation via Dense Mutual Attention

3D hand-object pose estimation is the key to the success of many computer vision applications. The main focus of this task is to effectively model the interaction between the hand and an object. To this end, existing works either rely on…

Computer Vision and Pattern Recognition · Computer Science 2023-01-09 Rong Wang, Wei Mao, Hongdong Li

T6D-Direct: Transformers for Multi-Object 6D Pose Direct Regression

6D pose estimation is the task of predicting the translation and orientation of objects in a given input image, which is a crucial prerequisite for many robotics and augmented reality applications. Lately, the Transformer Network…

Computer Vision and Pattern Recognition · Computer Science 2021-09-24 Arash Amini, Arul Selvam Periyasamy, Sven Behnke

ConvPoseCNN2: Prediction and Refinement of Dense 6D Object Poses

Object pose estimation is a key perceptual capability in robotics. We propose a fully-convolutional extension of the PoseCNN method, which densely predicts object translations and orientations. This has several advantages such as improving…

Computer Vision and Pattern Recognition · Computer Science 2022-05-24 Arul Selvam Periyasamy, Catherine Capellen, Max Schwarz, Sven Behnke

Moving object detection from multi-depth images with an attention-enhanced CNN

One of the greatest challenges for detecting moving objects in the solar system from wide-field survey data is determining whether a signal indicates a true object or is due to some other source, like noise. Object verification has relied…

Computer Vision and Pattern Recognition · Computer Science 2025-12-08 Masato Shibukawa, Fumi Yoshida, Toshifumi Yanagisawa, Takashi Ito, Hirohisa Kurosaki, Makoto Yoshikawa, Kohki Kamiya, Ji-an Jiang, Wesley Fraser, JJ Kavelaars, Susan Benecchi, Anne Verbiscer, Akira Hatakeyama, Hosei O, Naoya Ozaki

Attention Deficit is Ordered! Fooling Deformable Vision Transformers with Collaborative Adversarial Patches

The latest generation of transformer-based vision models has proven to be superior to Convolutional Neural Network (CNN)-based models across several vision tasks, largely attributed to their remarkable prowess in relation modeling.…

Computer Vision and Pattern Recognition · Computer Science 2023-12-29 Quazi Mishkatul Alam, Bilel Tarchoun, Ihsen Alouani, Nael Abu-Ghazaleh

MoreFusion: Multi-object Reasoning for 6D Pose Estimation from Volumetric Fusion

Robots and other smart devices need efficient object-based scene representations from their on-board vision systems to reason about contact, physics and occlusion. Recognized precise object models will play an important role alongside…

Computer Vision and Pattern Recognition · Computer Science 2020-04-10 Kentaro Wada, Edgar Sucar, Stephen James, Daniel Lenton, Andrew J. Davison

CVAM-Pose: Conditional Variational Autoencoder for Multi-Object Monocular Pose Estimation

Estimating rigid objects' poses is one of the fundamental problems in computer vision, with a range of applications across automation and augmented reality. Most existing approaches adopt one network per object class strategy, depend…

Computer Vision and Pattern Recognition · Computer Science 2024-10-14 Jianyu Zhao, Wei Quan, Bogdan J. Matuszewski

MODRL/D-AM: Multiobjective Deep Reinforcement Learning Algorithm Using Decomposition and Attention Model for Multiobjective Optimization

Recently, a deep reinforcement learning method is proposed to solve multiobjective optimization problem. In this method, the multiobjective optimization problem is decomposed to a number of single-objective optimization subproblems and all…

Neural and Evolutionary Computing · Computer Science 2020-02-14 Hong Wu, Jiahai Wang, Zizhen Zhang

Multi-manifold Attention for Vision Transformers

Vision Transformers are very popular nowadays due to their state-of-the-art performance in several computer vision tasks, such as image classification and action recognition. Although their performance has been greatly enhanced through…

Computer Vision and Pattern Recognition · Computer Science 2024-10-28 Dimitrios Konstantinidis, Ilias Papastratis, Kosmas Dimitropoulos, Petros Daras

The challenge of simultaneous object detection and pose estimation: a comparative study

Detecting objects and estimating their pose remains as one of the major challenges of the computer vision research community. There exists a compromise between localizing the objects and estimating their viewpoints. The detector ideally…

Computer Vision and Pattern Recognition · Computer Science 2018-10-08 Daniel Oñoro-Rubio, Roberto J. López-Sastre, Carolina Redondo-Cabrera, Pedro Gil-Jiménez

Guided Visual Attention Model Based on Interactions Between Top-down and Bottom-up Information for Robot Pose Prediction

Deep robot vision models are widely used for recognizing objects from camera images, but show poor performance when detecting objects at untrained positions. Although such a problem can be alleviated by training with large datasets, the…

Robotics · Computer Science 2022-10-26 Hyogo Hiruma, Hiroki Mori, Hiroshi Ito, Tetsuya Ogata

PAM:Point-wise Attention Module for 6D Object Pose Estimation

6D pose estimation refers to object recognition and estimation of 3D rotation and 3D translation. The key technology for estimating 6D pose is to estimate pose by extracting enough features to find pose in any environment. Previous methods…

Computer Vision and Pattern Recognition · Computer Science 2020-08-13 Myoungha Song, Jeongho Lee, Donghwan Kim

MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion

Cluttered bin-picking environments are challenging for pose estimation models. Despite the impressive progress enabled by deep learning, single-view RGB pose estimation models perform poorly in cluttered dynamic environments. Imbuing the…

Robotics · Computer Science 2026-02-02 Arul Selvam Periyasamy, Sven Behnke

Depthformer : Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependency in an image. However, the benefit of…

Computer Vision and Pattern Recognition · Computer Science 2022-07-13 Ashutosh Agarwal, Chetan Arora