Related papers: EffiPerception: an Efficient Framework for Various…
Model efficiency has become increasingly important in computer vision. In this paper, we systematically study neural network architecture design choices for object detection and propose several key optimizations to improve efficiency.…
Despite the rapid advancement in the field of image recognition, the processing of high-resolution imagery remains a computational challenge. However, this processing is pivotal for extracting detailed object insights in areas ranging from…
The past few years have witnessed the rapid development of vision-centric 3D perception in autonomous driving. Although the 3D perception models share many structural and conceptual similarities, there still exist gaps in their feature…
As neural networks are increasingly being applied to real-world applications, mechanisms to address distributional shift and sequential task learning without forgetting are critical. Methods incorporating network expansion have shown…
This paper addresses the challenging unsupervised scene flow estimation problem by jointly learning four low-level vision sub-tasks: optical flow $\textbf{F}$, stereo-depth $\textbf{D}$, camera pose $\textbf{P}$ and motion segmentation…
Cooperative perception aims to address the inherent limitations of single-vehicle autonomous driving systems through information exchange among multiple agents. Previous research has primarily focused on single-frame perception tasks.…
Transformers are widely used for solving tasks in natural language processing, computer vision, speech, and music domains. In this paper, we talk about the efficiency of transformers in terms of memory (the number of parameters),…
The superior performance of modern visual backbones usually comes with a costly training procedure. We contribute to this issue by generalizing the idea of curriculum learning beyond its original formulation, i.e., training models using…
Most deep learning methods for video frame interpolation consist of three main components: feature extraction, motion estimation, and image synthesis. Existing approaches are mainly distinguishable in terms of how these modules are…
EfficientPose is an impressive 3D object detection model. It has been demonstrated to be quick, scalable, and accurate, especially when considering that it uses only RGB inputs. In this paper we try to improve on EfficientPose by giving it…
In this work we propose 3D-FFS, a novel approach to make sensor fusion based 3D object detection networks significantly faster using a class of computationally inexpensive heuristics. Existing sensor fusion based networks generate 3D region…
High-resolution dense prediction enables many appealing real-world applications, such as computational photography, autonomous driving, etc. However, the vast computational cost makes deploying state-of-the-art high-resolution dense…
Recent work has shown that learned image compression strategies can outperform standard hand-crafted compression algorithms that have been developed over decades of intensive research on the rate-distortion trade-off. With growing…
Deploying multiple machine learning models on resource-constrained robotic platforms for different perception tasks often results in redundant computations, large memory footprints, and complex integration challenges. In response, this work…
Object detection is a basic computer vision task to loccalize and categorize objects in a given image. Most state-of-the-art detection methods utilize a fixed number of proposals as an intermediate representation of object candidates, which…
Monocular depth estimation (MDE) plays a pivotal role in various computer vision applications, such as robotics, augmented reality, and autonomous driving. Despite recent advancements, existing methods often fail to meet key requirements…
All instance perception tasks aim at finding certain objects specified by some queries such as category names, language expressions, and target annotations, but this complete field has been split into multiple independent subtasks. In this…
The goal of Feature Selection - comprising filter, wrapper, and embedded approaches - is to find the optimal feature subset for designated downstream tasks. Nevertheless, current feature selection methods are limited by: 1) the selection…
The superior performance of modern deep networks usually comes with a costly training procedure. This paper presents a new curriculum learning approach for the efficient training of visual backbones (e.g., vision Transformers). Our work is…
As the reliability of the robot's perception correlates with the number of integrated sensing modalities to tackle uncertainty, a practical solution to manage these sensors from different computers, operate them simultaneously, and maintain…