Related papers: Hypergraph-based Multi-View Action Recognition usi…
Event-based cameras are neuromorphic sensors capable of efficiently encoding visual information in the form of sparse sequences of events. Being biologically inspired, they are commonly used to exploit some of the computational and power…
Recognizing the motion of Micro Aerial Vehicles (MAVs) is crucial for enabling cooperative perception and control in autonomous aerial swarms. Yet, vision-based recognition models relying only on RGB data often fail to capture the complex…
Event cameras are innovative neuromorphic sensors that asynchronously capture the scene dynamics. Due to the event-triggering mechanism, such cameras record event streams with much shorter response latency and higher intensity sensitivity…
Event cameras encode visual information with high temporal precision, low data-rate, and high-dynamic range. Thanks to these characteristics, event cameras are particularly suited for scenarios with high motion, challenging lighting…
This chapter aims to aid the development of Cyber-Physical Systems (CPS) in automated understanding of events and activities in various applications of video-surveillance. These events are mostly captured by drones, CCTVs or novice and…
Neuromorphic sensors, specifically event cameras, revolutionize visual data acquisition by capturing pixel intensity changes with exceptional dynamic range, minimal latency, and energy efficiency, setting them apart from conventional…
Dynamic vision sensors (DVS) are bio-inspired devices that capture visual information in the form of asynchronous events, which encode changes in pixel intensity with high temporal resolution and low latency. These events provide rich…
Event-based cameras are becoming increasingly popular for their ability to capture high-speed motion with low latency and high dynamic range. However, generating videos from events remains challenging due to the highly sparse and varying…
Human action recognition remains an important yet challenging task. This work proposes a novel action recognition system. It uses a novel Multiple View Region Adaptive Multi-resolution in time Depth Motion Map (MV-RAMDMM) formulation…
The temporal segmentation of events is an essential task and a precursor for the automatic recognition of human actions in the video. Several attempts have been made to capture frame-level salient aspects through attention but they lack the…
Action recognition and anticipation are key to the success of many computer vision applications. Existing methods can roughly be grouped into those that extract global, context-aware representations of the entire image or sequence, and…
Multimodal LLMs have advanced vision-language tasks but still struggle with understanding video scenes. To bridge this gap, Video Scene Graph Generation (VidSGG) has emerged to capture multi-object relationships across video frames.…
We investigate architectures of discriminatively trained deep Convolutional Networks (ConvNets) for action recognition in video. The challenge is to capture the complementary information on appearance from still frames and motion between…
The main streams of human activity recognition (HAR) algorithms are developed based on RGB cameras which are suffered from illumination, fast motion, privacy-preserving, and large energy consumption. Meanwhile, the biologically inspired…
Nuanced understanding and the generation of detailed descriptive content for (bimanual) manipulation actions in videos is important for disciplines such as robotics, human-computer interaction, and video content analysis. This study…
The event-based Vision-Language Model (VLM) recently has made good progress for practical vision tasks. However, most of these works just utilize CLIP for focusing on traditional perception tasks, which obstruct model understanding…
Neuromorphic, or event, cameras represent a transformation in the classical approach to visual sensing encodes detected instantaneous per-pixel illumination changes into an asynchronous stream of event packets. Their novelty compared to…
Advancements in deep neural networks have contributed to near perfect results for many computer vision problems such as object recognition, face recognition and pose estimation. However, human action recognition is still far from…
Event cameras are bio-inspired sensors that capture the per-pixel intensity changes asynchronously and produce event streams encoding the time, pixel position, and polarity (sign) of the intensity changes. Event cameras possess a myriad of…
Event cameras produce asynchronous event streams that are spatially sparse yet temporally dense. Mainstream event representation learning algorithms typically use event frames, voxels, or tensors as input. Although these approaches have…