Related papers: Driver Activity Classification Using Generalizable…

200 papers

Recognizing the activities causing distraction in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically…

Computer Vision and Pattern Recognition · Computer Science 2024-03-22 Md Zahid Hasan , Jiajing Chen , Jiyang Wang , Mohammed Shaiqur Rahman , Ameya Joshi , Senem Velipasalar , Chinmay Hegde , Anuj Sharma , Soumik Sarkar

Vision-language models (VLMs) have recently emerged as powerful representation learning systems that align visual observations with natural language concepts, offering new opportunities for semantic reasoning in safety-critical autonomous…

Computer Vision and Pattern Recognition · Computer Science 2026-02-19 Ross Greer , Maitrayee Keskar , Angel Martinez-Sanchez , Parthib Roy , Shashank Shriram , Mohan Trivedi

Large-scale Vision Language Models (LVLMs) exhibit advanced capabilities in tasks that require visual information, including object detection. These capabilities have promising applications in various industrial domains, such as autonomous…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Haruki Sakajo , Hiroshi Takato , Hiroshi Tsutsui , Komei Soda , Hidetaka Kamigaito , Taro Watanabe

Autonomous driving systems face significant challenges in handling unpredictable edge-case scenarios, such as adversarial pedestrian movements, dangerous vehicle maneuvers, and sudden environmental changes. Current end-to-end driving models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-30 Dianwei Chen , Zifan Zhang , Lei Cheng , Yuchen Liu , Xianfeng Terry Yang

Real-time scene parsing is a fundamental feature for autonomous driving vehicles with multiple cameras. In this letter we demonstrate that sharing semantics between cameras with different perspectives and overlapped views can boost the…

Computer Vision and Pattern Recognition · Computer Science 2020-01-14 Zhenzhen Xiang , Anbo Bao , Jie Li , Jianbo Su

Vision-language models (VLMs) have become a promising approach to enhancing perception and decision-making in autonomous driving. The gap remains in applying VLMs to understand complex scenarios interacting with pedestrians and efficient…

Computer Vision and Pattern Recognition · Computer Science 2025-07-31 Haoxiang Gao , Li Zhang , Yu Zhao , Zhou Yang , Jinghan Cao

Vision-Language-Action (VLA) models have recently shown strong decision-making capabilities in autonomous driving. However, existing VLAs often struggle with achieving efficient inference and generalizing to novel autonomous vehicle…

Computer Vision and Pattern Recognition · Computer Science 2025-11-26 Dapeng Zhang , Zhenlong Yuan , Zhangquan Chen , Chih-Ting Liao , Yinda Chen , Fei Shen , Qingguo Zhou , Tat-Seng Chua

Learning visual representations from observing actions to benefit robot visuo-motor policy generation is a promising direction that closely resembles human cognitive function and perception. Motivated by this, and further inspired by…

Human action recognition plays a critical role in healthcare and medicine, supporting applications such as patient behavior monitoring, fall detection, surgical robot supervision, and procedural skill assessment. While traditional models…

Computer Vision and Pattern Recognition · Computer Science 2025-08-01 Utkarsh Shandilya , Marsha Mariya Kappan , Sanyam Jain , Vijeta Sharma

Vision-Language-Action (VLA) models have emerged as a promising framework for end-to-end autonomous driving. However, existing VLAs typically rely on sparse action supervision, which underutilizes their powerful scene understanding and…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Xiaodong Mei , Diankun Zhang , Hongwei Xie , Guang Chen , Hangjun Ye , Dan Xu

The use of Vision-Language Models (VLMs) in automated driving applications is becoming increasingly common, with the aim of leveraging their reasoning and generalisation capabilities to handle long tail scenarios. However, these models…

Computer Vision and Pattern Recognition · Computer Science 2026-03-09 Nikos Theodoridis , Reenu Mohandas , Ganesh Sistu , Anthony Scanlan , Ciarán Eising , Tim Brophy

Utilizing vision and language models (VLMs) pre-trained on large-scale image-text pairs is becoming a promising paradigm for open-vocabulary visual recognition. In this work, we extend this paradigm by leveraging motion and audio that…

Computer Vision and Pattern Recognition · Computer Science 2022-07-18 Rui Qian , Yeqing Li , Zheng Xu , Ming-Hsuan Yang , Serge Belongie , Yin Cui

Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks such as retrieval, classification, and segmentation. However, such representations tend to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-03 Jona Ruthardt , Manu Gaur , Deva Ramanan , Makarand Tapaswi , Yuki M. Asano

Accurate classification of autonomous vehicle (AV) driving behaviors is critical for safety validation, performance diagnosis, and traffic integration analysis. However, existing approaches primarily rely on numerical time-series modeling…

Artificial Intelligence · Computer Science 2026-03-04 Xiangyu Li , Tianyi Wang , Xi Cheng , Rakesh Chowdary Machineni , Zhaomiao Guo , Sikai Chen , Junfeng Jiao , Christian Claudel

Autonomous driving is a complex and challenging task that aims at safe motion planning through scene understanding and reasoning. While vision-only autonomous driving methods have recently achieved notable performance, through enhanced…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Chenbin Pan , Burhaneddin Yaman , Tommaso Nesti , Abhirup Mallik , Alessandro G Allievi , Senem Velipasalar , Liu Ren

Explainability is a longstanding challenge in deep learning, especially in high-stakes domains like healthcare. Common explainability methods highlight image regions that drive an AI model's decision. Humans, however, heavily rely on…

Artificial Intelligence · Computer Science 2023-11-21 Shobhit Agarwal , Yevgeniy R. Semenov , William Lotter

Large vision-language models (VLMs) have shown promising capabilities in scene understanding, enhancing the explainability of driving behaviors and interactivity with users. Existing methods primarily fine-tune VLMs on on-board multi-view…

Computer Vision and Pattern Recognition · Computer Science 2025-08-19 Nan Song , Bozhou Zhang , Xiatian Zhu , Jiankang Deng , Li Zhang

Ensuring safe transition of control in automated vehicles requires an accurate and timely assessment of driver readiness. This paper introduces Driver-Net, a novel deep learning framework that fuses multi-camera inputs to estimate driver…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Mahdi Rezaei , Mohsen Azarmi

Dashboard cameras capture a tremendous amount of driving scene video each day. These videos are purposefully coupled with vehicle sensing data, such as from the speedometer and inertial sensors, providing an additional sensing modality for…

Computer Vision and Pattern Recognition · Computer Science 2019-09-17 Seokju Lee , Junsik Kim , Tae-Hyun Oh , Yongseop Jeong , Donggeun Yoo , Stephen Lin , In So Kweon

The rapid progress of multimodal large language models (MLLM) has paved the way for Vision-Language-Action (VLA) paradigms, which integrate visual perception, natural language understanding, and control within a single policy. Researchers…