Related papers

Related papers: Learning Visual Actions Using Multiple Verb-Only L…

200 papers

This work introduces verb-only representations for actions and interactions: the problem of describing similar motions (e.g. 'open door', 'open cupboard') and distinguishing differing ones (e.g. 'open door' vs 'open bottle') using verb-only…

Computer Vision and Pattern Recognition · Computer Science 2018-05-11 Michael Wray , Davide Moltisanti , Dima Damen

Most recent successes in human action recognition with deep learning methods adopt the supervised learning paradigm, which requires a significant amount of manually labeled data to achieve good performance. However, label collection is an…

Computer Vision and Pattern Recognition · Computer Science 2018-09-07 Junnan Li , Yongkang Wong , Qi Zhao , Mohan S. Kankanhalli

Precisely naming the action depicted in a video can be a challenging and oftentimes ambiguous task. In contrast to object instances represented as nouns (e.g. dog, cat, chair, etc.), in the case of actions, human annotators typically lack a…

Computer Vision and Pattern Recognition · Computer Science 2022-10-12 Kiyoon Kim , Davide Moltisanti , Oisin Mac Aodha , Laura Sevilla-Lara

Pre-trained vision-language models (VLMs) have enabled significant progress in open vocabulary computer vision tasks such as image classification, object detection and image segmentation. Some recent works have focused on extending VLMs to…

Computer Vision and Pattern Recognition · Computer Science 2025-10-14 Rohit Gupta , Mamshad Nayeem Rizve , Jayakrishnan Unnikrishnan , Ashish Tawari , Son Tran , Mubarak Shah , Benjamin Yao , Trishul Chilimbi

This work deviates from easy-to-define class boundaries for object interactions. For the task of object interaction recognition, often captured using an egocentric view, we show that semantic ambiguities in verbs and recognising…

Computer Vision and Pattern Recognition · Computer Science 2017-04-24 Michael Wray , Davide Moltisanti , Walterio Mayol-Cuevas , Dima Damen

For training a video-based action recognition model that accepts multi-view video, annotating frame-level labels is tedious and difficult. However, it is relatively easy to annotate sequence-level labels. Such coarse annotations are…

Computer Vision and Pattern Recognition · Computer Science 2024-03-20 Vijay John , Yasutomo Kawanishi

Multi-label multi-view action recognition aims to recognize multiple concurrent or sequential actions from untrimmed videos captured by multiple cameras. Existing work has focused on multi-view action recognition in a narrow area with…

Computer Vision and Pattern Recognition · Computer Science 2024-10-21 Trung Thanh Nguyen , Yasutomo Kawanishi , Takahiro Komamizu , Ichiro Ide

Since collecting and annotating data for spatio-temporal action detection is very expensive, there is a need to learn approaches with less supervision. Weakly supervised approaches do not require any bounding box annotations and can be…

Computer Vision and Pattern Recognition · Computer Science 2021-01-22 Sovan Biswas , Juergen Gall

We present a multiview pseudo-labeling approach to video learning, a novel framework that uses complementary views in the form of appearance and motion information for semi-supervised learning in video. The complementary views help obtain…

Computer Vision and Pattern Recognition · Computer Science 2021-04-02 Bo Xiong , Haoqi Fan , Kristen Grauman , Christoph Feichtenhofer

We present an approach to labeling short video clips with English verbs as event descriptions. A key distinguishing aspect of this work is that it labels videos with verbs that describe the spatiotemporal interaction between event…

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality. In this work, we conduct a comparative analysis of…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Zhuowan Li , Cihang Xie , Benjamin Van Durme , Alan Yuille

This paper presents a novel approach to Single-Positive Multi-label Learning. In general multi-label learning, a model learns to predict multiple labels or categories for a single input image. This is in contrast with standard multi-class…

Computer Vision and Pattern Recognition · Computer Science 2023-10-25 Xin Xing , Zhexiao Xiong , Abby Stylianou , Srikumar Sastry , Liyu Gong , Nathan Jacobs

An increasing number of datasets contain multiple views, such as video, sound and automatic captions. A basic challenge in representation learning is how to leverage multiple views to learn better representations. This is further…

Machine Learning · Computer Science 2019-03-04 Nils Holzenberger , Shruti Palaskar , Pranava Madhyastha , Florian Metze , Raman Arora

We present a method to learn a representation for adverbs from instructional videos using weak supervision from the accompanying narrations. Key to our method is the fact that the visual representation of the adverb is highly dependent on…

Computer Vision and Pattern Recognition · Computer Science 2020-03-25 Hazel Doughty , Ivan Laptev , Walterio Mayol-Cuevas , Dima Damen

In this paper we introduce the problem of Visual Semantic Role Labeling: given an image we want to detect people doing actions and localize the objects of interaction. Classical approaches to action recognition either study the task of…

Computer Vision and Pattern Recognition · Computer Science 2015-05-19 Saurabh Gupta , Jitendra Malik

Audio-visual representation learning is an important task from the perspective of designing machines with the ability to understand complex events. To this end, we propose a novel multimodal framework that instantiates multiple instance…

Computer Vision and Pattern Recognition · Computer Science 2018-07-10 Sanjeel Parekh , Slim Essid , Alexey Ozerov , Ngoc Q. K. Duong , Patrick Pérez , Gaël Richard

Multimodal image-language transformers have achieved impressive results on a variety of tasks that rely on fine-tuning (e.g., visual question answering and image retrieval). We are interested in shedding light on the quality of their…

Computation and Language · Computer Science 2021-06-18 Lisa Anne Hendricks , Aida Nematzadeh

Although an object may appear in numerous contexts, we often describe it in a limited number of ways. Language allows us to abstract away visual variation to represent and communicate concepts. Building on this intuition, we propose an…

Computer Vision and Pattern Recognition · Computer Science 2023-03-30 Mohamed El Banani , Karan Desai , Justin Johnson

Representing the semantics of words is a long-standing problem for the natural language processing community. Most methods compute word semantics given their textual context in large corpora. More recently, researchers attempted to…

Computation and Language · Computer Science 2017-11-10 Éloi Zablocki , Benjamin Piwowarski , Laure Soulier , Patrick Gallinari

Cross-lingual self-supervised learning has been a growing research topic in the last few years. However, current works only explored the use of audio signals to create representations. In this work, we study cross-lingual self-supervised…

Computation and Language · Computer Science 2023-03-17 Andreas Zinonos , Alexandros Haliassos , Pingchuan Ma , Stavros Petridis , Maja Pantic