Related papers: Do oral messages help visual search?

Oral messages improve visual search

Input multimodality combining speech and hand gestures has motivated numerous usability studies. Contrastingly, issues relating to the design and ergonomic evaluation of multimodal output messages combining speech with visual modalities…

Human-Computer Interaction · Computer Science 2007-09-05 Suzanne Kieffer, Noëlle Carbonell

Assistance orale à la recherche visuelle - étude expérimentale de l'apport d'indications spatiales à la détection de cibles

This paper describes an experimental study that aims at assessing the actual contribution of voice system messages to visual search efficiency and comfort. Messages which include spatial information on the target location are meant to…

Human-Computer Interaction · Computer Science 2007-10-04 Suzanne Kieffer, Noëlle Carbonell

How really effective are Multimodal Hints in enhancing Visual Target Spotting? Some evidence from a usability study

The main aim of the work presented here is to contribute to computer science advances in the multimodal usability area, inasmuch as it addresses one of the major issues relating to the generation of effective oral system messages: how to…

Human-Computer Interaction · Computer Science 2007-08-28 Suzanne Kieffer, Noëlle Carbonell

Concurrent Crossmodal Feedback Assists Target-searching: Displaying Distance Information Through Visual, Auditory and Haptic Modalities

Humans' sense of distance depends on the integration of multisensory cues. The incoming visual luminance, auditory pitch and tactile vibration could all contribute to the ability of distance judgement. This ability can be enhanced if the…

Human-Computer Interaction · Computer Science 2020-02-18 Feng Feng, Tony Stockman

Discrete Messages Improve Communication Efficiency among Isolated Intelligent Agents

Individuals, despite having varied life experiences and learning processes, can communicate effectively through languages. This study aims to explore the efficiency of language as a communication medium. We put forth two specific…

Machine Learning · Computer Science 2024-10-21 Hang Chen, Yuchuan Jang, Weijie Zhou, Cristian Meo, Ziwei Chen, Dianbo Liu

On the Role of Visual Cues in Audiovisual Speech Enhancement

We present an introspection of an audiovisual speech enhancement model. In particular, we focus on interpreting how a neural audiovisual speech enhancement model uses visual cues to improve the quality of the target speech signal. We show…

Machine Learning · Computer Science 2021-02-26 Zakaria Aldeneh, Anushree Prasanna Kumar, Barry-John Theobald, Erik Marchi, Sachin Kajarekar, Devang Naik, Ahmed Hussen Abdelaziz

Do Images Clarify? A Study on the Effect of Images on Clarifying Questions in Conversational Search

Conversational search systems increasingly employ clarifying questions to refine user queries and improve the search experience. Previous studies have demonstrated the usefulness of text-based clarifying questions in enhancing both…

Computation and Language · Computer Science 2026-02-10 Clemencia Siro, Zahra Abbasiantaeb, Yifei Yuan, Mohammad Aliannejadi, Maarten de Rijke

Projecting Robot Intentions Through Visual Cues: Static vs. Dynamic Signaling

Augmented and mixed-reality techniques harbor a great potential for improving human-robot collaboration. Visual signals and cues may be projected to a human partner in order to explicitly communicate robot intentions and goals. However, it…

Robotics · Computer Science 2023-08-22 Shubham Sonawani, Yifan Zhou, Heni Ben Amor

Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality

Selection of occluded objects is a challenging problem in virtual reality, even more so if multiple objects are involved. With the advent of new artificial intelligence technologies, we explore the possibility of leveraging large language…

Human-Computer Interaction · Computer Science 2024-10-29 Junlong Chen, Jens Grubert, Per Ola Kristensson

Learning Articulated Motion Models from Visual and Lingual Signals

In order for robots to operate effectively in homes and workplaces, they must be able to manipulate the articulated objects common within environments built for and by humans. Previous work learns kinematic models that prescribe this…

Robotics · Computer Science 2016-07-04 Zhengyang Wu, Mohit Bansal, Matthew R. Walter

Multimodal Surrogates for Video Browsing

Three types of video surrogates - visual (keyframes), verbal (keywords/phrases), and combination of the two - were designed and studied in a qualitative investigation of user cognitive processes. The results favor the combined surrogates in…

Digital Libraries · Computer Science 2007-05-23 Wei Ding, Gary Marchionini, Dagobert Soergel

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

Multi-modal learning, particularly among imaging and linguistic modalities, has made amazing strides in many high-level fundamental visual understanding problems, ranging from language grounding to dense event captioning. However, much of…

Computer Vision and Pattern Recognition · Computer Science 2019-10-28 Tanzila Rahman, Bicheng Xu, Leonid Sigal

Multimodal Machine Translation through Visuals and Speech

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area…

Computation and Language · Computer Science 2019-12-02 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Touch? Speech? or Touch and Speech? Investigating Multimodal Interaction for Visual Network Exploration and Analysis

Interaction plays a vital role during visual network exploration as users need to engage with both elements in the view (e.g., nodes, links) and interface controls (e.g., sliders, dropdown menus). Particularly as the size and complexity of…

Human-Computer Interaction · Computer Science 2020-05-01 Ayshwarya Saktheeswaran, Arjun Srinivasan, John Stasko

Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks

Large language models have demonstrated robust performance on various language tasks using zero-shot or few-shot learning paradigms. While being actively researched, multimodal models that can additionally handle images as input have yet to…

Computation and Language · Computer Science 2023-05-24 Sherzod Hakimov, David Schlangen

Object Referring in Visual Scene with Spoken Language

Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is…

Computer Vision and Pattern Recognition · Computer Science 2017-12-06 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Efficient Multi-Modal Embeddings from Structured Data

Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input, however, visual…

Computation and Language · Computer Science 2021-10-07 Anita L. Verő, Ann Copestake

Multimodality and Attention Increase Alignment in Natural Language Prediction Between Humans and Computational Models

The potential of multimodal generative artificial intelligence (mAI) to replicate human grounded language understanding, including the pragmatic, context-rich aspects of communication, remains to be clarified. Humans are known to use…

Artificial Intelligence · Computer Science 2024-01-03 Viktor Kewenig, Andrew Lampinen, Samuel A. Nastase, Christopher Edwards, Quitterie Lacome DEstalenx, Akilles Rechardt, Jeremy I Skipper, Gabriella Vigliocco

Object-oriented Targets for Visual Navigation using Rich Semantic Representations

When searching for an object humans navigate through a scene using semantic information and spatial relationships. We look for an object using our knowledge of its attributes and relationships with other objects to infer the probable…

Computer Vision and Pattern Recognition · Computer Science 2018-12-18 Jean-Benoit Delbrouck, Stéphane Dupont

Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models

Despite the impressive advancements achieved through vision-and-language pretraining, it remains unclear whether this joint learning paradigm can help understand each individual modality. In this work, we conduct a comparative analysis of…

Computer Vision and Pattern Recognition · Computer Science 2024-01-31 Zhuowan Li, Cihang Xie, Benjamin Van Durme, Alan Yuille