Related papers: Oral messages improve visual search

How really effective are Multimodal Hints in enhancing Visual Target Spotting? Some evidence from a usability study

The main aim of the work presented here is to contribute to computer science advances in the multimodal usability area, inasmuch as it addresses one of the major issues relating to the generation of effective oral system messages: how to…

Human-Computer Interaction · Computer Science 2007-08-28 Suzanne Kieffer, Noëlle Carbonell

Assistance orale à la recherche visuelle - étude expérimentale de l'apport d'indications spatiales à la détection de cibles

This paper describes an experimental study that aims at assessing the actual contribution of voice system messages to visual search efficiency and comfort. Messages which include spatial information on the target location are meant to…

Human-Computer Interaction · Computer Science 2007-10-04 Suzanne Kieffer, Noëlle Carbonell

Do oral messages help visual search?

A preliminary experimental study is presented that aims at eliciting the contribution of oral messages to facilitating visual search tasks on crowded visual displays. Results of quantitative and qualitative analyses suggest that…

Human-Computer Interaction · Computer Science 2007-09-05 Noëlle Carbonell, Suzanne Kieffer

Touch? Speech? or Touch and Speech? Investigating Multimodal Interaction for Visual Network Exploration and Analysis

Interaction plays a vital role during visual network exploration as users need to engage with both elements in the view (e.g., nodes, links) and interface controls (e.g., sliders, dropdown menus). Particularly as the size and complexity of…

Human-Computer Interaction · Computer Science 2020-05-01 Ayshwarya Saktheeswaran, Arjun Srinivasan, John Stasko

Concurrent Crossmodal Feedback Assists Target-searching: Displaying Distance Information Through Visual, Auditory and Haptic Modalities

Humans' sense of distance depends on the integration of multisensory cues. The incoming visual luminance, auditory pitch and tactile vibration could all contribute to the ability of distance judgement. This ability can be enhanced if the…

Human-Computer Interaction · Computer Science 2020-02-18 Feng Feng, Tony Stockman

Eliciting Multimodal Gesture+Speech Interactions in a Multi-Object Augmented Reality Environment

As augmented reality technology and hardware become more mature and affordable, researchers have been exploring more intuitive and discoverable interaction techniques for immersive environments. In this paper, we investigate multimodal…

Human-Computer Interaction · Computer Science 2022-12-02 Xiaoyan Zhou, Adam S. Williams, Francisco R. Ortega

Efficient Multi-Modal Embeddings from Structured Data

Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input, however, visual…

Computation and Language · Computer Science 2021-10-07 Anita L. Verő, Ann Copestake

Object Referring in Visual Scene with Spoken Language

Object referring has important applications, especially for human-machine interaction. While having received great attention, the task is mainly attacked with written language (text) as input rather than spoken language (speech), which is…

Computer Vision and Pattern Recognition · Computer Science 2017-12-06 Arun Balajee Vasudevan, Dengxin Dai, Luc Van Gool

Multimodal Interfaces for Effective Teleoperation

Research in multi-modal interfaces aims to provide solutions to immersion and increase overall human performance. A promising direction is combining auditory, visual and haptic interaction between the user and the simulated environment.…

Human-Computer Interaction · Computer Science 2020-04-01 Eleftherios Triantafyllidis, Christopher McGreavy, Jiacheng Gu, Zhibin Li

Analyzing Utility of Visual Context in Multimodal Speech Recognition Under Noisy Conditions

Multimodal learning allows us to leverage information from multiple sources (visual, acoustic and text), similar to our experience of the real world. However, it is currently unclear to what extent auxiliary modalities improve performance…

Computation and Language · Computer Science 2020-01-01 Tejas Srinivasan, Ramon Sanabria, Florian Metze

Building Goal-Oriented Dialogue Systems with Situated Visual Context

Most popular goal-oriented dialogue agents are capable of understanding the conversational context. However, with the surge of virtual assistants with screens, the next generation of agents is required to also understand screen context in…

Machine Learning · Computer Science 2021-11-26 Sanchit Agarwal, Jan Jezabek, Arijit Biswas, Emre Barut, Shuyang Gao, Tagyoung Chung

Large Language Model-assisted Speech and Pointing Benefits Multiple 3D Object Selection in Virtual Reality

Selection of occluded objects is a challenging problem in virtual reality, even more so if multiple objects are involved. With the advent of new artificial intelligence technologies, we explore the possibility of leveraging large language…

Human-Computer Interaction · Computer Science 2024-10-29 Junlong Chen, Jens Grubert, Per Ola Kristensson

Mixing Modes: Active and Passive Integration of Speech, Text, and Visualization for Communicating Data Uncertainty

Interpreting uncertain data can be difficult, particularly if the data presentation is complex. We investigate the efficacy of different modalities for representing data and how to combine the strengths of each modality to facilitate the…

Human-Computer Interaction · Computer Science 2024-04-15 Chase Stokes, Chelsea Sanker, Bridget Cogley, Vidya Setlur

Impact of Multimodal and Conversational AI on Learning Outcomes and Experience

Multimodal Large Language Models (MLLMs) offer an opportunity to support multimedia learning through conversational systems grounded in educational content. However, while conversational AI is known to boost engagement, its impact on…

Human-Computer Interaction · Computer Science 2026-04-03 Karan Taneja, Anjali Singh, Ashok K. Goel

Leveraging Speech for Gesture Detection in Multimodal Communication

Gestures are inherent to human interaction and often complement speech in face-to-face communication, forming a multimodal communication system. An important task in gesture analysis is detecting a gesture's beginning and end. Research on…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Esam Ghaleb, Ilya Burenko, Marlou Rasenberg, Wim Pouw, Ivan Toni, Peter Uhrig, Anna Wilson, Judith Holler, Aslı Özyürek, Raquel Fernández

Multi-Modal Gaze Following in Conversational Scenarios

Gaze following estimates gaze targets of in-scene person by understanding human behavior and scene information. Existing methods usually analyze scene images for gaze following. However, compared with visual images, audio also provides…

Computer Vision and Pattern Recognition · Computer Science 2024-01-17 Yuqi Hou, Zhongqun Zhang, Nora Horanyi, Jaewon Moon, Yihua Cheng, Hyung Jin Chang

Multimodal Machine Translation through Visuals and Speech

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area…

Computation and Language · Computer Science 2019-12-02 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Progressive Sentences: Combining the Benefits of Word and Sentence Learning

The rapid evolution of lightweight consumer augmented reality (AR) smart glasses (a.k.a. optical see-through head-mounted displays) offers novel opportunities for learning, particularly through their unique capability to deliver multimodal…

Human-Computer Interaction · Computer Science 2025-07-22 Nuwan Janaka, Shengdong Zhao, Ashwin Ram, Ruoxin Sun, Sherisse Tan Jing Wen, Danae Li, David Hsu

Evaluation of Audio-Visual Alignments in Visually Grounded Speech Models

Systems that can find correspondences between multiple modalities, such as between speech and images, have great potential to solve different recognition and data analysis tasks in an unsupervised manner. This work studies multimodal…

Computer Vision and Pattern Recognition · Computer Science 2024-03-08 Khazar Khorrami, Okko Räsänen

Audio-visual speech separation based on joint feature representation with cross-modal attention

Multi-modal speech separation has exhibited a specific advantage in isolating the target speaker in noisy multi-talker environments. Unfortunately, most current separation strategies prefer a straightforward fusion based on…

Sound · Computer Science 2022-03-08 Junwen Xiong, Peng Zhang, Lei Xie, Wei Huang, Yufei Zha, Yanning Zhang