Related papers: How really effective are Multimodal Hints in enhan…

Oral messages improve visual search

Input multimodality combining speech and hand gestures has motivated numerous usability studies. Contrastingly, issues relating to the design and ergonomic evaluation of multimodal output messages combining speech with visual modalities…

Human-Computer Interaction · Computer Science 2007-09-05 Suzanne Kieffer, Noëlle Carbonell

Do oral messages help visual search?

A preliminary experimental study is presented, that aims at eliciting the contribution of oral messages to facilitating visual search tasks on crowded visual displays. Results of quantitative and qualitative analyses suggest that…

Human-Computer Interaction · Computer Science 2007-09-05 Noëlle Carbonell, Suzanne Kieffer

Assistance orale à la recherche visuelle - étude expérimentale de l'apport d'indications spatiales à la détection de cibles

This paper describes an experimental study that aims at assessing the actual contribution of voice system messages to visual search efficiency and comfort. Messages which include spatial information on the target location are meant to…

Human-Computer Interaction · Computer Science 2007-10-04 Suzanne Kieffer, Noëlle Carbonell

Concurrent Crossmodal Feedback Assists Target-searching: Displaying Distance Information Through Visual, Auditory and Haptic Modalities

Humans' sense of distance depends on the integration of multisensory cues. The incoming visual luminance, auditory pitch and tactile vibration could all contribute to the ability of distance judgement. This ability can be enhanced if the…

Human-Computer Interaction · Computer Science 2020-02-18 Feng Feng, Tony Stockman

Efficient Multi-Modal Embeddings from Structured Data

Multi-modal word semantics aims to enhance embeddings with perceptual input, assuming that human meaning representation is grounded in sensory experience. Most research focuses on evaluation involving direct visual input, however, visual…

Computation and Language · Computer Science 2021-10-07 Anita L. Verő, Ann Copestake

Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users

This paper explores the effectiveness of Multimodal Large Language models (MLLMs) as assistive technologies for visually impaired individuals. We conduct a user survey to identify adoption patterns and key challenges users face with such…

Human-Computer Interaction · Computer Science 2025-03-31 Antonia Karamolegkou, Malvina Nikandrou, Georgios Pantazopoulos, Danae Sanchez Villegas, Phillip Rust, Ruchira Dhar, Daniel Hershcovich, Anders Søgaard

Increasing the Efficiency of 6-DoF Visual Localization Using Multi-Modal Sensory Data

Localization is a key requirement for mobile robot autonomy and human-robot interaction. Vision-based localization is accurate and flexible, however, it incurs a high computational burden which limits its application on many…

Robotics · Computer Science 2016-12-30 Ronald Clark, Sen Wang, Hongkai Wen, Niki Trigoni, Andrew Markham

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

With the aim of promoting and understanding the multilingual version of image search, we leverage visual object detection and propose a model with diverse multi-head attention to learn grounded multilingual multimodal representations.…

Computation and Language · Computer Science 2019-10-02 Po-Yao Huang, Xiaojun Chang, Alexander Hauptmann

Multi-modal Visual Understanding with Prompts for Semantic Information Disentanglement of Image

Multi-modal visual understanding of images with prompts involves using various visual and textual cues to enhance the semantic understanding of images. This approach combines both vision and language processing to generate more accurate…

Computer Vision and Pattern Recognition · Computer Science 2023-05-17 Yuzhou Peng

Touch? Speech? or Touch and Speech? Investigating Multimodal Interaction for Visual Network Exploration and Analysis

Interaction plays a vital role during visual network exploration as users need to engage with both elements in the view (e.g., nodes, links) and interface controls (e.g., sliders, dropdown menus). Particularly as the size and complexity of…

Human-Computer Interaction · Computer Science 2020-05-01 Ayshwarya Saktheeswaran, Arjun Srinivasan, John Stasko

Investigating Multimodal Large Language Models to Support Usability Evaluation

Usability evaluation is an essential method to support the design of effective and intuitive user interfaces (UIs). However, it commonly relies on resource-intensive, expert-driven methods, which limit its accessibility, especially for…

Software Engineering · Computer Science 2026-04-13 Sebastian Lubos, Alexander Felfernig, Damian Garber, Gerhard Leitner, Julian Schwazer, Manuel Henrich

Expansion of Visual Hints for Improved Generalization in Stereo Matching

We introduce visual hints expansion for guiding stereo matching to improve generalization. Our work is motivated by the robustness of Visual Inertial Odometry (VIO) in computer vision and robotics, where a sparse and unevenly distributed…

Computer Vision and Pattern Recognition · Computer Science 2022-11-02 Andrea Pilzer, Yuxin Hou, Niki Loppi, Arno Solin, Juho Kannala

Multimodal Interfaces for Effective Teleoperation

Research in multi-modal interfaces aims to provide solutions to immersion and increase overall human performance. A promising direction is combining auditory, visual and haptic interaction between the user and the simulated environment.…

Human-Computer Interaction · Computer Science 2020-04-01 Eleftherios Triantafyllidis, Christopher McGreavy, Jiacheng Gu, Zhibin Li

See or Hear? Exploring the Effect of Visual and Audio Hints and Gaze-assisted Task Feedback for Visual Search Tasks in Augmented Reality

Augmented reality (AR) is emerging in visual search tasks for increasingly immersive interactions with virtual objects. We propose an AR approach providing visual and audio hints along with gaze-assisted instant post-task feedback for…

Human-Computer Interaction · Computer Science 2023-11-15 Yuchong Zhang, Adam Nowak, Yueming Xuan, Andrzej Romanowski, Morten Fjeld

Quantifying the visual concreteness of words and topics in multimodal datasets

Multimodal machine learning algorithms aim to learn visual-textual correspondences. Previous work suggests that concepts with concrete visual manifestations may be easier to learn than concepts with abstract ones. We give an algorithm for…

Computation and Language · Computer Science 2018-05-25 Jack Hessel, David Mimno, Lillian Lee

On Explaining Multimodal Hateful Meme Detection Models

Hateful meme detection is a new multimodal task that has gained significant traction in academic and industry research communities. Recently, researchers have applied pre-trained visual-linguistic models to perform the multimodal…

Computer Vision and Pattern Recognition · Computer Science 2022-04-07 Ming Shan Hee, Roy Ka-Wei Lee, Wen-Haw Chong

Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!

Humans possess multimodal literacy, allowing them to actively integrate information from various modalities to form reasoning. Faced with challenges like lexical ambiguity in text, we supplement this with other modalities, such as thumbnail…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Jiwan Chung, Seungwon Lim, Jaehyun Jeon, Seungbeen Lee, Youngjae Yu

Images in Language Space: Exploring the Suitability of Large Language Models for Vision & Language Tasks

Large language models have demonstrated robust performance on various language tasks using zero-shot or few-shot learning paradigms. While being actively researched, multimodal models that can additionally handle images as input have yet to…

Computation and Language · Computer Science 2023-05-24 Sherzod Hakimov, David Schlangen

Towards Optimizing OCR for Accessibility

Visual cues such as structure, emphasis, and icons play an important role in efficient information foraging by sighted individuals and make for a pleasurable reading experience. Blind, low-vision and other print-disabled individuals miss…

Computer Vision and Pattern Recognition · Computer Science 2022-06-27 Peya Mowar, Tanuja Ganu, Saikat Guha

Locatability and Locatability Robustness of Visual Variables in Single Target Localization

Finding a particular object in a display is important for viewers in many visualizations, for example, when reacting to brushing or to a highlighted object. This can be enabled by making the target object different in one of the visual…

Human-Computer Interaction · Computer Science 2026-01-29 Wei Wei, Miguel A. Nacenta, Michelle F. Miranda, Charles Perin