Related papers: Training development for multisensory data analysi…

Learning in Audio-visual Context: A Review, Analysis, and New Perspective

Sight and hearing are two senses that play a vital role in human communication and scene understanding. To mimic human perception ability, audio-visual learning, aimed at developing computational approaches to learn from both audio and…

Computer Vision and Pattern Recognition · Computer Science 2022-08-23 Yake Wei, Di Hu, Yapeng Tian, Xuelong Li

A Vision for Multisensory Intelligence: Sensing, Science, and Synergy

Our experience of the world is multisensory, spanning a synthesis of language, sight, sound, touch, taste, and smell. Yet, artificial intelligence has primarily advanced in digital modalities like text, vision, and audio. This paper…

Machine Learning · Computer Science 2026-01-14 Paul Pu Liang

Sound training platform applied to astronomy

The convergence between astronomy and data sonification represents a significant advancement in the approach and analysis of cosmic information. By surpassing the visual exclusivity in data analysis in astronomy, innovative projects have…

Instrumentation and Methods for Astrophysics · Physics 2024-05-13 Natasha Bertaina Lucero, Johanna Casado, Beatriz García, Gonzalo Cayo

Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives

Humans use multiple senses to comprehend the environment. Vision and language are two of the most vital senses since they allow us to easily communicate our thoughts and perceive the world around us. There has been a lot of interest in…

Computation and Language · Computer Science 2026-05-13 Thong Nguyen, Yi Bin, Junbin Xiao, Leigang Qu, Yicong Li, Jay Zhangjie Wu, Cong-Duy Nguyen, See-Kiong Ng, Luu Anh Tuan

Binaural SoundNet: Predicting Semantics, Depth and Motion with Binaural Sounds

Humans can robustly recognize and localize objects by using visual and/or auditory cues. While machines are able to do the same with visual data already, less work has been done with sounds. This work develops an approach for scene…

Sound · Computer Science 2022-03-01 Dengxin Dai, Arun Balajee Vasudevan, Jiri Matas, Luc Van Gool

Sound and Noise: Proposal for an Interdisciplinary Learning Path

A learning path is proposed starting from the characterization of a sound wave, showing how human beings emit articulate sounds in the language, introducing psychoacoustics, i.e. how the sound interacts with ears and it is transduced into…

Physics Education · Physics 2016-01-08 Vera Montalbano

Improving Visual Recognition using Ambient Sound for Supervision

Our brains combine vision and hearing to create a more elaborate interpretation of the world. When the visual input is insufficient, a rich panoply of sounds can be used to describe our surroundings. Since more than 1,000 hours of videos…

Computer Vision and Pattern Recognition · Computer Science 2019-12-30 Rohan Mahadev, Hongyu Lu

Multi-encoder attention-based architectures for sound recognition with partial visual assistance

Large-scale sound recognition data sets typically consist of acoustic recordings obtained from multimedia libraries. As a consequence, modalities other than audio can often be exploited to improve the outputs of models designed for…

Audio and Speech Processing · Electrical Eng. & Systems 2022-10-11 Wim Boes, Hugo Van hamme

Data is Overrated: Perceptual Metrics Can Lead Learning in the Absence of Training Data

Perceptual metrics are traditionally used to evaluate the quality of natural signals, such as images and audio. They are designed to mimic the perceptual behaviour of human observers and usually reflect structures found in natural signals.…

Sound · Computer Science 2023-12-07 Tashi Namgyal, Alexander Hepburn, Raul Santos-Rodriguez, Valero Laparra, Jesus Malo

Multi-task Learning with Metadata for Music Mood Classification

Mood recognition is an important problem in music informatics and has key applications in music discovery and recommendation. These applications have become even more relevant with the rise of music streaming. Our work investigates the…

Sound · Computer Science 2021-10-12 Rajnish Kumar, Manjeet Dahiya

A Survey on Bayesian Deep Learning

A comprehensive artificial intelligence system needs to not only perceive the environment with different 'senses' (e.g., seeing and hearing) but also infer the world's conditional (or even causal) relations and corresponding uncertainty.…

Machine Learning · Statistics 2021-01-07 Hao Wang, Dit-Yan Yeung

An Empirical Study and Improvement for Speech Emotion Recognition

Multimodal speech emotion recognition aims to detect speakers' emotions from audio and text. Prior works mainly focus on exploiting advanced networks to model and fuse different modality information to facilitate performance, while…

Computation and Language · Computer Science 2023-04-11 Zhen Wu, Yizhe Lu, Xinyu Dai

Vision+X: A Survey on Multimodal Learning in the Light of Data

We are perceiving and communicating with the world in a multisensory manner, where different information sources are sophisticatedly processed and interpreted by separate parts of the human brain to constitute a complex, yet harmonious and…

Computer Vision and Pattern Recognition · Computer Science 2024-06-12 Ye Zhu, Yu Wu, Nicu Sebe, Yan Yan

Concurrent Haptic, Audio, and Visual Data Set During Bare Finger Interaction with Textured Surfaces

Perceptual processes are frequently multi-modal. This is the case of haptic perception. Data sets of visual and haptic sensory signals have been compiled in the past, especially when it comes to the exploration of textured surfaces. These…

Robotics · Computer Science 2023-09-19 Alexis W. M. Devillard, Aruna Ramasamy, Damien Faux, Vincent Hayward, Etienne Burdet

Deep Learning of Human Perception in Audio Event Classification

In this paper, we introduce our recent studies on human perception in audio event classification by different deep learning models. In particular, the pre-trained model VGGish is used as feature extractor to process audio data, and DenseNet…

Sound · Computer Science 2018-09-10 Yi Yu, Samuel Beuret, Donghuo Zeng, Keizo Oyama

Unsupervised Discriminative Learning of Sounds for Audio Event Classification

Recent progress in network-based audio event classification has shown the benefit of pre-training models on visual data such as ImageNet. While this process allows knowledge transfer across different domains, training a model on large-scale…

Sound · Computer Science 2021-05-21 Sascha Hornauer, Ke Li, Stella X. Yu, Shabnam Ghaffarzadegan, Liu Ren

Exploring crossmodal perceptual enhancement and integration in a sequence-reproducing task with cognitive priming

Leveraging the perceptual phenomenon of crossmodal correspondence has been shown to facilitate people's information processing and improve sensorimotor performance. However, for goal-oriented interactive tasks, the question of how to enhance…

Human-Computer Interaction · Computer Science 2020-02-18 Feng Feng, Puhong Li, Tony Stockman

See, Hear, and Read: Deep Aligned Representations

We capitalize on large amounts of readily-available, synchronous data to learn deep discriminative representations shared across three major natural modalities: vision, sound and language. By leveraging over a year of sound from video and…

Computer Vision and Pattern Recognition · Computer Science 2017-06-06 Yusuf Aytar, Carl Vondrick, Antonio Torralba

Understanding Pedestrian Movement Using Urban Sensing Technologies: The Promise of Audio-based Sensors

While various sensors have been deployed to monitor vehicular flows, sensing pedestrian movement is still nascent. Yet walking is a significant mode of travel in many cities, especially those in Europe, Africa, and Asia. Understanding…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-11 Chaeyeon Han, Pavan Seshadri, Yiwei Ding, Noah Posner, Bon Woo Koo, Animesh Agrawal, Alexander Lerch, Subhrajit Guhathakurta

Listen to the Image

Visual-to-auditory sensory substitution devices can assist the blind in sensing the visual environment by translating the visual information into a sound pattern. To improve the translation quality, the task performances of the blind are…

Computer Vision and Pattern Recognition · Computer Science 2019-04-22 Di Hu, Dong Wang, Xuelong Li, Feiping Nie, Qi Wang