Related papers: Enhanced Voice Post Processing Using Voice Decoder…

200 papers

Recently, much progress has been made in image captioning, and an encoder-decoder framework has achieved outstanding performance for this task. In this paper, we propose an extension of the encoder-decoder framework by adding a component…

Computer Vision and Pattern Recognition · Computer Science · 2018-04-04 · Wenhao Jiang, Lin Ma, Xinpeng Chen, Hanwang Zhang, Wei Liu

Automated audio captioning aims to use natural language to describe the content of audio data. This paper presents an audio captioning system with an encoder-decoder architecture, where the decoder predicts words based on audio features…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-08-06 · Xinhao Mei, Qiushi Huang, Xubo Liu, Gengyun Chen, Jingqian Wu, Yusong Wu, Jinzheng Zhao, Shengchen Li, Tom Ko, H Lilian Tang, Xi Shao, Mark D. Plumbley, Wenwu Wang

Speech enhancement (SE) and neural vocoding are traditionally viewed as separate tasks. In this work, we observe them under a common thread: the rank behavior of these processes. This observation prompts two key questions: Can a…

Sound · Computer Science · 2025-01-24 · Andong Li, Zhihang Sun, Fengyuan Hao, Xiaodong Li, Chengshi Zheng

Brain-computer interfaces (BCI) offer numerous human-centered application possibilities, particularly affecting people with neurological disorders. Text or speech decoding from brain activities is a relevant domain that could augment the…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-01-10 · Jihwan Lee, Tiantian Feng, Aditya Kommineni, Sudarsana Reddy Kadiri, Shrikanth Narayanan

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-12-06 · Yuhao Zhang, Chen Xu, Bojie Hu, Chunliang Zhang, Tong Xiao, Jingbo Zhu

Identity, accent, style, and emotions are essential components of human speech. Voice conversion (VC) techniques process the speech signals of two input speakers and other modalities of auxiliary information such as prompts and emotion…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-12-09 · Xining Song, Zhihua Wei, Rui Wang, Haixiao Hu, Yanxiang Chen, Meng Han

This paper presents a new voice conversion model capable of transforming both speaking and singing voices. It addresses key challenges in current systems, such as conveying emotions, managing pronunciation and accent changes, and…

Sound · Computer Science · 2024-12-12 · Sowmya Cheripally

While the use of deep neural networks has significantly boosted speaker recognition performance, it is still challenging to separate speakers in poor acoustic environments. Here speech enhancement methods have traditionally allowed improved…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-28 · Yanpei Shi, Qiang Huang, Thomas Hain

High-quality speech corpora are essential foundations for most speech applications. However, such speech data are expensive and limited since they are collected in professional recording environments. In this work, we propose an…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-11 · Haoyu Li, Yang Ai, Junichi Yamagishi

Single-channel speech enhancement is utilized in various tasks to mitigate the effect of interfering signals. Conventionally, to ensure the speech enhancement performs optimally, the speech enhancement has needed to be tuned for each task.…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-07-11 · Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Ryo Masumura

Recently, voice conversion (VC) has been widely studied. Many VC systems use disentangle-based learning techniques to separate the speaker and the linguistic content information from a speech signal. Subsequently, they convert the voice by…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-11-03 · Yen-Hao Chen, Da-Yi Wu, Tsung-Han Wu, Hung-yi Lee

Since the vocal component plays a crucial role in popular music, singing voice detection has been an active research topic in music information retrieval. Although several proposed algorithms have shown high performances, we argue that…

Sound · Computer Science · 2018-06-05 · Kyungyun Lee, Keunwoo Choi, Juhan Nam

Recent research has delved into speech enhancement (SE) approaches that leverage audio embeddings from pre-trained models, diverging from time-frequency masking or signal prediction techniques. This paper introduces an efficient and…

Audio and Speech Processing · Electrical Eng. & Systems · 2025-06-16 · Xingwei Sun, Heinrich Dinkel, Yadong Niu, Linzhang Wang, Junbo Zhang, Jian Luan

An audio-visual speech enhancement system is regarded as one of the promising solutions for isolating and enhancing the speech of a desired speaker. Typical methods focus on predicting the clean speech spectrum via a naive convolutional neural network based…

Audio and Speech Processing · Electrical Eng. & Systems · 2022-07-01 · Xinmeng Xu, Yang Wang, Jie Jia, Binbin Chen, Dejun Li

Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre. In this paper, we propose approaches to improving accent conversion applicability, as well as quality. First…

Computation and Language · Computer Science · 2020-05-20 · Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma

Separating a song into vocal and accompaniment components is an active research topic, and recent years witnessed an increased performance from supervised training using deep learning techniques. We propose to apply the visual information…

Sound · Computer Science · 2021-07-02 · Bochen Li, Yuxuan Wang, Zhiyao Duan

Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources. Traditionally, these tasks have been…

Audio and Speech Processing · Electrical Eng. & Systems · 2021-03-16 · Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

In this paper, we propose a method of utilizing aligned lyrics as additional information to improve the performance of singing voice separation. We have combined the highway network-based lyrics encoder into Open-unmix separation network…

Audio and Speech Processing · Electrical Eng. & Systems · 2020-08-12 · Chang-Bin Jeon, Hyeong-Seok Choi, Kyogu Lee

This work describes an interactive decoding method to improve the performance of visual speech recognition systems using user input to compensate for the inherent ambiguity of the task. Unlike most phoneme-to-word decoding pipelines, which…

Computation and Language · Computer Science · 2021-07-05 · Brendan Shillingford, Yannis Assael, Misha Denil

This paper introduces a novel voice conversion (VC) model, guided by text instructions such as "articulate slowly with a deep tone" or "speak in a cheerful boyish voice". Unlike traditional methods that rely on reference utterances to…

Audio and Speech Processing · Electrical Eng. & Systems · 2024-01-17 · Chun-Yi Kuan, Chen An Li, Tsu-Yuan Hsu, Tse-Yang Lin, Ho-Lam Chung, Kai-Wei Chang, Shuo-yiin Chang, Hung-yi Lee