English
Related papers

Related papers: Text-Utilization for Encoder-dominated Speech Reco…

200 papers

Attention-based recurrent neural encoder-decoder models present an elegant solution to the automatic speech recognition problem. This approach folds the acoustic model, pronunciation model, and language model into a single network and…

Audio and Speech Processing · Electrical Eng. & Systems 2018-11-08 Shubham Toshniwal , Anjuli Kannan , Chung-Cheng Chiu , Yonghui Wu , Tara N Sainath , Karen Livescu

Collecting audio-text pairs is expensive; however, it is much easier to access text-only data. Unless using shallow fusion, end-to-end automatic speech recognition (ASR) models require architecture modifications or additional training…

Audio and Speech Processing · Electrical Eng. & Systems 2024-01-10 Emiru Tsunoo , Hayato Futami , Yosuke Kashiwagi , Siddhant Arora , Shinji Watanabe

Recent work on speech representation models jointly pre-trained with text has demonstrated the potential of improving speech representations by encoding speech and text in a shared space. In this paper, we leverage such shared…

Computation and Language · Computer Science 2023-10-10 Chung-Ming Chien , Mingjiamei Zhang , Ju-Chieh Chou , Karen Livescu

Large language models (LLMs) have achieved remarkable success in the field of natural language processing, enabling better human-computer interaction using natural language. However, the seamless integration of speech signals into LLMs has…

Audio and Speech Processing · Electrical Eng. & Systems 2023-10-03 Jian Wu , Yashesh Gaur , Zhuo Chen , Long Zhou , Yimeng Zhu , Tianrui Wang , Jinyu Li , Shujie Liu , Bo Ren , Linquan Liu , Yu Wu

We present a method for introducing a text encoder into pre-trained end-to-end speech translation systems. It enhances the ability of adapting one modality (i.e., source-language speech) to another (i.e., source-language text). Thus, the…

Audio and Speech Processing · Electrical Eng. & Systems 2022-12-06 Yuhao Zhang , Chen Xu , Bojie Hu , Chunliang Zhang , Tong Xiao , Jingbo Zhu

On-device speech recognition requires training models of different sizes for deploying on devices with various computational budgets. When building such different models, we can benefit from training them jointly to take advantage of the…

Computation and Language · Computer Science 2021-07-15 Varun Nagaraja , Yangyang Shi , Ganesh Venkatesh , Ozlem Kalinli , Michael L. Seltzer , Vikas Chandra

Pre-trained Transformer language models (LM) have become go-to text representation encoders. Prior research fine-tunes deep LMs to encode text sequences such as sentences and passages into single dense vector representations for efficient…

Computation and Language · Computer Science 2021-09-22 Luyu Gao , Jamie Callan

Speech-to-text translation has many potential applications for low-resource languages, but the typical approach of cascading speech recognition with machine translation is often impossible, since the transcripts needed to train a speech…

Computation and Language · Computer Science 2018-06-19 Sameer Bansal , Herman Kamper , Karen Livescu , Adam Lopez , Sharon Goldwater

Dense retrieval requires high-quality text sequence embeddings to support effective search in the representation space. Autoencoder-based language models are appealing in dense retrieval as they train the encoder to output high-quality…

Machine Learning · Computer Science 2021-09-17 Shuqi Lu , Di He , Chenyan Xiong , Guolin Ke , Waleed Malik , Zhicheng Dou , Paul Bennett , Tieyan Liu , Arnold Overwijk

This paper presents a simple method that allows to easily enhance textual pre-trained large language models with speech information, when fine-tuned for a specific classification task. A classical issue with the fusion of many embeddings…

Computation and Language · Computer Science 2026-04-07 Nicolas Calbucura , Jose Guillen , Valentin Barriere

Recent advances in speech-enabled language models have shown promising results in building intelligent voice assistants. However, most existing approaches rely on large-scale paired speech-text data and extensive computational resources,…

Computation and Language · Computer Science 2025-06-10 Taesoo Kim , Jong Hwan Ko

Text simplification (TS) aims to reduce the lexical and structural complexity of a text, while still retaining the semantic meaning. Current automatic TS techniques are limited to either lexical-level applications or manually defining a…

Computation and Language · Computer Science 2016-09-14 Tong Wang , Ping Chen , Kevin Amaral , Jipeng Qiang

Text simplification (TS) can be viewed as monolingual translation task, translating between text variations within a single language. Recent neural TS models draw on insights from neural machine translation to learn lexical simplification…

Computation and Language · Computer Science 2018-10-11 Jipeng Qiang

Direct speech-to-text translation systems encounter an important drawback in data scarcity. A common solution consists on pretraining the encoder on automatic speech recognition, hence losing efficiency in the training process. In this…

Computation and Language · Computer Science 2024-09-27 Belen Alastruey , Gerard I. Gállego , Marta R. Costa-jussà

Large language models have become extremely popular recently due to their ability to achieve strong performance on a variety of tasks, such as text generation and rewriting, but their size and computation cost make them difficult to access,…

Computation and Language · Computer Science 2026-01-08 Anthony Lamelas

Automated audio captioning aims to use natural language to describe the content of audio data. This paper presents an audio captioning system with an encoder-decoder architecture, where the decoder predicts words based on audio features…

Audio and Speech Processing · Electrical Eng. & Systems 2021-08-06 Xinhao Mei , Qiushi Huang , Xubo Liu , Gengyun Chen , Jingqian Wu , Yusong Wu , Jinzheng Zhao , Shengchen Li , Tom Ko , H Lilian Tang , Xi Shao , Mark D. Plumbley , Wenwu Wang

While large language models are primarily used on natural language tasks, they have also shown great promise when adapted to new modalities, e.g., for scientific machine learning tasks. Most proposed approaches for such cross-modal…

Machine Learning · Computer Science 2026-03-09 Paloma García-de-Herreros , Philipp Slusallek , Dietrich Klakow , Vagrant Gautam

Recently, Transformer-based encoder-decoder models have demonstrated strong performance in multilingual speech recognition. However, the decoder's autoregressive nature and large size introduce significant bottlenecks during inference.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-28 Yunkyu Lim , Jihwan Park , Hyung Yong Kim , Hanbin Lee , Byeong-Yeol Kim

Recent breakthroughs in deep learning often rely on representation learning and knowledge transfer. In recent years, unsupervised and self-supervised techniques for learning speech representation were developed to foster automatic speech…

Computation and Language · Computer Science 2021-12-15 Pierre Beckmann , Mikolaj Kegler , Milos Cernak

In this paper, we show that a simple self-supervised pre-trained audio model can achieve comparable inference efficiency to more complicated pre-trained models with speech transformer encoders. These speech transformers rely on mixing…

Sound · Computer Science 2024-02-09 Sungho Jeon , Ching-Feng Yeh , Hakan Inan , Wei-Ning Hsu , Rashi Rungta , Yashar Mehdad , Daniel Bikel
‹ Prev 1 2 3 10 Next ›