Related papers: AudioBERT: Audio Knowledge Augmented Language Mode…

AuditoryBench++: Can Language Models Understand Auditory Knowledge without Hearing?

Even without directly hearing sounds, humans can effortlessly reason about auditory properties, such as pitch, loudness, or sound-source associations, drawing on auditory commonsense. In contrast, language models often lack this capability,…

Computation and Language · Computer Science 2026-01-29 Hyunjong Ok, Suho Yoo, Hyeonjun Kim, Jaeho Lee

Wav-BERT: Cooperative Acoustic and Linguistic Representation Learning for Low-Resource Speech Recognition

Unifying acoustic and linguistic representation learning has become increasingly crucial to transfer the knowledge learned on the abundance of high-resource language data for low-resource speech recognition. Existing approaches simply…

Computation and Language · Computer Science 2021-10-12 Guolin Zheng, Yubei Xiao, Ke Gong, Pan Zhou, Xiaodan Liang, Liang Lin

DictBERT: Dictionary Description Knowledge Enhanced Language Model Pre-training via Contrastive Learning

Although pre-trained language models (PLMs) have achieved state-of-the-art performance on various natural language processing (NLP) tasks, they are shown to be lacking in knowledge when dealing with knowledge driven tasks. Despite the many…

Computation and Language · Computer Science 2022-08-02 Qianglong Chen, Feng-Lin Li, Guohai Xu, Ming Yan, Ji Zhang, Yin Zhang

AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations

Multi-modal learning in the audio-language domain has seen significant advancements in recent years. However, audio-language learning faces challenges due to limited and lower-quality data compared to image-language tasks. Existing…

Audio and Speech Processing · Electrical Eng. & Systems 2024-06-10 David Xu

DialogBERT: Discourse-Aware Response Generation via Learning to Recover and Rank Utterances

Recent advances in pre-trained language models have significantly improved neural response generation. However, existing methods usually view the dialogue context as a linear sequence of tokens and learn to generate the next word through…

Computation and Language · Computer Science 2021-12-14 Xiaodong Gu, Kang Min Yoo, Jung-Woo Ha

Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models

Language models pretrained on text-only corpora often struggle with tasks that require auditory commonsense knowledge. Previous work addresses this problem by augmenting the language model to retrieve knowledge from external audio…

Computation and Language · Computer Science 2025-06-10 Suho Yoo, Hyunjong Ok, Jaeho Lee

Knowledge Graph Fusion for Language Model Fine-tuning

Language Models such as BERT have grown in popularity due to their ability to be pre-trained and perform robustly on a wide range of Natural Language Processing tasks. Often seen as an evolution over traditional word embedding techniques,…

Computation and Language · Computer Science 2022-06-30 Nimesh Bhana, Terence L. van Zyl

ArchBERT: Bi-Modal Understanding of Neural Architectures and Natural Languages

Building multi-modal language models has been a trend in the recent years, where additional modalities such as image, video, speech, etc. are jointly learned along with natural languages (i.e., textual information). Despite the success of…

Computation and Language · Computer Science 2023-10-30 Mohammad Akbari, Saeed Ranjbar Alvar, Behnam Kamranian, Amin Banitalebi-Dehkordi, Yong Zhang

Audio ALBERT: A Lite BERT for Self-supervised Learning of Audio Representation

For self-supervised speech processing, it is crucial to use pretrained models as speech representation extractors. In recent works, increasing the size of the model has been utilized in acoustic model training in order to achieve better…

Audio and Speech Processing · Electrical Eng. & Systems 2021-05-04 Po-Han Chi, Pei-Hung Chung, Tsung-Han Wu, Chun-Cheng Hsieh, Yen-Hao Chen, Shang-Wen Li, Hung-yi Lee

Phoneme-BERT: Joint Language Modelling of Phoneme Sequence and ASR Transcript

Recent years have witnessed significant improvement in ASR systems to recognize spoken utterances. However, it is still a challenging task for noisy and out-of-domain data, where substitution and deletion errors are prevalent in the…

Audio and Speech Processing · Electrical Eng. & Systems 2021-06-17 Mukuntha Narayanan Sundararaman, Ayush Kumar, Jithendra Vepa

RecoBERT: A Catalog Language Model for Text-Based Recommendations

Language models that utilize extensive self-supervised pre-training from unlabeled text, have recently shown to significantly advance the state-of-the-art performance in a variety of language understanding tasks. However, it is yet unclear…

Information Retrieval · Computer Science 2020-09-29 Itzik Malkiel, Oren Barkan, Avi Caciularu, Noam Razin, Ori Katz, Noam Koenigstein

E-BERT: A Phrase and Product Knowledge Enhanced Language Model for E-commerce

Pre-trained language models such as BERT have achieved great success in a broad range of natural language processing tasks. However, BERT cannot well support E-commerce related tasks due to the lack of two levels of domain knowledge, i.e.,…

Computation and Language · Computer Science 2021-12-20 Denghui Zhang, Zixuan Yuan, Yanchi Liu, Fuzhen Zhuang, Haifeng Chen, Hui Xiong

VideoBERT: A Joint Model for Video and Language Representation Learning

Self-supervised learning has become increasingly important to leverage the abundance of unlabeled data available on platforms like YouTube. Whereas most existing approaches learn low-level representations, we propose a joint…

Computer Vision and Pattern Recognition · Computer Science 2019-09-13 Chen Sun, Austin Myers, Carl Vondrick, Kevin Murphy, Cordelia Schmid

Semantics-aware BERT for Language Understanding

The latest work on language representations carefully integrates contextualized features into language model training, which enables a series of success especially in various machine reading comprehension and natural language inference…

Computation and Language · Computer Science 2020-02-05 Zhuosheng Zhang, Yuwei Wu, Hai Zhao, Zuchao Li, Shuailiang Zhang, Xi Zhou, Xiang Zhou

SpeechBERT: An Audio-and-text Jointly Learned Language Model for End-to-end Spoken Question Answering

While various end-to-end models for spoken language understanding tasks have been explored recently, this paper is probably the first known attempt to challenge the very difficult task of end-to-end spoken question answering (SQA). Learning…

Computation and Language · Computer Science 2020-08-12 Yung-Sung Chuang, Chi-Liang Liu, Hung-Yi Lee, Lin-shan Lee

AraBERT: Transformer-based Model for Arabic Language Understanding

The Arabic language is a morphologically rich language with relatively few resources and a less explored syntax compared to English. Given these limitations, Arabic Natural Language Processing (NLP) tasks like Sentiment Analysis (SA), Named…

Computation and Language · Computer Science 2021-03-09 Wissam Antoun, Fady Baly, Hazem Hajj

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

BERT is a cutting-edge language representation model pre-trained by a large corpus, which achieves superior performances on various natural language understanding tasks. However, a major blocking issue of applying BERT to online services is…

Computation and Language · Computer Science 2020-10-22 Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai

Learning Lip-Based Audio-Visual Speaker Embeddings with AV-HuBERT

This paper investigates self-supervised pre-training for audio-visual speaker representation learning where a visual stream showing the speaker's mouth area is used alongside speech as inputs. Our study focuses on the Audio-Visual Hidden…

Audio and Speech Processing · Electrical Eng. & Systems 2022-07-18 Bowen Shi, Abdelrahman Mohamed, Wei-Ning Hsu

InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective

Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks. Recent studies, however, show that such BERT-based models are vulnerable facing the threats of textual adversarial…

Computation and Language · Computer Science 2021-03-23 Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu

DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the…

Computation and Language · Computer Science 2022-12-02 Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu