Related papers: Data-Augmentation-Based Dialectal Adaptation for L…

200 papers

This paper explores the potential of leveraging Large Language Models (LLMs) for data augmentation in multilingual commonsense reasoning datasets where the available training data is extremely limited. To achieve this, we utilise several…

Computation and Language · Computer Science 2023-10-24 Chenxi Whitehouse, Monojit Choudhury, Alham Fikri Aji

Data augmentation is an essential technique in natural language processing (NLP) for enriching training datasets by generating diverse samples. This process is crucial for improving the robustness and generalization capabilities of NLP…

Computation and Language · Computer Science 2025-10-16 Zaitian Wang, Jinghan Zhang, Xinhao Zhang, Kunpeng Liu, Pengfei Wang, Yuanchun Zhou

In the rapidly evolving field of large language models (LLMs), data augmentation (DA) has emerged as a pivotal technique for enhancing model performance by diversifying training examples without the need for additional data collection. This…

Computation and Language · Computer Science 2024-07-03 Bosheng Ding, Chengwei Qin, Ruochen Zhao, Tianze Luo, Xinze Li, Guizhen Chen, Wenhan Xia, Junjie Hu, Anh Tuan Luu, Shafiq Joty

This paper explores the enhancement of small language models through strategic dataset augmentation via ChatGPT-3.5-Turbo, in the domain of Natural Language Inference (NLI). By employing knowledge distillation-based techniques and synthetic…

Computation and Language · Computer Science 2024-09-20 Tom Pieper, Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

Most of the world's languages and dialects are low-resource, and lack support in mainstream machine translation (MT) models. However, many of them have a closely-related high-resource language (HRL) neighbor, and differ in linguistically…

Computation and Language · Computer Science 2025-10-22 Niyati Bafna, Emily Chang, Nathaniel R. Robinson, David R. Mortensen, Kenton Murray, David Yarowsky, Hale Sirin

In the past five years, research has shifted from traditional Machine Learning (ML) and Deep Learning (DL) approaches to leveraging Large Language Models (LLMs), including multimodality, for data augmentation to enhance generalization, and…

Computer Vision and Pattern Recognition · Computer Science 2025-03-25 Ranjan Sapkota, Shaina Raza, Maged Shoman, Achyut Paudel, Manoj Karkee

This study addresses the interaction challenges encountered by spoken dialogue systems (SDSs) when engaging with users who exhibit distinct conversational behaviors, particularly minors, in scenarios where data are scarce. We propose a…

Computation and Language · Computer Science 2024-08-21 Zhiyang Qi, Michimasa Inaba

Large Language Models (LLMs) are becoming increasingly multilingual, supporting hundreds of languages, especially high-resource ones. Unfortunately, dialect variations are still underrepresented due to limited data and linguistic variation.…

Computation and Language · Computer Science 2026-02-11 Abdulhai Alali, Abderrahmane Issam

This paper examines the effectiveness of Large Language Models (LLMs) in translating the low-resource Lebanese dialect, focusing on the impact of culturally authentic data versus larger translated datasets. We compare three fine-tuning…

Computation and Language · Computer Science 2025-05-02 Silvana Yakhni, Ali Chehab

Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing. However, their increasing size poses challenges in terms of computational cost. On the other hand, Small Language Models (SLMs) are…

Computation and Language · Computer Science 2023-08-03 Zhen Guo, Peiqi Wang, Yanwei Wang, Shangdi Yu

Historically, researchers and consumers have noticed a decrease in quality when applying NLP tools to minority variants of languages (e.g. Puerto Rican Spanish or Swiss German), but studies exploring this have been limited to a select few…

Computation and Language · Computer Science 2023-10-24 Anjali Kantharuban, Ivan Vulić, Anna Korhonen

Pretrained large language models (LLMs) are currently state-of-the-art for solving the vast majority of natural language processing tasks. While many real-world applications still require fine-tuning to reach satisfactory levels of…

Data augmentation is a critical component of deep learning pipelines, enhancing model generalization by increasing dataset diversity. Traditional augmentation strategies rely on manually designed transformations, stochastic sampling, or…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Ant Duru, Alptekin Temizel

Fine-tuning large language models (LLMs) using diverse datasets is crucial for enhancing their overall performance across various domains. In practical scenarios, existing methods based on modeling the mixture proportions of data…

Computation and Language · Computer Science 2025-10-31 Zhenqing Ling, Daoyuan Chen, Liuyi Yao, Qianli Shen, Yaliang Li, Ying Shen

Low-resource languages (LRLs) face significant challenges in natural language processing (NLP) due to limited data. While current state-of-the-art large language models (LLMs) still struggle with LRLs, smaller multilingual models (mLMs)…

Computation and Language · Computer Science 2025-02-17 Daniil Gurgurov, Ivan Vykopal, Josef van Genabith, Simon Ostermann

Large language models (LLMs) are reported to be partial to certain cultures owing to the training data dominance from the English corpora. Since multilingual cultural data are often expensive to collect, existing efforts handle this by…

Computation and Language · Computer Science 2024-12-04 Cheng Li, Mengzhou Chen, Jindong Wang, Sunayana Sitaram, Xing Xie

Large Language Models (LLMs) exhibit a puzzling disparity in their formal linguistic competence: while they learn some linguistic phenomena with near-perfect mastery, they often perform below chance on others, even after training on…

Computation and Language · Computer Science 2026-04-21 H S V N S Kowndinya Renduchintala, Sumit Bhatia

With the capabilities of understanding and executing natural language instructions, large language models (LLMs) can potentially act as a powerful tool for textual data augmentation. However, the quality of augmented data depends heavily on…

Computation and Language · Computer Science 2024-04-30 Yichuan Li, Kaize Ding, Jianling Wang, Kyumin Lee

Data augmentation (DA) is crucial to mitigate model training instability and over-fitting problems in low-resource open-domain dialogue generation. However, traditional DA methods often neglect semantic data diversity, restricting the…

Computation and Language · Computer Science 2024-04-02 Zhenhua Liu, Tong Zhu, Jianxiang Xiang, Wenliang Chen

This paper proposes a novel linear prediction coding-based data augmentation method for children's low and zero resource dialect ASR. The data augmentation procedure consists of perturbing the formant peaks of the LPC spectrum during LPC…

Audio and Speech Processing · Electrical Eng. & Systems 2022-02-23 Alexander Johnson, Ruchao Fan, Robin Morris, Abeer Alwan