Related papers: Unsupervised Data Validation Methods for Efficient…

A Survey on Recent Approaches for Natural Language Processing in Low-Resource Scenarios

Deep neural networks and huge language models are becoming omnipresent in natural language applications. As they are known for requiring large amounts of training data, there is a growing body of work to improve the performance in…

Computation and Language · Computer Science 2021-04-12 Michael A. Hedderich, Lukas Lange, Heike Adel, Jannik Strötgen, Dietrich Klakow

Foundation Models for Low-Resource Language Education (Vision Paper)

Recent studies show that large language models (LLMs) are powerful tools for working with natural language, bringing advances in many areas of computational linguistics. However, these models face challenges when applied to low-resource…

Computation and Language · Computer Science 2024-12-09 Zhaojun Ding, Zhengliang Liu, Hanqi Jiang, Yizhu Gao, Xiaoming Zhai, Tianming Liu, Ninghao Liu

Enhancing Code Generation for Low-Resource Languages: No Silver Bullet

The advent of Large Language Models (LLMs) has significantly advanced the field of automated code generation. LLMs rely on large and diverse datasets to learn syntax, semantics, and usage patterns of programming languages. For low-resource…

Software Engineering · Computer Science 2025-02-03 Alessandro Giagnorio, Alberto Martin-Lopez, Gabriele Bavota

Are Multilingual Language Models an Off-ramp for Under-resourced Languages? Will we arrive at Digital Language Equality in Europe in 2030?

Large language models (LLMs) demonstrate unprecedented capabilities and define the state of the art for almost all natural language processing (NLP) tasks and also for essentially all Language Technology (LT) applications. LLMs can only be…

Computation and Language · Computer Science 2025-02-19 Georg Rehm, Annika Grützner-Zahn, Fabio Barth

High-quality Data-to-Text Generation for Severely Under-Resourced Languages with Out-of-the-box Large Language Models

The performance of NLP methods for severely under-resourced languages cannot currently hope to match the state of the art in NLP methods for well resourced languages. We explore the extent to which pretrained large language models (LLMs)…

Computation and Language · Computer Science 2024-02-20 Michela Lorandi, Anya Belz

Semi-supervised Neural Machine Translation with Consistency Regularization for Low-Resource Languages

The advent of deep learning has led to a significant gain in machine translation. However, most of the studies required a large parallel dataset which is scarce and expensive to construct and even unavailable for some languages. This paper…

Computation and Language · Computer Science 2023-04-04 Viet H. Pham, Thang M. Pham, Giang Nguyen, Long Nguyen, Dien Dinh

Optimising Language Models for Downstream Tasks: A Post-Training Perspective

Language models (LMs) have demonstrated remarkable capabilities in NLP, yet adapting them efficiently and robustly to specific tasks remains challenging. As their scale and complexity grow, fine-tuning LMs on labelled data often…

Computation and Language · Computer Science 2025-06-27 Zhengyan Shi

Fine-tuning Large Language Models with Limited Data: A Survey and Practical Guide

Fine-tuning large language models (LLMs) with limited data poses a practical challenge in low-resource languages, specialized domains, and constrained deployment settings. While pre-trained LLMs provide strong foundations, effective…

Computation and Language · Computer Science 2025-10-29 Marton Szep, Daniel Rueckert, Rüdiger von Eisenhart-Rothe, Florian Hinterwimmer

A Survey on Efficient Large Language Model Training: From Data-centric Perspectives

Post-training of Large Language Models (LLMs) is crucial for unlocking their task generalization potential and domain-specific capabilities. However, the current LLM post-training paradigm faces significant data challenges, including the…

Computation and Language · Computer Science 2025-10-31 Junyu Luo, Bohan Wu, Xiao Luo, Zhiping Xiao, Yiqiao Jin, Rong-Cheng Tu, Nan Yin, Yifan Wang, Jingyang Yuan, Wei Ju, Ming Zhang

Achieving Peak Performance for Large Language Models: A Systematic Review

In recent years, large language models (LLMs) have achieved remarkable success in natural language processing (NLP). LLMs require an extreme amount of parameters to attain high performance. As models grow into the trillion-parameter range,…

Computation and Language · Computer Science 2024-09-10 Zhyar Rzgar K Rostam, Sándor Szénási, Gábor Kertész

Low-Resource Adaptation of Neural NLP Models

Real-world applications of natural language processing (NLP) are challenging. NLP models rely heavily on supervised machine learning and require large amounts of annotated data. These resources are often based on language data available in…

Computation and Language · Computer Science 2020-11-10 Farhad Nooralahzadeh

Efficient Strategy for Improving Large Language Model (LLM) Capabilities

Large Language Models (LLMs) have become a milestone in the field of artificial intelligence and natural language processing. However, their large-scale deployment remains constrained by the need for significant computational resources.…

Computation and Language · Computer Science 2025-08-07 Julián Camilo Velandia Gutiérrez

Learning Translation Quality Evaluation on Low Resource Languages from Large Language Models

Learned metrics such as BLEURT have in recent years become widely employed to evaluate the quality of machine translation systems. Training such metrics requires data which can be expensive and difficult to acquire, particularly for…

Computation and Language · Computer Science 2023-02-08 Amirkeivan Mohtashami, Mauro Verzetti, Paul K. Rubenstein

Comparative Analysis of Transfer Learning in Deep Learning Text-to-Speech Models on a Few-Shot, Low-Resource, Customized Dataset

Text-to-Speech (TTS) synthesis using deep learning relies on voice quality. Modern TTS models are advanced, but they need large amount of data. Given the growing computational complexity of these models and the scarcity of large,…

Sound · Computer Science 2023-10-10 Ze Liu

A multilingual training strategy for low resource Text to Speech

Recent speech technologies have led to produce high quality synthesised speech due to recent advances in neural Text to Speech (TTS). However, such TTS models depend on extensive amounts of data that can be costly to produce and is hardly…

Computation and Language · Computer Science 2024-09-04 Asma Amalas, Mounir Ghogho, Mohamed Chetouani, Rachid Oulad Haj Thami

Neural machine translation for low-resource languages

Neural machine translation (NMT) approaches have improved the state of the art in many machine translation settings over the last couple of years, but they require large amounts of training data to produce sensible output. We demonstrate…

Computation and Language · Computer Science 2017-08-22 Robert Östling, Jörg Tiedemann

Text Data Augmentation for Large Language Models: A Comprehensive Survey of Methods, Challenges, and Opportunities

The increasing size and complexity of pre-trained language models have demonstrated superior performance in many applications, but they usually require large training datasets to be adequately trained. Insufficient training sets could…

Computation and Language · Computer Science 2025-02-03 Yaping Chai, Haoran Xie, Joe S. Qin

Mitigating Data Scarcity for Large Language Models

In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm, achieving new benchmarks and state-of-the-art performances. These models often rely heavily on annotated data, which…

Computation and Language · Computer Science 2023-02-06 Hoang Van

Overcoming Data Scarcity in Generative Language Modelling for Low-Resource Languages: A Systematic Review

Generative language modelling has surged in popularity with the emergence of services such as ChatGPT and Google Gemini. While these models have demonstrated transformative potential in productivity and communication, they overwhelmingly…

Computation and Language · Computer Science 2025-07-09 Josh McGiff, Nikola S. Nikolov

Best Practices for Data-Efficient Modeling in NLG: How to Train Production-Ready Neural Models with Less Data

Natural language generation (NLG) is a critical component in conversational systems, owing to its role of formulating a correct and natural text response. Traditionally, NLG components have been deployed using template-based solutions.…

Computation and Language · Computer Science 2020-11-10 Ankit Arun, Soumya Batra, Vikas Bhardwaj, Ashwini Challa, Pinar Donmez, Peyman Heidari, Hakan Inan, Shashank Jain, Anuj Kumar, Shawn Mei, Karthik Mohan, Michael White