Related papers: Reformulation for Pretraining Data Augmentation

Data Augmentation for Image Classification using Generative AI

Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising solution to expanding the dataset size. Traditional approaches focused on augmentation using rotation,…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-04 · Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz

Improving Generalization in Meta-Learning via Meta-Gradient Augmentation

Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the…

Machine Learning · Computer Science · 2023-06-16 · Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin

SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains

Retrieval-augmented generation (RAG) enhances the question-answering (QA) abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and…

Computation and Language · Computer Science · 2025-01-28 · Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He

Data augmentation for deep learning based accelerated MRI reconstruction with limited data

Deep neural networks have emerged as very successful tools for image restoration and reconstruction tasks. These networks are often trained end-to-end to directly reconstruct an image from a noisy or corrupted measurement of that image. To…

Image and Video Processing · Electrical Eng. & Systems · 2021-06-30 · Zalan Fabian, Reinhard Heckel, Mahdi Soltanolkotabi

MAIN-RAG: Multi-Agent Filtering Retrieval-Augmented Generation

Large Language Models (LLMs) are becoming essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by…

Computation and Language · Computer Science · 2025-01-03 · Chia-Yuan Chang, Zhimeng Jiang, Vineeth Rakesh, Menghai Pan, Chin-Chia Michael Yeh, Guanchu Wang, Mingzhi Hu, Zhichao Xu, Yan Zheng, Mahashweta Das, Na Zou

Data Augmentation and Hyperparameter Tuning for Low-Resource MFA

A continued issue for those working with computational tools and endangered and under-resourced languages is the lower accuracy of results for languages with smaller amounts of data. We attempt to ameliorate this issue by using data…

Computation and Language · Computer Science · 2025-04-10 · Alessio Tosolini, Claire Bowern

MKRAG: Medical Knowledge Retrieval Augmented Generation for Medical Question Answering

Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks such as medical question answering (QA). In addition, LLMs tend to function as "black-boxes", making it challenging to modify…

Computation and Language · Computer Science · 2024-08-19 · Yucheng Shi, Shaochen Xu, Tianze Yang, Zhengliang Liu, Tianming Liu, Quanzheng Li, Xiang Li, Ninghao Liu

M-RAG: Reinforcing Large Language Model Performance through Retrieval-Augmented Generation with Multiple Partitions

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant memories from an external database. However, existing RAG methods typically organize all memories in a whole database, potentially limiting…

Computation and Language · Computer Science · 2024-05-28 · Zheng Wang, Shu Xian Teo, Jieer Ouyang, Yongjun Xu, Wei Shi

Towards Optimizing a Retrieval Augmented Generation using Large Language Model on Academic Data

Given the growing trend of many organizations integrating Retrieval Augmented Generation (RAG) into their operations, we assess RAG on domain-specific data and test state-of-the-art models across various optimization techniques. We…

Artificial Intelligence · Computer Science · 2024-11-14 · Anum Afzal, Juraj Vladika, Gentrit Fazlija, Andrei Staradubets, Florian Matthes

Revisiting Replay and Gradient Alignment for Continual Pre-Training of Large Language Models

Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data becomes available. A more efficient and resource-conserving approach would be continual…

Machine Learning · Computer Science · 2025-08-05 · Istabrak Abbes, Gopeshh Subbaraj, Matthew Riemer, Nizar Islah, Benjamin Therien, Tsuguchika Tabaru, Hiroaki Kingetsu, Sarath Chandar, Irina Rish

Meta Knowledge for Retrieval Augmented Large Language Models

Retrieval Augmented Generation (RAG) is a technique used to augment Large Language Models (LLMs) with contextually relevant, time-critical, or domain-specific information without altering the underlying model parameters. However,…

Information Retrieval · Computer Science · 2024-08-20 · Laurent Mombaerts, Terry Ding, Adi Banerjee, Florian Felice, Jonathan Taws, Tarik Borogovac

How Can Input Reformulation Improve Tool Usage Accuracy in a Complex Dynamic Environment? A Study on $\tau$-bench

Recent advances in reasoning and planning capabilities of large language models (LLMs) have enabled their potential as autonomous agents capable of tool use in dynamic environments. However, in multi-turn conversational environments like…

Computation and Language · Computer Science · 2025-09-03 · Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs.…

Computation and Language · Computer Science · 2021-09-14 · Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen

On the Diminishing Returns of Complex Robust RAG Training in the Era of Powerful LLMs

Retrieval-augmented generation (RAG) systems traditionally employ sophisticated training strategies to enhance robustness against retrieval noise. In this work, we investigate a critical question: does the benefit of these complex robust…

Computation and Language · Computer Science · 2025-10-06 · Hanxing Ding, Shuchang Tao, Liang Pang, Zihao Wei, Liwei Chen, Kun Xu, Huawei Shen, Xueqi Cheng

Benchmarking Large Language Models in Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different…

Computation and Language · Computer Science · 2023-12-21 · Jiawei Chen, Hongyu Lin, Xianpei Han, Le Sun

Data Augmentation for Neural Machine Translation using Generative Language Model

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by…

Computation and Language · Computer Science · 2023-11-14 · Seokjin Oh, Su Ah Lee, Woohwan Jung

A Systematic Study of Retrieval Pipeline Design for Retrieval-Augmented Medical Question Answering

Large language models (LLMs) have demonstrated strong capabilities in medical question answering; however, purely parametric models often suffer from knowledge gaps and limited factual grounding. Retrieval-augmented generation (RAG)…

Computation and Language · Computer Science · 2026-04-09 · Nusrat Sultana, Abdullah Muhammad Moosa, Kazi Afzalur Rahman, Sajal Chandra Banik

From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation…

Computation and Language · Computer Science · 2026-03-25 · Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Chunlin Chen, Zhi Wang

RQ-RAG: Learning to Refine Queries for Retrieval Augmented Generation

Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in…

Computation and Language · Computer Science · 2024-04-02 · Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu

DACL-RAG: Data Augmentation Strategy with Curriculum Learning for Retrieval-Augmented Generation

Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods typically optimize the retriever or the generator in a RAG system by directly using the top-k…

Computation and Language · Computer Science · 2025-10-07 · Shaohan Wang, Licheng Zhang, Zheren Fu, Zhendong Mao, Yongdong Zhang