Related papers: Reformulation for Pretraining Data Augmentation

200 papers

Scaling laws dictate that the performance of AI models is proportional to the amount of available data. Data augmentation is a promising solution to expanding the dataset size. Traditional approaches focused on augmentation using rotation,…

Computer Vision and Pattern Recognition · Computer Science 2024-09-04 Fazle Rahat, M Shifat Hossain, Md Rubel Ahmed, Sumit Kumar Jha, Rickard Ewetz

Meta-learning methods typically follow a two-loop framework, where each loop potentially suffers from notorious overfitting, hindering rapid adaptation and generalization to new tasks. Existing schemes solve it by enhancing the…

Machine Learning · Computer Science 2023-06-16 Ren Wang, Haoliang Sun, Qi Wei, Xiushan Nie, Yuling Ma, Yilong Yin

Retrieval-augmented generation (RAG) enhances the question-answering (QA) abilities of large language models (LLMs) by integrating external knowledge. However, adapting general-purpose RAG systems to specialized fields such as science and…

Computation and Language · Computer Science 2025-01-28 Ran Xu, Hui Liu, Sreyashi Nag, Zhenwei Dai, Yaochen Xie, Xianfeng Tang, Chen Luo, Yang Li, Joyce C. Ho, Carl Yang, Qi He

Deep neural networks have emerged as very successful tools for image restoration and reconstruction tasks. These networks are often trained end-to-end to directly reconstruct an image from a noisy or corrupted measurement of that image. To…

Image and Video Processing · Electrical Eng. & Systems 2021-06-30 Zalan Fabian, Reinhard Heckel, Mahdi Soltanolkotabi

Large Language Models (LLMs) are becoming essential tools for various natural language processing tasks but often suffer from generating outdated or incorrect information. Retrieval-Augmented Generation (RAG) addresses this issue by…

A continued issue for those working with computational tools and endangered and under-resourced languages is the lower accuracy of results for languages with smaller amounts of data. We attempt to ameliorate this issue by using data…

Computation and Language · Computer Science 2025-04-10 Alessio Tosolini, Claire Bowern

Large Language Models (LLMs), although powerful in general domains, often perform poorly on domain-specific tasks such as medical question answering (QA). In addition, LLMs tend to function as "black-boxes", making it challenging to modify…

Computation and Language · Computer Science 2024-08-19 Yucheng Shi, Shaochen Xu, Tianze Yang, Zhengliang Liu, Tianming Liu, Quanzheng Li, Xiang Li, Ninghao Liu

Retrieval-Augmented Generation (RAG) enhances Large Language Models (LLMs) by retrieving relevant memories from an external database. However, existing RAG methods typically organize all memories in a whole database, potentially limiting…

Computation and Language · Computer Science 2024-05-28 Zheng Wang, Shu Xian Teo, Jieer Ouyang, Yongjun Xu, Wei Shi

Given the growing trend of many organizations integrating Retrieval Augmented Generation (RAG) into their operations, we assess RAG on domain-specific data and test state-of-the-art models across various optimization techniques. We…

Artificial Intelligence · Computer Science 2024-11-14 Anum Afzal, Juraj Vladika, Gentrit Fazlija, Andrei Staradubets, Florian Matthes

Training large language models (LLMs) typically involves pre-training on massive corpora, only to restart the process entirely when new data becomes available. A more efficient and resource-conserving approach would be continual…

Retrieval Augmented Generation (RAG) is a technique used to augment Large Language Models (LLMs) with contextually relevant, time-critical, or domain-specific information without altering the underlying model parameters. However,…

Information Retrieval · Computer Science 2024-08-20 Laurent Mombaerts, Terry Ding, Adi Banerjee, Florian Felice, Jonathan Taws, Tarik Borogovac

Recent advances in reasoning and planning capabilities of large language models (LLMs) have enabled their potential as autonomous agents capable of tool use in dynamic environments. However, in multi-turn conversational environments like…

Computation and Language · Computer Science 2025-09-03 Venkatesh Mishra, Amir Saeidi, Satyam Raj, Mutsumi Nakamura, Jayanth Srinivasa, Gaowen Liu, Ali Payani, Chitta Baral

Recent works have shown that powerful pre-trained language models (PLM) can be fooled by small perturbations or intentional attacks. To solve this issue, various data augmentation techniques are proposed to improve the robustness of PLMs.…

Computation and Language · Computer Science 2021-09-14 Kun Zhou, Wayne Xin Zhao, Sirui Wang, Fuzheng Zhang, Wei Wu, Ji-Rong Wen

Retrieval-augmented generation (RAG) systems traditionally employ sophisticated training strategies to enhance robustness against retrieval noise. In this work, we investigate a critical question: does the benefit of these complex robust…

Computation and Language · Computer Science 2025-10-06 Hanxing Ding, Shuchang Tao, Liang Pang, Zihao Wei, Liwei Chen, Kun Xu, Huawei Shen, Xueqi Cheng

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different…

Computation and Language · Computer Science 2023-12-21 Jiawei Chen, Hongyu Lin, Xianpei Han, Le Sun

Despite the rapid growth in model architecture, the scarcity of large parallel corpora remains the main bottleneck in Neural Machine Translation. Data augmentation is a technique that enhances the performance of data-hungry models by…

Computation and Language · Computer Science 2023-11-14 Seokjin Oh, Su Ah Lee, Woohwan Jung

Large language models (LLMs) have demonstrated strong capabilities in medical question answering; however, purely parametric models often suffer from knowledge gaps and limited factual grounding. Retrieval-augmented generation (RAG)…

Computation and Language · Computer Science 2026-04-09 Nusrat Sultana, Abdullah Muhammad Moosa, Kazi Afzalur Rahman, Sajal Chandra Banik

Large Language Models (LLMs) exhibit high reasoning capacity in medical question-answering, but their tendency to produce hallucinations and outdated knowledge poses critical risks in healthcare fields. While Retrieval-Augmented Generation…

Computation and Language · Computer Science 2026-03-25 Wenhao Wu, Zhentao Tang, Yafu Li, Shixiong Kai, Mingxuan Yuan, Chunlin Chen, Zhi Wang

Large Language Models (LLMs) exhibit remarkable capabilities but are prone to generating inaccurate or hallucinatory responses. This limitation stems from their reliance on vast pretraining datasets, making them susceptible to errors in…

Computation and Language · Computer Science 2024-04-02 Chi-Min Chan, Chunpu Xu, Ruibin Yuan, Hongyin Luo, Wei Xue, Yike Guo, Jie Fu

Retrieval-Augmented Generation (RAG) is an effective method to enhance the capabilities of large language models (LLMs). Existing methods typically optimize the retriever or the generator in a RAG system by directly using the top-k…

Computation and Language · Computer Science 2025-10-07 Shaohan Wang, Licheng Zhang, Zheren Fu, Zhendong Mao, Yongdong Zhang