
Related papers: Data-Prep-Kit: getting your data ready for LLM app…

200 papers

Large language models (LLMs) have been widely applied in various practical applications, typically comprising billions of parameters, with inference processes requiring substantial energy and computational resources. In contrast, the human…

Software Engineering · Computer Science 2024-12-23 Xin Du, Shifan Ye, Qian Zheng, Yangfan Hu, Rui Yan, Shunyu Qi, Shuyang Chen, Huajin Tang, Gang Pan, Shuiguang Deng

Data preparation is a central and time-consuming stage in data analysis workflows. Traditionally, commercial tools have relied on graphical user interfaces (GUIs) to simplify data preparation, allowing users to define transformations…

Databases · Computer Science 2026-05-12 Jingzhe Xu, Rui Wang, Jiannan Wang, Guoliang Li

Data preparation, which aims to transform heterogeneous and noisy raw tables into analysis-ready data, remains a major bottleneck in data science. Recent approaches leverage large language models (LLMs) to automate data preparation from…

Databases · Computer Science 2026-02-10 Meihao Fan, Ju Fan, Yuxin Zhang, Shaolei Zhang, Xiaoyong Du, Jie Song, Peng Li, Fuxin Jiang, Tieying Zhang, Jianjun Chen

Large Language Models for Code (or code LLMs) are increasingly gaining popularity and capabilities, offering a wide array of functionalities such as code completion, code generation, code summarization, test generation, code translation,…

Software Engineering · Computer Science 2024-10-18 Rahul Krishna, Rangeet Pan, Raju Pavuluri, Srikanth Tamilselvam, Maja Vukovic, Saurabh Sinha

Large Language Models (LLMs), typified by OpenAI's GPT, have marked a significant advancement in artificial intelligence. Trained on vast amounts of text data, LLMs are capable of understanding and generating human-like text across a…

Artificial Intelligence · Computer Science 2024-10-29 Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

Data preparation is a critical step in enhancing the usability of tabular data and thus boosts downstream data-driven tasks. Traditional methods often face challenges in capturing the intricate relationships within tables and adapting to…

Artificial Intelligence · Computer Science 2025-08-05 Mengshi Chen, Yuxiang Sun, Tengchao Li, Jianwei Wang, Kai Wang, Xuemin Lin, Ying Zhang, Wenjie Zhang

The rapid advancement of Large Language Models (LLMs) has revolutionized various sectors by automating routine tasks, marking a step toward the realization of Artificial General Intelligence (AGI). However, they still struggle to…

Machine Learning · Computer Science 2024-02-21 Zihao Tang, Zheqi Lv, Shengyu Zhang, Fei Wu, Kun Kuang

Language is essentially a complex, intricate system of human expressions governed by grammatical rules. It poses a significant challenge to develop capable AI algorithms for comprehending and grasping a language. As a major approach,…

This paper explores the utilization of LLMs for data preprocessing (DP), a crucial step in the data mining pipeline that transforms raw data into a clean format conducive to easy processing. Whereas the use of LLMs has sparked interest in…

Artificial Intelligence · Computer Science 2024-10-30 Haochen Zhang, Yuyang Dong, Chuan Xiao, Masafumi Oyamada

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the…

Databases · Computer Science 2025-05-29 Dawei Feng, Di Mei, Huiri Tan, Lei Ren, Xianying Lou, Zhangxi Tan

Despite the advanced intelligence abilities of large language models (LLMs) in various applications, they still face significant computational and storage demands. Knowledge Distillation (KD) has emerged as an effective strategy to improve…

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks and domains, with data playing a central role in enabling these advances. Despite this success, the preparation and effective utilization of…

Computation and Language · Computer Science 2026-03-17 Hao Liang, Zhengyang Zhao, Zhaoyang Han, Meiyi Qiang, Xiaochen Ma, Bohan Zeng, Qifeng Cai, Zhiyu Li, Linpeng Tang, Weinan E, Wentao Zhang

This paper presents a comprehensive overview of the data preparation pipeline developed for the OpenGPT-X project, a large-scale initiative aimed at creating open and high-performance multilingual large language models (LLMs). The project…

We present OnPrem.LLM, a Python-based toolkit for applying large language models (LLMs) to sensitive, non-public data in offline or restricted environments. The system is designed for privacy-preserving use cases and provides prebuilt…

Computation and Language · Computer Science 2025-09-30 Arun S. Maiya

Designing effective data manipulation methods is a long-standing problem in data lakes. Traditional methods, which rely on rules or machine learning models, require extensive human efforts on training data collection and tuning models.…

Artificial Intelligence · Computer Science 2024-05-13 Yichen Qian, Yongyi He, Rong Zhu, Jintao Huang, Zhijian Ma, Haibin Wang, Yaohua Wang, Xiuyu Sun, Defu Lian, Bolin Ding, Jingren Zhou

Large Language Models (LLMs) are gaining popularity for hardware design automation, particularly through Register Transfer Level (RTL) code generation. In this work, we examine the current literature on RTL generation using LLMs and…

Hardware Architecture · Computer Science 2025-07-21 Paul E. Calzada, Zahin Ibnat, Tanvir Rahman, Kamal Kandula, Danyu Lu, Sujan Kumar Saha, Farimah Farahmandi, Mark Tehranipoor

Data preparation aims to denoise raw datasets, uncover cross-dataset relationships, and extract valuable insights from them, which is essential for a wide range of data-centric applications. Driven by (i) rising demands for…

The rapidly growing demand for high-quality data in Large Language Models (LLMs) has intensified the need for scalable, reliable, and semantically rich data preparation pipelines. However, current practices remain dominated by ad-hoc…

Creating high-quality, large-scale datasets for large language models (LLMs) often relies on resource-intensive, GPU-accelerated models for quality filtering, making the process time-consuming and costly. This dependence on GPUs limits…

Computation and Language · Computer Science 2024-11-19 Yungi Kim, Hyunsoo Ha, Seonghoon Yang, Sukyung Lee, Jihoo Kim, Chanjun Park

This research investigates the application of Large Language Models (LLMs) to augment conversational agents in process mining, aiming to tackle its inherent complexity and diverse skill requirements. While LLM advancements present novel…

Artificial Intelligence · Computer Science 2023-07-20 Urszula Jessen, Michal Sroka, Dirk Fahland