Related papers: TAGAL: Tabular Data Generation using Agentic LLM M…

MALLM-GAN: Multi-Agent Large Language Model as Generative Adversarial Network for Synthesizing Tabular Data

In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data…

Machine Learning · Computer Science 2026-04-10 Yaobin Ling, Xiaoqian Jiang, Yejin Kim

FABRIC: Framework for Agent-Based Realistic Intelligence Creation

Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data: structured interaction…

Artificial Intelligence · Computer Science 2025-10-22 Abhigya Verma, Seganrasan Subramanian, Nandhakumar Kandasamy, Naman Gupta

Generating Realistic Tabular Data with Large Language Models

While most generative models show achievements in image data generation, few are developed for tabular data generation. Recently, due to the success of large language models (LLMs) in diverse tasks, they have also been used for tabular data…

Machine Learning · Computer Science 2024-10-30 Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Svetha Venkatesh

Creating Artificial Students that Never Existed: Leveraging Large Language Models and CTGANs for Synthetic Data Generation

In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to quality students…

Machine Learning · Computer Science 2026-05-21 Mohammad Khalil, Sam Urmian, Ronas Shakya, Qinyi Liu

A Comprehensive Survey of Synthetic Tabular Data Generation

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

A Technical Exploration of Causal Inference with Hybrid LLM Synthetic Data

Large Language Models (LLMs) offer a flexible means to generate synthetic tabular data, yet existing approaches often fail to preserve key causal parameters such as the average treatment effect (ATE). In this technical exploration, we first…

Machine Learning · Computer Science 2025-11-04 Dana Kim, Yichen Xu, Tiffany Lin

FASTGEN: Fast and Cost-Effective Synthetic Tabular Data Generation with LLMs

Synthetic data generation has emerged as an invaluable solution in scenarios where real-world data collection and usage are limited by cost and scarcity. Large language models (LLMs) have demonstrated remarkable capabilities in producing…

Machine Learning · Computer Science 2025-07-22 Anh Nguyen, Sam Schafft, Nicholas Hale, John Alfaro

Generative AI for Synthetic Data Generation: Methods, Challenges and the Future

The recent surge in research focused on generating synthetic data from large language models (LLMs), especially for scenarios with limited data availability, marks a notable shift in Generative Artificial Intelligence (AI). Their ability to…

Machine Learning · Computer Science 2024-03-08 Xu Guo, Yiqiang Chen

A Note on Statistically Accurate Tabular Data Generation Using Large Language Models

Large language models (LLMs) have shown promise in synthetic tabular data generation, yet existing methods struggle to preserve complex feature dependencies, particularly among categorical variables. This work introduces a…

Machine Learning · Computer Science 2025-05-07 Andrey Sidorenko

Enhancing Table Representations with LLM-powered Synthetic Data Generation

In the era of data-driven decision-making, accurate table-level representations and efficient table recommendation systems are becoming increasingly crucial for improving table management, discovery, and analysis. However, existing…

Machine Learning · Computer Science 2024-11-07 Dayu Yang, Natawut Monaikul, Amanda Ding, Bozhao Tan, Kishore Mosaliganti, Giri Iyengar

Tabular GANs for uneven distribution

Generative models for tabular data have evolved rapidly beyond Generative Adversarial Networks (GANs). While GANs pioneered synthetic tabular data generation, recent advances in diffusion models and large language models (LLMs) have opened…

Machine Learning · Computer Science 2026-04-10 Insaf Ashrapov

Team, Then Trim: An Assembly-Line LLM Framework for High-Quality Tabular Data Generation

While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity of observations, tabular datasets often exhibit…

Machine Learning · Computer Science 2026-02-05 Congjing Zhang, Ryan Feng Lin, Ruoxuan Bao, Shuai Huang

Retrieval Augmented Generation for Topic Modeling in Organizational Research: An Introduction with Empirical Demonstration

Analyzing textual data is the cornerstone of qualitative research. While traditional methods such as grounded theory and content analysis are widely used, they are labor-intensive and time-consuming. Topic modeling offers an automated…

Machine Learning · Computer Science 2025-03-19 Gerion Spielberger, Florian M. Artinger, Jochen Reb, Rudolf Kerschreiter

A text-to-tabular approach to generate synthetic patient data using LLMs

Access to large-scale, high-quality healthcare databases is key to accelerating medical research and making insightful discoveries about diseases. However, access to such data is often limited by patient privacy concerns, data sharing…

Machine Learning · Computer Science 2025-06-25 Margaux Tornqvist, Jean-Daniel Zucker, Tristan Fauvel, Nicolas Lambert, Mathilde Berthelot, Antoine Movschin

On the Usefulness of Synthetic Tabular Data Generation

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning…

Machine Learning · Computer Science 2023-06-28 Dionysis Manousakas, Sergül Aydöre

An Automatic Prompt Generation System for Tabular Data Tasks

Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through…

Machine Learning · Computer Science 2024-08-22 Ashlesha Akella, Abhijit Manatkar, Brij Chavda, Hima Patel

AgentGL: Towards Agentic Graph Learning with LLMs via Reinforcement Learning

Large Language Models (LLMs) increasingly rely on agentic capabilities (iterative retrieval, tool use, and decision-making) to overcome the limits of static, parametric knowledge. Yet existing agentic frameworks treat external information as…

Computation and Language · Computer Science 2026-04-24 Yuanfu Sun, Kang Li, Dongzhe Fan, Jiajin Liu, Qiaoyu Tan

Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations

The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently…

Computation and Language · Computer Science 2023-10-16 Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often suffer from generating low-quality/incorrect answers and limited information richness from…

Computation and Language · Computer Science 2026-01-09 Xianyang Liu, Yilin Liu, Shuai Wang, Hao Cheng, Andrew Estornell, Yuzhi Zhao, Jun Shu, Jiaheng Wei

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG

Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic,…

Artificial Intelligence · Computer Science 2026-04-02 Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, Athanasios V. Vasilakos