Related papers

Related papers: TAGAL: Tabular Data Generation using Agentic LLM M…

200 papers

In the era of big data, access to abundant data is crucial for driving research forward. However, such data is often inaccessible due to privacy concerns or high costs, particularly in the healthcare domain. Generating synthetic (tabular) data…

Machine Learning · Computer Science 2026-04-10 Yaobin Ling, Xiaoqian Jiang, Yejin Kim

Large language models (LLMs) are increasingly deployed as agents, expected to decompose goals, invoke tools, and verify results in dynamic environments. Realizing these capabilities requires access to agentic data: structured interaction…

Artificial Intelligence · Computer Science 2025-10-22 Abhigya Verma, Seganrasan Subramanian, Nandhakumar Kandasamy, Naman Gupta

While most generative models show achievements in image data generation, few are developed for tabular data generation. Recently, due to the success of large language models (LLMs) in diverse tasks, they have also been used for tabular data…

Machine Learning · Computer Science 2024-10-30 Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Svetha Venkatesh

In this study, we explore the growing potential of AI and deep learning technologies, particularly Generative Adversarial Networks (GANs) and Large Language Models (LLMs), for generating synthetic tabular data. Access to quality students…

Machine Learning · Computer Science 2026-05-21 Mohammad Khalil, Sam Urmian, Ronas Shakya, Qinyi Liu

Tabular data is one of the most prevalent and important data formats in real-world applications such as healthcare, finance, and education. However, its effective use in machine learning is often constrained by data scarcity, privacy…

Machine Learning · Computer Science 2025-07-18 Ruxue Shi, Yili Wang, Mengnan Du, Xu Shen, Yi Chang, Xin Wang

Large Language Models (LLMs) offer a flexible means to generate synthetic tabular data, yet existing approaches often fail to preserve key causal parameters such as the average treatment effect (ATE). In this technical exploration, we first…

Machine Learning · Computer Science 2025-11-04 Dana Kim, Yichen Xu, Tiffany Lin

Synthetic data generation has emerged as an invaluable solution in scenarios where real-world data collection and usage are limited by cost and scarcity. Large language models (LLMs) have demonstrated remarkable capabilities in producing…

Machine Learning · Computer Science 2025-07-22 Anh Nguyen, Sam Schafft, Nicholas Hale, John Alfaro

The recent surge in research focused on generating synthetic data from large language models (LLMs), especially for scenarios with limited data availability, marks a notable shift in Generative Artificial Intelligence (AI). Their ability to…

Machine Learning · Computer Science 2024-03-08 Xu Guo, Yiqiang Chen

Large language models (LLMs) have shown promise in synthetic tabular data generation, yet existing methods struggle to preserve complex feature dependencies, particularly among categorical variables. This work introduces a…

Machine Learning · Computer Science 2025-05-07 Andrey Sidorenko

In the era of data-driven decision-making, accurate table-level representations and efficient table recommendation systems are becoming increasingly crucial for improving table management, discovery, and analysis. However, existing…

Machine Learning · Computer Science 2024-11-07 Dayu Yang, Natawut Monaikul, Amanda Ding, Bozhao Tan, Kishore Mosaliganti, Giri Iyengar

Generative models for tabular data have evolved rapidly beyond Generative Adversarial Networks (GANs). While GANs pioneered synthetic tabular data generation, recent advances in diffusion models and large language models (LLMs) have opened…

Machine Learning · Computer Science 2026-04-10 Insaf Ashrapov

While tabular data is fundamental to many real-world machine learning (ML) applications, acquiring high-quality tabular data is usually labor-intensive and expensive. Limited by the scarcity of observations, tabular datasets often exhibit…

Machine Learning · Computer Science 2026-02-05 Congjing Zhang, Ryan Feng Lin, Ruoxuan Bao, Shuai Huang

Analyzing textual data is the cornerstone of qualitative research. While traditional methods such as grounded theory and content analysis are widely used, they are labor-intensive and time-consuming. Topic modeling offers an automated…

Machine Learning · Computer Science 2025-03-19 Gerion Spielberger, Florian M. Artinger, Jochen Reb, Rudolf Kerschreiter

Access to large-scale, high-quality healthcare databases is key to accelerating medical research and making insightful discoveries about diseases. However, access to such data is often limited by patient privacy concerns, data sharing…

Despite recent advances in synthetic data generation, the scientific community still lacks a unified consensus on its usefulness. It is commonly believed that synthetic data can be used for both data exchange and boosting machine learning…

Machine Learning · Computer Science 2023-06-28 Dionysis Manousakas, Sergül Aydöre

Efficient processing of tabular data is important in various industries, especially when working with datasets containing a large number of columns. Large language models (LLMs) have demonstrated their ability on several tasks through…

Machine Learning · Computer Science 2024-08-22 Ashlesha Akella, Abhijit Manatkar, Brij Chavda, Hima Patel

Large Language Models (LLMs) increasingly rely on agentic capabilities (iterative retrieval, tool use, and decision-making) to overcome the limits of static, parametric knowledge. Yet existing agentic frameworks treat external information as…

Computation and Language · Computer Science 2026-04-24 Yuanfu Sun, Kang Li, Dongzhe Fan, Jiajin Liu, Qiaoyu Tan

The collection and curation of high-quality training data is crucial for developing text classification models with superior performance, but it is often associated with significant costs and time investment. Researchers have recently…

Computation and Language · Computer Science 2023-10-16 Zhuoyan Li, Hangxiao Zhu, Zhuoran Lu, Ming Yin

The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often suffer from generating low-quality/incorrect answers and limited information richness from…

Computation and Language · Computer Science 2026-01-09 Xianyang Liu, Yilin Liu, Shuai Wang, Hao Cheng, Andrew Estornell, Yuzhi Zhao, Jun Shu, Jiaheng Wei

Large Language Models (LLMs) have advanced artificial intelligence by enabling human-like text generation and natural language understanding. However, their reliance on static training data limits their ability to respond to dynamic,…

Artificial Intelligence · Computer Science 2026-04-02 Aditi Singh, Abul Ehtesham, Saket Kumar, Tala Talaei Khoei, Athanasios V. Vasilakos