Related papers: Data Wrangling Task Automation Using Code-Generati…

200 papers

Large language models have recently demonstrated their exceptional capabilities in supporting and automating various tasks. Among the tasks worth exploring for testing large language model capabilities, we considered data preparation, a…

Computation and Language · Computer Science 2025-12-01 Matteo Spreafico, Ludovica Tassini, Camilla Sancricca, Cinzia Cappiello

Reliable data quality is crucial for downstream analysis of tabular datasets, yet rule-based validation often struggles with inefficiency, human intervention, and high computational costs. We present a three-stage framework that combines…

Software Engineering · Computer Science 2025-09-23 Ashlesha Akella, Akshar Kaul, Krishnasuri Narayanam, Sameep Mehta

The process of preparing potentially large and complex data sets for further analysis or manual examination is often called data wrangling. In classical warehousing environments, the steps in such a process have been carried out using…

A common training approach for language models involves using a large-scale language model to expand a human-provided dataset, which is subsequently used for model training. This method significantly reduces training costs by eliminating the…

Computation and Language · Computer Science 2025-07-09 Minghang Zhu, Shen Gao, Zhengliang Shi, Jiabao Fang, Pengjie Ren, Zhaochun Ren, Zhumin Chen, Shuo Shang

High-quality, error-free datasets are a key ingredient in building reliable, accurate, and unbiased machine learning (ML) models. However, real world datasets often suffer from errors due to sensor malfunctions, data entry mistakes, or…

Machine Learning · Computer Science 2025-03-11 Tommaso Bendinelli, Artur Dox, Christian Holz

Real-world text classification tasks often require many labeled training examples that are expensive to obtain. Recent advancements in machine teaching, specifically the data programming paradigm, facilitate the creation of training data…

Machine Learning · Computer Science 2020-02-05 Neil Mallinar, Abhishek Shah, Tin Kam Ho, Rajendra Ugrani, Ayush Gupta

Large Language Models offer new opportunities to devise automated implementation generation methods that can tackle problem solving activities beyond traditional methods, which require algorithmic specifications and can use only static…

Computation and Language · Computer Science 2025-01-06 Hashmath Shaik, Alex Doboli

Automated insight generation is a common tactic for helping knowledge workers, such as data scientists, to quickly understand the potential value of new and unfamiliar data. Unfortunately, automated insights produced by large-language…

Software Engineering · Computer Science 2024-05-06 Ananya Singha, Bhavya Chopra, Anirudh Khatry, Sumit Gulwani, Austin Z. Henley, Vu Le, Chris Parnin, Mukul Singh, Gust Verbruggen

Recent advances in neural network-based generative modeling have reignited the hopes in having computer systems capable of seamlessly conversing with humans and able to understand natural language. Neural architectures have been employed to…

Computation and Language · Computer Science 2020-08-03 Cristina Garbacea, Qiaozhu Mei

In the domain of data science, the predictive tasks of classification, regression, and imputation of missing values are commonly encountered challenges associated with tabular data. This research endeavors to apply Large Language Models…

Machine Learning · Computer Science 2026-04-23 Yazheng Yang, Yuqi Wang, Yaxuan Li, Sankalok Sen, Lei Li, Lin Qiu, Qi Liu

We consider the task of data-to-text generation, which aims to create textual output from non-linguistic input. We focus on generating long-form text, i.e., documents with multiple paragraphs, and propose a neural model enhanced with a…

Computation and Language · Computer Science 2022-03-01 Ratish Puduppully, Yao Fu, Mirella Lapata

To efficiently select optimal dataset combinations for enhancing multi-task learning (MTL) performance in large language models, we proposed a novel framework that leverages a neural network to predict the best dataset combinations. The…

Computation and Language · Computer Science 2025-05-06 Zaifu Zhan, Rui Zhang

A major factor in the recent success of large language models is the use of enormous and ever-growing text datasets for unsupervised pre-training. However, naively training a model on all available data may not be optimal (or feasible), as…

Recent breakthroughs in large language modeling have facilitated rigorous exploration of their application in diverse tasks related to tabular data modeling, such as prediction, tabular data synthesis, question answering, and table…

Automated planning is concerned with developing efficient algorithms to generate plans or sequences of actions to achieve a specific goal in a given environment. Emerging Large Language Models (LLMs) can answer questions, write high-quality…

This paper explores the potential of large language models (LLMs) for task automation in the provision of technical services in the production machinery sector. By focusing on text correction, summarization, and question answering, the…

General Economics · Economics 2025-05-19 Jochen Wulf, Juerg Meierhofer

The exponential growth of data generated on the Internet in the current information age is a driving force for the digital economy. Extraction of information is the major value in an accumulated big data. Big data dependency on statistical…

Large Language Models (LLMs) have shown remarkable performance in various natural language processing tasks but face challenges in mathematical reasoning, where complex problem-solving requires both linguistic understanding and mathematical…

Computation and Language · Computer Science 2025-03-20 Shuguang Chen, Guang Lin

Advances in natural language processing, such as transfer learning from pre-trained language models, have impacted how models are trained for programming language tasks too. Previous research primarily explored code pre-training and…

Computation and Language · Computer Science 2023-02-08 Pinzhen Chen, Gerasimos Lampouras

Large language models (LLMs) hold promise for generating plans for complex tasks, but their effectiveness is limited by sequential execution, lack of control flow models, and difficulties in skill retrieval. Addressing these issues is…

Computation and Language · Computer Science 2024-10-18 Andrei Cosmin Redis, Mohammadreza Fani Sani, Bahram Zarrin, Andrea Burattin