Related papers: Data Context Informed Data Wrangling

AI Assistants: A Framework for Semi-Automated Data Wrangling

Data wrangling tasks such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and…

Databases · Computer Science 2022-11-02 Tomas Petricek, Gerrit J. J. van den Burg, Alfredo Nazábal, Taha Ceritli, Ernesto Jiménez-Ruiz, Christopher K. I. Williams

Data Wrangling Task Automation Using Code-Generating Language Models

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot understand the semantic context and deep learning…

Machine Learning · Computer Science 2025-02-25 Ashlesha Akella, Krishnasuri Narayanam

WN-Wrangle: Wireless Network Data Wrangling Assistant

Data wrangling continues to be the most time-consuming task in the data science pipeline and wireless network data is no exception. Prior approaches for automatic or assisted data-wrangling primarily target unordered, single-table data.…

Databases · Computer Science 2026-03-31 Anirudh Kamath, Dustin Maas, Jacobus Van der Merwe, Anna Fariha

Development of Data Evaluation Benchmark for Data Wrangling Recommendation System

CoWrangler is a data-wrangling recommender system designed to streamline data processing tasks. Recognizing that data processing is often time-consuming and complex for novice users, we aim to simplify the decision-making process regarding…

Databases · Computer Science 2024-09-18 Yuqing Wang, Anna Fariha

AI Data Wrangling with Associative Arrays

The AI revolution is data driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data…

Databases · Computer Science 2020-01-22 Jeremy Kepner, Vijay Gadepally, Hayden Jananthan, Lauren Milechin, Siddharth Samsi

A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

Making Sense of Data in the Wild: Data Analysis Automation at Scale

As the volume of publicly available data continues to grow, researchers face the challenge of limited diversity in benchmarking machine learning tasks. Although thousands of datasets are available in public repositories, the sheer abundance…

Information Retrieval · Computer Science 2025-02-25 Mara Graziani, Malina Molnar, Irina Espejo Morales, Joris Cadow-Gossweiler, Teodoro Laino

Dango: A Mixed-Initiative Data Wrangling System using Large Language Model

Data wrangling is a time-consuming and challenging task in a data science pipeline. While many tools have been proposed to automate or facilitate data wrangling, they often misinterpret user intent, especially in complex tasks. We propose…

Human-Computer Interaction · Computer Science 2025-03-07 Wei-Hao Chen, Weixi Tong, Amanda Case, Tianyi Zhang

Retrieval-Based Transformer for Table Augmentation

Data preparation, also called data wrangling, is considered one of the most expensive and time-consuming steps when performing analytics or building machine learning models. Preparing data typically involves collecting and merging data from…

Computation and Language · Computer Science 2023-06-22 Michael Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, Alfio Gliozzo

Innovative Approaches for efficiently Warehousing Complex Data from the Web

Research in data warehousing and OLAP has produced important technologies for the design, management and use of information systems for decision support. With the development of the Internet, the availability of various types of data has…

Databases · Computer Science 2017-01-31 Fadila Bentayeb, Nora Maïz, Hadj Mahboubi, Cécile Favre, Sabine Loudcher, Nouria Harbi, Omar Boussaïd, Jérôme Darmont

Uncertainty in Automated Ontology Matching: Lessons Learned from an Empirical Experimentation

Data integration is considered a classic research field and a pressing need within the information science community. Ontologies play a critical role in such a process by providing well-consolidated support to link and semantically…

Artificial Intelligence · Computer Science 2024-05-30 Inès Osman, Salvatore F. Pileggi, Sadok Ben Yahia

Towards Semantically Enhanced Data Understanding

In the field of machine learning, data understanding is the practice of getting initial insights in unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…

Databases · Computer Science 2018-06-14 Markus Schröder, Christian Jilek, Jörn Hees, Andreas Dengel

Autonomous Data Agents: A New Opportunity for Smart Data

As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between…

Artificial Intelligence · Computer Science 2025-10-07 Yanjie Fu, Dongjie Wang, Wangyang Ying, Xinyuan Wang, Xiangliang Zhang, Huan Liu, Jian Pei

Data Readiness for Natural Language Processing

This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis…

Computers and Society · Computer Science 2020-10-01 Fredrik Olsson, Magnus Sahlgren

Categorising Software Contexts: Research-in-Progress

A growing number of researchers suggest that software process must be tailored to a project's context to achieve maximal performance. Researchers have studied 'context' in an ad-hoc way, with focus on those contextual factors that appear to…

Software Engineering · Computer Science 2021-02-19 Diana Kirk, Stephen G. MacDonell

Explainable Data-Driven Optimization: From Context to Decision and Back Again

Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the…

Machine Learning · Computer Science 2023-07-21 Alexandre Forel, Axel Parmentier, Thibaut Vidal

Automating Data Science: Prospects and Challenges

Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process. Key insights: * Automation in data science aims to facilitate and…

Databases · Computer Science 2022-03-01 Tijl De Bie, Luc De Raedt, José Hernández-Orallo, Holger H. Hoos, Padhraic Smyth, Christopher K. I. Williams

Data Management for Context-Aware Computing

We envisage future context-aware applications will dynamically adapt their behaviors to various context data from sources in wide-area networks, such as the Internet. Facing the changing context and the sheer number of context sources, a…

Databases · Computer Science 2020-03-10 Wenwei Xue, Hungkeng Pung, Wenlong Ng, Tao Gu

Automatic Resource Allocation in Business Processes: A Systematic Literature Survey

For delivering products or services to their clients, organizations execute manifold business processes. During such execution, upcoming process tasks need to be allocated to internal resources. Resource allocation is a complex…

Software Engineering · Computer Science 2024-03-29 Luise Pufahl, Sven Ihde, Fabian Stiehle, Mathias Weske, Ingo Weber

Automated data processing and feature engineering for deep learning and big data applications: a survey

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of…

Machine Learning · Computer Science 2024-03-20 Alhassan Mumuni, Fuseini Mumuni