Related papers: Data Context Informed Data Wrangling

200 papers

Data wrangling tasks, such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and…

Ensuring data quality in large tabular datasets is a critical challenge, typically addressed through data wrangling tasks. Traditional statistical methods, though efficient, often cannot understand the semantic context and deep learning…

Machine Learning · Computer Science 2025-02-25 Ashlesha Akella, Krishnasuri Narayanam

Data wrangling continues to be the most time-consuming task in the data science pipeline, and wireless network data is no exception. Prior approaches for automatic or assisted data wrangling primarily target unordered, single-table data.…

Databases · Computer Science 2026-03-31 Anirudh Kamath, Dustin Maas, Jacobus Van der Merwe, Anna Fariha

CoWrangler is a data-wrangling recommender system designed to streamline data processing tasks. Recognizing that data processing is often time-consuming and complex for novice users, we aim to simplify the decision-making process regarding…

Databases · Computer Science 2024-09-18 Yuqing Wang, Anna Fariha

The AI revolution is data-driven. AI "data wrangling" is the process by which unusable data is transformed to support AI algorithm development (training) and deployment (inference). Significant time is devoted to translating diverse data…

Databases · Computer Science 2020-01-22 Jeremy Kepner, Vijay Gadepally, Hayden Jananthan, Lauren Milechin, Siddharth Samsi

Data cleaning is the initial stage of any machine learning project and one of the most critical processes in data analysis. It ensures that the dataset is free of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

As the volume of publicly available data continues to grow, researchers face the challenge of limited diversity in benchmarking machine learning tasks. Although thousands of datasets are available in public repositories, the sheer abundance…

Information Retrieval · Computer Science 2025-02-25 Mara Graziani, Malina Molnar, Irina Espejo Morales, Joris Cadow-Gossweiler, Teodoro Laino

Data wrangling is a time-consuming and challenging task in a data science pipeline. While many tools have been proposed to automate or facilitate data wrangling, they often misinterpret user intent, especially in complex tasks. We propose…

Human-Computer Interaction · Computer Science 2025-03-07 Wei-Hao Chen, Weixi Tong, Amanda Case, Tianyi Zhang

Data preparation, also called data wrangling, is considered one of the most expensive and time-consuming steps when performing analytics or building machine learning models. Preparing data typically involves collecting and merging data from…

Computation and Language · Computer Science 2023-06-22 Michael Glass, Xueqing Wu, Ankita Rajaram Naik, Gaetano Rossiello, Alfio Gliozzo

Research in data warehousing and OLAP has produced important technologies for the design, management and use of information systems for decision support. With the development of the Internet, the availability of various types of data has…

Data integration is considered a classic research field and a pressing need within the information science community. Ontologies play a critical role in such a process by providing well-consolidated support to link and semantically…

Artificial Intelligence · Computer Science 2024-05-30 Inès Osman, Salvatore F. Pileggi, Sadok Ben Yahia

In the field of machine learning, data understanding is the practice of gaining initial insights into unknown datasets. Such knowledge-intensive tasks require a lot of documentation, which is necessary for data scientists to grasp the meaning…

Databases · Computer Science 2018-06-14 Markus Schröder, Christian Jilek, Jörn Hees, Andreas Dengel

As data continues to grow in scale and complexity, preparing, transforming, and analyzing it remains labor-intensive, repetitive, and difficult to scale. Since data contains knowledge and AI learns knowledge from it, the alignment between…

Artificial Intelligence · Computer Science 2025-10-07 Yanjie Fu, Dongjie Wang, Wangyang Ying, Xinyuan Wang, Xiangliang Zhang, Huan Liu, Jian Pei

This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis…

Computers and Society · Computer Science 2020-10-01 Fredrik Olsson, Magnus Sahlgren

A growing number of researchers suggest that software process must be tailored to a project's context to achieve maximal performance. Researchers have studied 'context' in an ad hoc way, with focus on those contextual factors that appear to…

Software Engineering · Computer Science 2021-02-19 Diana Kirk, Stephen G. MacDonell

Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the…

Machine Learning · Computer Science 2023-07-21 Alexandre Forel, Axel Parmentier, Thibaut Vidal

Given the complexity of typical data science projects and the associated demand for human expertise, automation has the potential to transform the data science process. Key insights:
* Automation in data science aims to facilitate and…

We envisage that future context-aware applications will dynamically adapt their behaviors to various context data from sources in wide-area networks, such as the Internet. Facing the changing context and the sheer number of context sources, a…

Databases · Computer Science 2020-03-10 Wenwei Xue, Hungkeng Pung, Wenlong Ng, Tao Gu

To deliver products or services to their clients, organizations execute manifold business processes. During such execution, upcoming process tasks need to be allocated to internal resources. Resource allocation is a complex…

Software Engineering · Computer Science 2024-03-29 Luise Pufahl, Sven Ihde, Fabian Stiehle, Mathias Weske, Ingo Weber

The modern approach to artificial intelligence (AI) aims to design algorithms that learn directly from data. This approach has achieved impressive results and has contributed significantly to the progress of AI, particularly in the sphere of…

Machine Learning · Computer Science 2024-03-20 Alhassan Mumuni, Fuseini Mumuni