Related papers: Data Readiness Levels

200 papers

This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis…

Computers and Society · Computer Science · 2020-10-01 · Fredrik Olsson, Magnus Sahlgren

Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current practices of data cleaning and data readiness assessment for machine learning tasks are mostly conducted in an arbitrary manner which…

Databases · Computer Science · 2020-10-16 · Shazia Afzal, Rajmohan C, Manish Kesarwani, Sameep Mehta, Hima Patel

AI application developers typically begin with a dataset of interest and a vision of the end analytic or insight they wish to gain from the data at hand. Although these are two very important components of an AI workflow, one often spends…

Databases · Computer Science · 2021-03-04 · El Kindi Rezig, Michael Cafarella, Vijay Gadepally

Data preparation is a critical step in enhancing the usability of tabular data and thus boosts downstream data-driven tasks. Traditional methods often face challenges in capturing the intricate relationships within tables and adapting to…

Artificial Intelligence · Computer Science · 2025-08-05 · Mengshi Chen, Yuxiang Sun, Tengchao Li, Jianwei Wang, Kai Wang, Xuemin Lin, Ying Zhang, Wenjie Zhang

Data preparation, especially data cleaning, is very important to ensure data quality and to improve the output of automated decision systems. Since there is no single tool that covers all steps required, a combination of tools -- namely a…

Databases · Computer Science · 2023-08-29 · Valerie Restat

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

We present experiences and lessons learned from increasing data readiness of heterogeneous data for artificial intelligence projects using visual analysis methods. Increasing the data readiness level involves understanding both the data as…

Methodology · Statistics · 2024-09-09 · Mattias Tiger, Daniel Jakobsson, Anders Ynnerman, Fredrik Heintz, Daniel Jönsson

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice…

Human-Computer Interaction · Computer Science · 2022-08-25 · Amy K. Heger, Liz B. Marquis, Mihaela Vorvoreanu, Hanna Wallach, Jennifer Wortman Vaughan

Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data…

Computation and Language · Computer Science · 2025-11-14 · Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Ningyu Zhang, Huajun Chen

Large language models (LLMs) have demonstrated remarkable performance across a wide range of tasks and domains, with data playing a central role in enabling these advances. Despite this success, the preparation and effective utilization of…

Computation and Language · Computer Science · 2026-03-17 · Hao Liang, Zhengyang Zhao, Zhaoyang Han, Meiyi Qiang, Xiaochen Ma, Bohan Zeng, Qifeng Cai, Zhiyu Li, Linpeng Tang, Weinan E, Wentao Zhang

Data science has employed great research efforts in developing advanced analytics, improving data models and cultivating new algorithms. However, not many authors have come across the organizational and socio-technical challenges that arise…

Machine Learning · Computer Science · 2022-01-17 · Iñigo Martinez, Elisabeth Viles, Igor G. Olaizola

Technology Readiness Levels are a mainstay for organizations that fund, develop, test, acquire, or use technologies. Technology Readiness Levels provide a standardized assessment of a technology's maturity and enable consistent comparison…

The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup,…

Databases · Computer Science · 2023-03-16 · Ze Mao, Yang Xu, Erick Suarez

Data quality describes the degree to which data meet specific requirements and are fit for use by humans and/or downstream tasks (e.g., artificial intelligence). Data quality can be assessed across multiple high-level concepts called…

Databases · Computer Science · 2025-07-24 · Vasileios Papastergios, Lisa Ehrlinger, Anastasios Gounaris

Large language models have recently demonstrated their exceptional capabilities in supporting and automating various tasks. Among the tasks worth exploring for testing large language model capabilities, we considered data preparation, a…

Computation and Language · Computer Science · 2025-12-01 · Matteo Spreafico, Ludovica Tassini, Camilla Sancricca, Cinzia Cappiello

We introduce the idea of Data Readiness Level (DRL) to measure the relative richness of data to answer specific questions often encountered by data scientists. We first approach the problem in its full generality explaining its desired…

Information Retrieval · Computer Science · 2017-02-08 · Hui Guan, Thanos Gentimis, Hamid Krim, James Keiser

The development and deployment of machine learning (ML) systems can be executed easily with modern tools, but the process is typically rushed and means-to-an-end. The lack of diligence can lead to technical debt, scope creep and misaligned…

The introduction of machine learning (ML) components in software projects has created the need for software engineers to collaborate with data scientists and other specialists. While collaboration can always be challenging, ML introduces…

Software Engineering · Computer Science · 2022-02-14 · Nadia Nahar, Shurui Zhou, Grace Lewis, Christian Kästner

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science · 2021-09-16 · Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

Data engineering pipelines are a widespread way to provide high-quality data for all kinds of data science applications. However, numerous challenges still remain in the composition and operation of such pipelines. Data engineering…

Databases · Computer Science · 2025-07-30 · Kevin M. Kramer, Valerie Restat, Sebastian Strasser, Uta Störl, Meike Klettke