Related papers: Data Readiness Report

Data Cards: Purposeful and Transparent Dataset Documentation for Responsible AI

As research and industry move towards large-scale models capable of numerous downstream tasks, the complexity of understanding multi-modal datasets that give nuance to models rapidly increases. A clear and thorough understanding of a…

Human-Computer Interaction · Computer Science 2022-04-05 Mahima Pushkarna, Andrew Zaldivar, Oddur Kjartansson

The Dataset Nutrition Label: A Framework To Drive Higher Data Quality Standards

Artificial intelligence (AI) systems built on incomplete or biased data will often exhibit problematic outcomes. Current methods of data analysis, particularly before model development, are costly and not standardized. The Dataset Nutrition…

Databases · Computer Science 2018-05-11 Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, Kasia Chmielinski

Dataset Management Platform for Machine Learning

The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup,…

Databases · Computer Science 2023-03-16 Ze Mao, Yang Xu, Erick Suarez

Data Quality Toolkit: Automatic assessment of data quality and remediation for machine learning datasets

The quality of training data has a huge impact on the efficiency, accuracy and complexity of machine learning tasks. Various tools and techniques are available that assess data quality with respect to general cleaning and profiling checks.…

Machine Learning · Computer Science 2021-09-07 Nitin Gupta, Hima Patel, Shazia Afzal, Naveen Panwar, Ruhi Sharma Mittal, Shanmukha Guttula, Abhinav Jain, Lokesh Nagalapatti, Sameep Mehta, Sandeep Hans, Pranay Lohia, Aniya Aggarwal, Diptikalyan Saha

Data Readiness for Natural Language Processing

This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis…

Computers and Society · Computer Science 2020-10-01 Fredrik Olsson, Magnus Sahlgren

Towards "all-inclusive" Data Preparation to ensure Data Quality

Data preparation, especially data cleaning, is very important to ensure data quality and to improve the output of automated decision systems. Since there is no single tool that covers all steps required, a combination of tools -- namely a…

Databases · Computer Science 2023-08-29 Valerie Restat

Standards in the Preparation of Biomedical Research Metadata: A Bridge2AI Perspective

AI-readiness describes the degree to which data may be optimally and ethically used for subsequent AI and Machine Learning (AI/ML) methods, where those methods may involve some combination of model training, data classification, and…

Other Quantitative Biology · Quantitative Biology 2025-09-18 Harry Caufield, Satrajit Ghosh, Sek Wong Kong, Jillian Parker, Nathan Sheffield, Bhavesh Patel, Andrew Williams, Timothy Clark, Monica C. Munoz-Torres

Data Readiness Levels

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing…

Databases · Computer Science 2017-05-08 Neil D. Lawrence

Data Readiness for AI: A 360-Degree Survey

Artificial Intelligence (AI) applications critically depend on data. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluation of data readiness is a crucial step in improving the…

Machine Learning · Computer Science 2025-03-10 Kaveen Hiniduma, Suren Byna, Jean Luca Bez

Technical Report on Data Integration and Preparation

AI application developers typically begin with a dataset of interest and a vision of the end analytic or insight they wish to gain from the data at hand. Although these are two very important components of an AI workflow, one often spends…

Databases · Computer Science 2021-03-04 El Kindi Rezig, Michael Cafarella, Vijay Gadepally

AI Data Development: A Scorecard for the System Card Framework

Artificial intelligence has transformed numerous industries, from healthcare to finance, enhancing decision-making through automated systems. However, the reliability of these systems is mainly dependent on the quality of the underlying…

Computers and Society · Computer Science 2025-06-04 Tadesse K. Bahiru, Haileleol Tibebu, Ioannis A. Kakadiaris

AI Data Readiness Inspector (AIDRIN) for Quantitative Assessment of Data Readiness for AI

"Garbage In Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective.…

Artificial Intelligence · Computer Science 2025-03-12 Kaveen Hiniduma, Suren Byna, Jean Luca Bez, Ravi Madduri

A Survey on Data Cleaning Methods for Improved Machine Learning Model Performance

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

AI Competitions and Benchmarks: Dataset Development

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

Machine Learning · Computer Science 2024-04-16 Romain Egele, Julio C. S. Jacques Junior, Jan N. van Rijn, Isabelle Guyon, Xavier Baró, Albert Clapés, Prasanna Balaprakash, Sergio Escalera, Thomas Moeslund, Jun Wan

Healthsheet: Development of a Transparency Artifact for Health Datasets

Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical…

Artificial Intelligence · Computer Science 2022-03-01 Negar Rostamzadeh, Diana Mincu, Subhrajit Roy, Andrew Smart, Lauren Wilcox, Mahima Pushkarna, Jessica Schrouff, Razvan Amironesei, Nyalleng Moorosi, Katherine Heller

Data Quality Assessment: Challenges and Opportunities

Data-oriented applications, their users, and even the law require data of high quality. Research has divided the rather vague notion of data quality into various dimensions, such as accuracy, consistency, and reputation. To achieve the goal…

Databases · Computer Science 2024-12-09 Sedir Mohammed, Lisa Ehrlinger, Hazar Harmouch, Felix Naumann, Divesh Srivastava

A Methodology for Creating AI FactSheets

As AI models and services are used in a growing number of high-stakes areas, a consensus is forming around the need for a clearer record of how these models and services are developed to increase trust. Several proposals for higher quality…

Human-Computer Interaction · Computer Science 2020-06-30 John Richards, David Piorkowski, Michael Hind, Stephanie Houde, Aleksandra Mojsilović

Datasheets for Healthcare AI: A Framework for Transparency and Bias Mitigation

The use of AI in healthcare has the potential to improve patient care, optimize clinical workflows, and enhance decision-making. However, bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and…

Computers and Society · Computer Science 2025-01-13 Marjia Siddik, Harshvardhan J. Pandit

On Responsible Machine Learning Datasets with Fairness, Privacy, and Regulatory Norms

Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness…

Machine Learning · Computer Science 2024-08-20 Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner

Datasheets for Datasets

The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics…

Databases · Computer Science 2021-12-03 Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé, Kate Crawford