Related papers: Data Readiness Report

200 papers

As research and industry move towards large-scale models capable of numerous downstream tasks, the complexity of understanding multi-modal datasets that give nuance to models rapidly increases. A clear and thorough understanding of a…

Human-Computer Interaction · Computer Science 2022-04-05 Mahima Pushkarna, Andrew Zaldivar, Oddur Kjartansson

Artificial intelligence (AI) systems built on incomplete or biased data will often exhibit problematic outcomes. Current methods of data analysis, particularly before model development, are costly and not standardized. The Dataset Nutrition…

Databases · Computer Science 2018-05-11 Sarah Holland, Ahmed Hosny, Sarah Newman, Joshua Joseph, Kasia Chmielinski

The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup,…

Databases · Computer Science 2023-03-16 Ze Mao, Yang Xu, Erick Suarez

The quality of training data has a huge impact on the efficiency, accuracy and complexity of machine learning tasks. Various tools and techniques are available that assess data quality with respect to general cleaning and profiling checks.…

This document concerns data readiness in the context of machine learning and Natural Language Processing. It describes how an organization may proceed to identify, make available, validate, and prepare data to facilitate automated analysis…

Computers and Society · Computer Science 2020-10-01 Fredrik Olsson, Magnus Sahlgren

Data preparation, especially data cleaning, is very important to ensure data quality and to improve the output of automated decision systems. Since there is no single tool that covers all steps required, a combination of tools -- namely a…

Databases · Computer Science 2023-08-29 Valerie Restat

AI-readiness describes the degree to which data may be optimally and ethically used for subsequent AI and Machine Learning (AI/ML) methods, where those methods may involve some combination of model training, data classification, and…

Application of models to data is fraught. Data-generating collaborators often only have a very basic understanding of the complications of collating, processing and curating data. Challenges include: poor data collection practices, missing…

Databases · Computer Science 2017-05-08 Neil D. Lawrence

Artificial Intelligence (AI) applications critically depend on data. Poor quality data produces inaccurate and ineffective AI models that may lead to incorrect or unsafe use. Evaluation of data readiness is a crucial step in improving the…

Machine Learning · Computer Science 2025-03-10 Kaveen Hiniduma, Suren Byna, Jean Luca Bez

AI application developers typically begin with a dataset of interest and a vision of the end analytic or insight they wish to gain from the data at hand. Although these are two very important components of an AI workflow, one often spends…

Databases · Computer Science 2021-03-04 El Kindi Rezig, Michael Cafarella, Vijay Gadepally

Artificial intelligence has transformed numerous industries, from healthcare to finance, enhancing decision-making through automated systems. However, the reliability of these systems is mainly dependent on the quality of the underlying…

Computers and Society · Computer Science 2025-06-04 Tadesse K. Bahiru, Haileleol Tibebu, Ioannis A. Kakadiaris

"Garbage In Garbage Out" is a universally agreed quote by computer scientists from various domains, including Artificial Intelligence (AI). As data is the fuel for AI, models trained on low-quality, biased data are often ineffective.…

Artificial Intelligence · Computer Science 2025-03-12 Kaveen Hiniduma, Suren Byna, Jean Luca Bez, Ravi Madduri

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee, Lubna Alzamil, Bakhtiyar Doskenov, Arash Termehchy

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical…

Data-oriented applications, their users, and even the law require data of high quality. Research has divided the rather vague notion of data quality into various dimensions, such as accuracy, consistency, and reputation. To achieve the goal…

Databases · Computer Science 2024-12-09 Sedir Mohammed, Lisa Ehrlinger, Hazar Harmouch, Felix Naumann, Divesh Srivastava

As AI models and services are used in a growing number of high-stakes areas, a consensus is forming around the need for a clearer record of how these models and services are developed to increase trust. Several proposals for higher quality…

Human-Computer Interaction · Computer Science 2020-06-30 John Richards, David Piorkowski, Michael Hind, Stephanie Houde, Aleksandra Mojsilović

The use of AI in healthcare has the potential to improve patient care, optimize clinical workflows, and enhance decision-making. However, bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and…

Computers and Society · Computer Science 2025-01-13 Marjia Siddik, Harshvardhan J. Pandit

Artificial Intelligence (AI) has made its way into various scientific fields, providing astonishing improvements over existing algorithms for a wide variety of tasks. In recent years, there have been severe concerns over the trustworthiness…

Machine Learning · Computer Science 2024-08-20 Surbhi Mittal, Kartik Thakral, Richa Singh, Mayank Vatsa, Tamar Glaser, Cristian Canton Ferrer, Tal Hassner

The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics…
