Related papers: Datasheets for Datasets

200 papers

Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However, the datasets which empower machine learning are often used, shared and re-used with…

Machine learning (ML) is becoming prevalent in embedded AI sensing systems. These "ML sensors" enable context-sensitive, real-time data collection and decision-making across diverse applications ranging from anomaly detection in industrial…

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

Datasets have played a foundational role in the advancement of machine learning research. They form the basis for the models we design and deploy, as well as our primary medium for benchmarking and evaluation. Furthermore, the ways in which…

Machine Learning · Computer Science 2021-11-16 Amandalynne Paullada, Inioluwa Deborah Raji, Emily M. Bender, Emily Denton, Alex Hanna

As research and industry move towards large-scale models capable of numerous downstream tasks, the complexity of understanding multi-modal datasets that give nuance to models rapidly increases. A clear and thorough understanding of a…

Human-Computer Interaction · Computer Science 2022-04-05 Mahima Pushkarna, Andrew Zaldivar, Oddur Kjartansson

Machine learning (ML) approaches have demonstrated promising results in a wide range of healthcare applications. Data plays a crucial role in developing ML-based healthcare systems that directly affect people's lives. Many of the ethical…

The use of AI in healthcare has the potential to improve patient care, optimize clinical workflows, and enhance decision-making. However, bias, data incompleteness, and inaccuracies in training datasets can lead to unfair outcomes and…

Computers and Society · Computer Science 2025-01-13 Marjia Siddik, Harshvardhan J. Pandit

The rapid development of network science and technologies depends on shareable datasets. Currently, there is no standard practice for reporting and sharing network datasets. Some network dataset providers only share links, while others…

Social and Information Networks · Computer Science 2022-06-09 Xinyi Zheng, Ryan A. Rossi, Nesreen Ahmed, Dominik Moritz

The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup,…

Databases · Computer Science 2023-03-16 Ze Mao, Yang Xu, Erick Suarez

Data is a critical element in any discovery process. In the last decades, we observed exponential growth in the volume of available data and the technology to manipulate it. However, data is only practical when one can structure it for a…

Leaderboards are crucial in the machine learning (ML) domain for benchmarking and tracking progress. However, creating leaderboards traditionally demands significant manual effort. In recent years, efforts have been made to automate…

Machine Learning · Computer Science 2026-02-02 Roelien C. Timmer, Necva Bölücü, Stephen Wan

Feature selection is an important and active field of research in machine learning and data science. Our goal in this paper is to propose a collection of synthetic datasets that can be used as a common reference point for feature selection…

Machine Learning · Computer Science 2022-11-08 Firuz Kamalov, Hana Sulieman, Aswani Kumar Cherukuri

ML/AI is the field of computer science and computer engineering that arguably received the most attention and funding over the last decade. Data is the key element of ML/AI, so it is becoming increasingly important to ensure that users are…

Digital Libraries · Computer Science 2025-03-19 Marco Rondina, Antonio Vetrò, Juan Carlos De Martin

This document gives a set of recommendations to build and manipulate the datasets used to develop and/or validate machine learning models such as deep neural networks. This document is one of the 3 documents defined in [1] to ensure the…

Datasets of visualization play a crucial role in automating data-driven visualization pipelines, serving as the foundation for supervised model training and algorithm benchmarking. In this paper, we survey the literature on visualization…

Human-Computer Interaction · Computer Science 2024-07-24 Can Liu, Ruike Jiang, Shaocong Tan, Jiacheng Yu, Chaofan Yang, Hanning Shao, Xiaoru Yuan

Generating value from data requires the ability to find, access and make sense of datasets. There are many efforts underway to encourage data sharing and reuse, from scientific publishers asking authors to submit data alongside manuscripts…

Dataset distillation is attracting more attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density,…

Data is central to the development and evaluation of machine learning (ML) models. However, the use of problematic or inappropriate datasets can result in harms when the resulting models are deployed. To encourage responsible AI practice…

Human-Computer Interaction · Computer Science 2022-08-25 Amy K. Heger, Liz B. Marquis, Mihaela Vorvoreanu, Hanna Wallach, Jennifer Wortman Vaughan

Datasets are central to training machine learning (ML) models. The ML community has recently made significant improvements to data stewardship and documentation practices across the model development life cycle. However, the act of…

Computers and Society · Computer Science 2022-05-11 Alexandra Sasha Luccioni, Frances Corry, Hamsini Sridharan, Mike Ananny, Jason Schultz, Kate Crawford

Machine learning datasets are powerful but unwieldy. Despite the fact that large datasets commonly contain problematic material--whether from a technical, legal, or ethical perspective--datasets are valuable resources when handled carefully…

Computers and Society · Computer Science 2025-01-28 Sarah Ciston, Mike Ananny, Kate Crawford