English
Related papers

Related papers: Dataset Management Platform for Machine Learning

200 papers

Machine learning is now used in many applications thanks to its ability to predict, generate, or discover patterns from large quantities of data. However, the process of collecting and transforming data for practical use is intricate. Even…

Datasets have played a foundational role in the advancement of machine learning research. They form the basis for the models we design and deploy, as well as our primary medium for benchmarking and evaluation. Furthermore, the ways in which…

Machine Learning · Computer Science 2021-11-16 Amandalynne Paullada , Inioluwa Deborah Raji , Emily M. Bender , Emily Denton , Alex Hanna

Dataset distillation has emerged as a strategy to overcome the hurdles associated with large datasets by learning a compact set of synthetic data that retains essential information from the original dataset. While distilled data can be used…

Machine Learning · Computer Science 2024-07-23 William Yang , Ye Zhu , Zhiwei Deng , Olga Russakovsky

This document gives a set of recommendations to build and manipulate the datasets used to develop and/or validate machine learning models such as deep neural networks. This document is one of the 3 documents defined in [1] to ensure the…

Data exploration and quality analysis is an important yet tedious process in the AI pipeline. Current practices of data cleaning and data readiness assessment for machine learning tasks are mostly conducted in an arbitrary manner which…

Databases · Computer Science 2020-10-16 Shazia Afzal , Rajmohan C , Manish Kesarwani , Sameep Mehta , Hima Patel

In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels,…

Machine Learning · Computer Science 2023-06-28 Szymon Mazurek , Maciej Wielgosz

Camera images are ubiquitous in machine learning research. They also play a central role in the delivery of important services spanning medicine and environmental surveying. However, the application of machine learning models in these…

Data-centric AI is at the center of a fundamental shift in software engineering where machine learning becomes the new software, powered by big data and computing infrastructure. Here software engineering needs to be re-thought where data…

Machine Learning · Computer Science 2022-12-27 Steven Euijong Whang , Yuji Roh , Hwanjun Song , Jae-Gil Lee

The machine learning community currently has no standardized process for documenting datasets, which can lead to severe consequences in high-stakes domains. To address this gap, we propose datasheets for datasets. In the electronics…

Imitation learning from large multi-task demonstration datasets has emerged as a promising path for building generally-capable robots. As a result, 1000s of hours have been spent on building such large-scale datasets around the globe.…

Machine learning datasets are powerful but unwieldy. Despite the fact that large datasets commonly contain problematic material--whether from a technical, legal, or ethical perspective--datasets are valuable resources when handled carefully…

Computers and Society · Computer Science 2025-01-28 Sarah Ciston , Mike Ananny , Kate Crawford

As machine learning systems grow in scale, so do their training data requirements, forcing practitioners to automate and outsource the curation of training data in order to achieve state-of-the-art performance. The absence of trustworthy…

Machine Learning · Computer Science 2021-04-02 Micah Goldblum , Dimitris Tsipras , Chulin Xie , Xinyun Chen , Avi Schwarzschild , Dawn Song , Aleksander Madry , Bo Li , Tom Goldstein

The performance of machine learning models relies heavily on the quality of input data, yet real-world applications often face significant data-related challenges. A common issue arises when curating training data or deploying models: two…

Machine Learning · Computer Science 2025-09-24 Varun Babbar , Zhicheng Guo , Cynthia Rudin

Rising concern for the societal implications of artificial intelligence systems has inspired demands for greater transparency and accountability. However the datasets which empower machine learning are often used, shared and re-used with…

The quality of underlying training data is very crucial for building performant machine learning models with wider generalizabilty. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So,…

Machine Learning · Computer Science 2021-12-16 Atindriyo Sanyal , Vikram Chatterji , Nidhi Vyas , Ben Epstein , Nikita Demir , Anthony Corletti

Data cleaning is the initial stage of any machine learning project and is one of the most critical processes in data analysis. It is a critical step in ensuring that the dataset is devoid of incorrect or erroneous data. It can be done…

Databases · Computer Science 2021-09-16 Ga Young Lee , Lubna Alzamil , Bakhtiyar Doskenov , Arash Termehchy

Machine learning inference is increasingly being executed locally on mobile and embedded platforms, due to the clear advantages in latency, privacy and connectivity. In this paper, we present approaches for online resource management in…

Computer Vision and Pattern Recognition · Computer Science 2021-05-11 Lei Xun , Long Tran-Thanh , Bashir M Al-Hashimi , Geoff V. Merrett

Data mining is about obtaining new knowledge from existing datasets. However, the data in the existing datasets can be scattered, noisy, and even incomplete. Although lots of effort is spent on developing or fine-tuning data mining models…

Machine Learning · Computer Science 2019-06-21 Canchen Li

Workflow technology is rapidly evolving and, rather than being limited to modeling the control flow in business processes, is becoming a key mechanism to perform advanced data management, such as big data analytics. This survey focuses on…

Databases · Computer Science 2017-01-27 Georgia Kougka , Anastasios Gounaris , Alkis Simitsis

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many…

Machine Learning · Computer Science 2021-01-06 Hyeongmin Cho , Sangkyun Lee
‹ Prev 1 2 3 10 Next ›