English
Related papers

Related papers: Data Quality Measures and Efficient Evaluation Alg…

200 papers

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane , Salva Sébastien , Samir Chafik

Data-oriented applications, their users, and even the law require data of high quality. Research has divided the rather vague notion of data quality into various dimensions, such as accuracy, consistency, and reputation. To achieve the goal…

Databases · Computer Science 2024-12-09 Sedir Mohammed , Lisa Ehrlinger , Hazar Harmouch , Felix Naumann , Divesh Srivastava

Modern computer vision foundation models are trained on massive amounts of data, incurring large economic and environmental costs. Recent research has suggested that improving data quality can significantly reduce the need for data…

Computer Vision and Pattern Recognition · Computer Science 2023-11-08 Benjamin Feuer , Chinmay Hegde

Data quality describes the degree to which data meet specific requirements and are fit for use by humans and/or downstream tasks (e.g., artificial intelligence). Data quality can be assessed across multiple high-level concepts called…

Databases · Computer Science 2025-07-24 Vasileios Papastergios , Lisa Ehrlinger , Anastasios Gounaris

Traditional data quality control methods are based on users experience or previously established business rules, and this limits performance in addition to being a very time consuming process with lower than desirable accuracy. Utilizing…

Artificial Intelligence · Computer Science 2018-10-17 Wei Dai , Kenji Yoshigoe , William Parsley

We identify the task of measuring data to quantitatively characterize the composition of machine learning data and datasets. Similar to an object's height, width, and volume, data measurements quantify different attributes of data along…

Data quality is a significant issue for any application that requests for analytics to support decision making. It becomes very important when we focus on Internet of Things (IoT) where numerous devices can interact to exchange and process…

Machine Learning · Computer Science 2020-07-30 Anna Karanika , Panagiotis Oikonomou , Kostas Kolomvatsos , Christos Anagnostopoulos

In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels,…

Machine Learning · Computer Science 2023-06-28 Szymon Mazurek , Maciej Wielgosz

High-quality data is key to interpretable and trustworthy data analytics and the basis for meaningful data-driven decisions. In practical scenarios, data quality is typically associated with data preprocessing, profiling, and cleansing for…

Databases · Computer Science 2019-07-19 Lisa Ehrlinger , Elisa Rusz , Wolfram Wöß

Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example,…

We provide a definition for class density that can be used to measure the aggregate similarity of the samples within each of the classes in a high-dimensional, unstructured dataset. We then put forth several candidate methods for…

Machine Learning · Computer Science 2022-02-09 Adam Byerly , Tatiana Kalganova

Poor data quality limits the advantageous power of Machine Learning (ML) and weakens high-performing ML software systems. Nowadays, data are more prone to the risk of poor quality due to their increasing volume and complexity. Therefore,…

Machine Learning · Computer Science 2025-02-20 Manal Rahal , Bestoun S. Ahmed , Gergely Szabados , Torgny Fornstedt , Jorgen Samuelsson

This paper discusses an approach with machine-learning probability models to evaluate the difference between good and bad data quality in a dataset. A decision tree algorithm is used to predict data quality based on no domain knowledge of…

Machine Learning · Computer Science 2020-09-16 Allen ONeill

Machine learning and deep learning classification models are data-driven, and the model and the data jointly determine their classification performance. It is biased to evaluate the model's performance only based on the classifier accuracy…

Machine Learning · Computer Science 2022-11-11 Lingyan Xue , Xinyu Zhang , Weidong Jiang , Kai Huo

In any knowledge discovery process the value of extracted knowledge is directly related to the quality of the data used. Big Data problems, generated by massive growth in the scale of data observed in recent years, also follow the same…

Databases · Computer Science 2017-07-31 Diego García-Gil , Julián Luengo , Salvador García , Francisco Herrera

The increasing size and availability of web data make data quality a core challenge in many applications. Principles of data quality are recognized as essential to ensure that data fit for their intended use in operations, decision-making,…

Digital Libraries · Computer Science 2013-05-20 Ahmad Assaf , Aline Senart

Data is expanding at an unimaginable rate, and with this development comes the responsibility of the quality of data. Data Quality refers to the relevance of the information present and helps in various operations like decision making and…

Machine Learning · Computer Science 2021-11-30 Sezal Chug , Priya Kaushal , Ponnurangam Kumaraguru , Tavpritesh Sethi

One of the most significant problems of Big Data is to extract knowledge through the huge amount of data. The usefulness of the extracted information depends strongly on data quality. In addition to the importance, data quality has recently…

Databases · Computer Science 2020-05-25 Mostafa Mirzaie , Behshid Behkamal , Samad Paydar

The research discusses how (open) data quality could be described, what should be considered developing a data quality management solution and how it could be applied to open data to check its quality. The proposed approach focuses on…

Databases · Computer Science 2022-06-16 Anastasija Nikiforova

The quality of the data in a dataset can have a substantial impact on the performance of a machine learning model that is trained and/or evaluated using the dataset. Effective dataset management, including tasks such as data cleanup,…

Databases · Computer Science 2023-03-16 Ze Mao , Yang Xu , Erick Suarez
‹ Prev 1 2 3 10 Next ›