Related papers: Data Quality Evaluation using Probability Models

200 papers

In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels,…

Machine Learning · Computer Science 2023-06-28 Szymon Mazurek, Maciej Wielgosz

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane, Salva Sébastien, Samir Chafik

Traditional data quality control methods are based on users' experience or previously established business rules, and this limits performance in addition to being a very time-consuming process with lower than desirable accuracy. Utilizing…

Artificial Intelligence · Computer Science 2018-10-17 Wei Dai, Kenji Yoshigoe, William Parsley

Developing machine learning models can be seen as a process similar to the one established for traditional software development. A key difference between the two lies in the strong dependency between the quality of a machine learning model…

Machine Learning · Computer Science 2021-02-17 Cedric Renggli, Luka Rimanic, Nezihe Merve Gürel, Bojan Karlaš, Wentao Wu, Ce Zhang

A common assumption exists according to which machine learning models improve their performance when they have more data to learn from. In this study, the authors wished to clarify the dilemma by performing an empirical experiment utilizing…

Machine Learning · Computer Science 2021-12-20 Antti Kariluoto, Arto Pärnänen, Joni Kultanen, Jukka Soininen, Pekka Abrahamsson

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many…

Machine Learning · Computer Science 2021-01-06 Hyeongmin Cho, Sangkyun Lee

Modern artificial intelligence (AI) applications require large quantities of training and test data. This need creates critical challenges not only concerning the availability of such data, but also regarding its quality. For example,…

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

Machine Learning (ML) models are being increasingly employed for credit risk evaluation, with their effectiveness largely hinging on the quality of the input data. In this paper we investigate the impact of several data quality issues,…

Machine Learning · Computer Science 2025-11-18 Andrea Maurino

Quality estimation aims to measure the quality of translated content without access to a reference translation. This is crucial for machine translation systems in real-world scenarios where high-quality translation is needed. While many…

Computation and Language · Computer Science 2021-02-09 Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Francisco Guzmán, Lucia Specia

Poor data quality limits the advantageous power of Machine Learning (ML) and weakens high-performing ML software systems. Nowadays, data are more prone to the risk of poor quality due to their increasing volume and complexity. Therefore,…

Machine Learning · Computer Science 2025-02-20 Manal Rahal, Bestoun S. Ahmed, Gergely Szabados, Torgny Fornstedt, Jorgen Samuelsson

Data is of high quality if it is fit for its intended use. The quality of data is influenced by the underlying data model and its quality. One major quality problem is the heterogeneity of data as quality aspects such as understandability…

Machine Learning · Computer Science 2021-11-15 Viola Wenz, Arno Kesper, Gabriele Taentzer

Data-driven forecasts of air quality have recently achieved more accurate short-term predictions. Despite their success, most of the current data-driven solutions lack proper quantifications of model uncertainty that communicate how much to…

Machine Learning · Computer Science 2021-12-07 Abdulmajid Murad, Frank Alexander Kraemer, Kerstin Bach, Gavin Taylor

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a…

Machine Learning · Computer Science 2020-05-13 Netanel Raviv, Siddharth Jain, Jehoshua Bruck

Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments. Such scenarios can be improved if quality predictions are accompanied by a measure of uncertainty.…

Computation and Language · Computer Science 2016-07-01 Daniel Beck, Lucia Specia, Trevor Cohn

Classifying samples in incomplete datasets is a common aim for machine learning practitioners, but is non-trivial. Missing data is found in most real-world datasets and these missing values are typically imputed using established methods,…

Nowadays, people strive to improve the accuracy of deep learning models. However, very little work has focused on the quality of data sets. In fact, data quality determines model quality. Therefore, it is important for us to make research…

Machine Learning · Computer Science 2019-07-01 Tianxing He, Shengcheng Yu, Ziyuan Wang, Jieqiong Li, Zhenyu Chen

Machine Learning approaches are good at solving problems that have less information. In most cases, software domain problems can be characterized as a process of learning that depends on various circumstances and changes accordingly. A…

Software Engineering · Computer Science 2015-06-26 Saiqa Aleem, Luiz Fernando Capretz, Faheem Ahmed

Dataset pruning is the process of removing sub-optimal tuples from a dataset to improve the learning of a machine learning model. In this paper, we compared the performance of different algorithms, first on an unpruned dataset and then on…

Machine Learning · Computer Science 2019-01-31 Arun Thundyill Saseendran, Lovish Setia, Viren Chhabria, Debrup Chakraborty, Aneek Barman Roy

Two indicators are classically used to evaluate the quality of rule-based classification systems: predictive accuracy, i.e. the system's ability to successfully reproduce learning data and coverage, i.e. the proportion of possible cases for…

Artificial Intelligence · Computer Science 2020-04-07 Nassim Dehouche