Related papers: Statistical Dataset Evaluation: Reliability, Diffi…

200 papers

This paper focuses on numeric data, with emphasis on distinct characteristics like varying significance, unstructured format, mass volume and real-time processing. We propose a novel, context-dependent valuation framework specifically…

Databases · Computer Science 2018-10-23 Milen S. Marev, Ernesto Compatangelo, Wamberto Vasconcelos

The use of learning-based techniques to achieve automated software vulnerability detection has been of longstanding interest within the software security domain. These data-driven solutions are enabled by large software vulnerability…

Software Engineering · Computer Science 2023-01-16 Roland Croft, M. Ali Babar, Mehdi Kholoosi

In this paper, we delve into the critical aspect of dataset quality assessment in machine learning classification tasks. Leveraging a variety of nine distinct datasets, each crafted for classification tasks with varying complexity levels,…

Machine Learning · Computer Science 2023-06-28 Szymon Mazurek, Maciej Wielgosz

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane, Salva Sébastien, Samir Chafik

Devising domain- and model-agnostic evaluation metrics for generative models is an important and as yet unresolved problem. Most existing metrics, which were tailored solely to the image synthesis setup, exhibit a limited capacity for…

Machine Learning · Computer Science 2022-07-14 Ahmed M. Alaa, Boris van Breugel, Evgeny Saveliev, Mihaela van der Schaar

Data is expanding at an unimaginable rate, and with this development comes responsibility for the quality of data. Data Quality refers to the relevance of the information present and helps in various operations like decision making and…

Machine Learning · Computer Science 2021-11-30 Sezal Chug, Priya Kaushal, Ponnurangam Kumaraguru, Tavpritesh Sethi

A fundamental problem in the practice and teaching of data science is how to evaluate the quality of a given data analysis, which is different than the evaluation of the science or question underlying the data analysis. Previously, we…

Other Statistics · Statistics 2019-04-29 Stephanie C. Hicks, Roger D. Peng

In the era of big data, ensuring the quality of datasets has become increasingly crucial across various domains. We propose a comprehensive framework designed to automatically assess and rectify data quality issues in any given dataset,…

Databases · Computer Science 2024-09-17 Djibril Sarr

Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location…

Software Engineering · Computer Science 2020-12-22 Michael F. Bosu, Stephen G. MacDonell

Vulnerability detection is a crucial yet challenging task to identify potential weaknesses in software for cyber security. Recently, deep learning (DL) has made great progress in automating the detection process. Due to the complex…

Cryptography and Security · Computer Science 2024-10-10 Yuejun Guo, Seifeddine Bettaieb

Trajectory datasets of road users have become more important in recent years for safety validation of highly automated vehicles. Several naturalistic trajectory datasets, each with more than 10,000 tracks, were released and others will…

Computer Vision and Pattern Recognition · Computer Science 2022-04-12 Christoph Glasmacher, Robert Krajewski, Lutz Eckstein

Autonomous driving has rapidly developed and shown promising performance due to recent advances in hardware and deep learning techniques. High-quality datasets are fundamental for developing reliable autonomous driving algorithms. Previous…

Computer Vision and Pattern Recognition · Computer Science 2024-04-24 Mingyu Liu, Ekim Yurtsever, Jonathan Fossaert, Xingcheng Zhou, Walter Zimmer, Yuning Cui, Bare Luka Zagar, Alois C. Knoll

Context: The utility of prediction models in empirical software engineering (ESE) is heavily reliant on the quality of the data used in building those models. Several data quality challenges such as noise, incompleteness, outliers and…

Software Engineering · Computer Science 2021-05-25 Michael Franklin Bosu, Stephen G. MacDonell

Artificial neural networks (NN) are instrumental in realizing highly-automated driving functionality. An overarching challenge is to identify best safety engineering practices for NN and other learning-enabled components. In particular,…

Machine Learning · Computer Science 2018-06-11 Chih-Hong Cheng, Georg Nührenberg, Chung-Hao Huang, Harald Ruess, Hirotoshi Yasuoka

Noise plagues many numerical datasets, where the recorded values in the data may fail to match the true underlying values due to reasons including: erroneous sensors, data entry/processing mistakes, or imperfect human estimates. We consider…

Machine Learning · Statistics 2024-03-14 Hang Zhou, Jonas Mueller, Mayank Kumar, Jane-Ling Wang, Jing Lei

Traditional metrics like accuracy, F1-score, and precision are frequently used to evaluate machine learning models; however, they may not be sufficient for evaluating performance on tiny, unbalanced, or high-dimensional datasets. A…

Machine Learning · Computer Science 2024-12-11 Serzhan Ossenov

Deep Neural Networks (DNNs), with their promising performance, are being increasingly used in safety-critical applications such as autonomous driving, cancer detection, and secure authentication. With growing importance in deep learning,…

Machine Learning · Computer Science 2019-11-19 Senthil Mani, Anush Sankaran, Srikanth Tamilselvam, Akshay Sethi

Traditional data quality control methods are based on users' experience or previously established business rules, and this limits performance in addition to being a very time-consuming process with lower than desirable accuracy. Utilizing…

Artificial Intelligence · Computer Science 2018-10-17 Wei Dai, Kenji Yoshigoe, William Parsley

High-quality datasets are fundamental to training and evaluating machine learning models, yet their creation, especially with accurate human annotations, remains a significant challenge. Many dataset paper submissions lack originality,…

The quality of underlying training data is crucial for building performant machine learning models with wider generalizability. However, current machine learning (ML) tools lack streamlined processes for improving the data quality. So,…

Machine Learning · Computer Science 2021-12-16 Atindriyo Sanyal, Vikram Chatterji, Nidhi Vyas, Ben Epstein, Nikita Demir, Anthony Corletti