Related papers: A classification performance evaluation measure co…

Data Separability for Neural Network Classifiers and the Development of a Separability Index

In machine learning, the performance of a classifier depends on both the classifier model and the dataset. For a specific neural network classifier, the training process varies with the training set used; some training data make training…

Machine Learning · Computer Science 2020-06-01 Shuyue Guan, Murray Loew, Hanseok Ko

A Novel Intrinsic Measure of Data Separability

In machine learning, the performance of a classifier depends on both the classifier model and the separability/complexity of datasets. To quantitatively measure the separability of datasets, we create an intrinsic measure -- the…

Machine Learning · Computer Science 2021-09-14 Shuyue Guan, Murray Loew

A Novel Metric for Measuring Data Quality in Classification Applications (extended version)

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane, Salva Sébastien, Samir Chafik

Measuring Discrimination to Boost Comparative Testing for Multiple Deep Learning Models

The boom of DL technology leads to massive DL models built and shared, which facilitates the acquisition and reuse of DL models. For a given task, we encounter multiple DL models available with the same functionality, which are considered…

Software Engineering · Computer Science 2021-03-10 Linghan Meng, Yanhui Li, Lin Chen, Zhi Wang, Di Wu, Yuming Zhou, Baowen Xu

Data Quality Measures and Efficient Evaluation Algorithms for Large-Scale High-Dimensional Data

Machine learning has been proven to be effective in various application areas, such as object and speech recognition on mobile systems. Since a critical key to machine learning success is the availability of large training data, many…

Machine Learning · Computer Science 2021-01-06 Hyeongmin Cho, Sangkyun Lee

A Comprehensive Assessment Benchmark for Rigorously Evaluating Deep Learning Image Classifiers

Reliable and robust evaluation methods are a necessary first step towards developing machine learning models that are themselves robust and reliable. Unfortunately, current evaluation protocols typically used to assess classifiers fail to…

Machine Learning · Computer Science 2025-05-26 Michael W. Spratling

MDAS: A Diagnostic Approach to Assess the Quality of Data Splitting in Machine Learning

In the field of machine learning, model performance is usually assessed by randomly splitting data into training and test sets. Different random splits, however, can yield markedly different performance estimates, so a genuinely good model…

Computation · Statistics 2026-01-09 Palash Ghosh, Bittu Karmakar, Eklavya Jain, J. Neeraja, Buddhananda Banerjee, Tanujit Chakraborty

Comparative Separation: Evaluating Separation on Comparative Judgment Test Data

This research seeks to benefit the software engineering society by proposing comparative separation, a novel group fairness notion to evaluate the fairness of machine learning software on comparative judgment test data. Fairness issues have…

Software Engineering · Computer Science 2026-01-13 Xiaoyin Xi, Neeku Capak, Kate Stockwell, Zhe Yu

The Data Representativeness Criterion: Predicting the Performance of Supervised Classification Based on Data Set Similarity

In a broad range of fields it may be desirable to reuse a supervised classification algorithm and apply it to a new data set. However, generalization of such an algorithm and thus achieving a similar classification performance is only…

Computer Vision and Pattern Recognition · Computer Science 2020-08-13 Evelien Schat, Rens van de Schoot, Wouter M. Kouw, Duco Veen, Adriënne M. Mendrik

Learning similarity measures from data

Defining similarity measures is a requirement for some machine learning methods. One such method is case-based reasoning (CBR) where the similarity measure is used to retrieve the stored case or set of cases most similar to the query case.…

Machine Learning · Computer Science 2020-01-16 Bjørn Magnus Mathisen, Agnar Aamodt, Kerstin Bach, Helge Langseth

Accuracy Measures for the Comparison of Classifiers

The selection of the best classification algorithm for a given dataset is a very widespread problem. It is also a complex one, in the sense it requires to make several important methodological choices. Among them, in this work we focus on…

Machine Learning · Computer Science 2012-07-18 Vincent Labatut, Hocine Cherifi

DCSI -- An improved measure of cluster separability based on separation and connectedness

Whether class labels in a given data set correspond to meaningful clusters is crucial for the evaluation of clustering algorithms using real-world data sets. This property can be quantified by separability measures. The central aspects of…

Machine Learning · Statistics 2025-04-11 Jana Gauss, Fabian Scheipl, Moritz Herrmann

Statistical Analysis of Data Repeatability Measures

The advent of modern data collection and processing techniques has seen the size, scale, and complexity of data grow exponentially. A seminal step in leveraging these rich datasets for downstream inference is understanding the…

Applications · Statistics 2024-07-30 Zeyi Wang, Eric Bridgeford, Shangsi Wang, Joshua T. Vogelstein, Brian Caffo

Quantifying Data Similarity Using Cross Learning

Measuring dataset similarity is fundamental in machine learning, particularly for transfer learning and domain adaptation. In the context of supervised learning, most existing approaches quantify similarity of two data sets based on their…

Machine Learning · Statistics 2026-04-22 Shudong Sun, Hao Helen Zhang, Joseph C Watkins

Rethinking Semi-supervised Segmentation Beyond Accuracy: Reliability and Robustness

Semantic segmentation is critical for scene understanding but demands costly pixel-wise annotations, attracting increasing attention to semi-supervised approaches to leverage abundant unlabeled data. While semi-supervised segmentation is…

Computer Vision and Pattern Recognition · Computer Science 2025-06-09 Steven Landgraf, Markus Hillemann, Markus Ulrich

Utilizing Class Separation Distance for the Evaluation of Corruption Robustness of Machine Learning Classifiers

Robustness is a fundamental pillar of Machine Learning (ML) classifiers, substantially determining their reliability. Methods for assessing classifier robustness are therefore essential. In this work, we address the challenge of evaluating…

Machine Learning · Computer Science 2026-01-19 Georg Siedel, Silvia Vock, Andrey Morozov, Stefan Voß

Selecting a classification performance measure: matching the measure to the problem

The problem of identifying to which of a given set of classes objects belong is ubiquitous, occurring in many research domains and application areas, including medical diagnosis, financial decision making, online commerce, and national…

Machine Learning · Computer Science 2024-09-20 David J. Hand, Peter Christen, Sumayya Ziyad

Measuring the Sensitivity of Classification Models with the Error Sensitivity Profile

The quality of training data is critical to the performance of machine learning models. In this paper, the Error Sensitivity Profile (ESP) is proposed. It quantifies the sensitivity of model performance to errors in a single feature or in…

Machine Learning · Computer Science 2026-04-29 Andrea Maurino

A Supervised Learning Approach to Rankability

The rankability of data is a recently proposed problem that considers the ability of a dataset, represented as a graph, to produce a meaningful ranking of the items it contains. To study this concept, a number of rankability measures have…

Combinatorics · Mathematics 2022-03-15 Nathan McJames, David Malone, Oliver Mason

Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data

We introduce a discriminative regression approach to supervised classification in this paper. It estimates a representation model while accounting for discriminativeness between classes, thereby enabling accurate derivation of categorical…

Machine Learning · Computer Science 2020-01-01 Chong Peng, Qiang Cheng