Related papers: Characterizing instance hardness in classification…
Defect prediction is crucial for software quality assurance and has been extensively researched over recent decades. However, prior studies rarely focus on data complexity in defect prediction tasks, and even less on understanding the…
For building successful Machine Learning (ML) systems, it is imperative to have high quality data and well tuned learning models. But how can one assess the quality of a given dataset? And how can the strengths and weaknesses of a model on…
Complex classifiers may exhibit "embarassing" failures in cases where humans can easily provide a justified classification. Avoiding such failures is obviously of key importance. In this work, we focus on one such setting, where a label is…
Despite the tremendous advances in machine learning (ML), training with imbalanced data still poses challenges in many real-world applications. Among a series of diverse techniques to solve this problem, sampling algorithms are regarded as…
Testing Machine Learning (ML) models and AI-Infused Applications (AIIAs), or systems that contain ML models, is highly challenging. In addition to the challenges of testing classical software, it is acceptable and expected that statistical…
Traditional instance-based model analysis focuses mainly on misclassified instances. However, this approach overlooks the varying difficulty associated with different instances. Ideally, a robust model should recognize and reflect the…
Data pruning, or instance selection, is an important problem in machine learning especially in terms of nearest neighbour classifier. However, in data pruning which speeds up the prediction phase, there is an issue related to the speed and…
As learning difficulty is crucial for machine learning (e.g., difficulty-based weighting learning strategies), previous literature has proposed a number of learning difficulty measures. However, no comprehensive investigation for learning…
Extracting noisy or incorrectly labeled samples from a labeled dataset with hard/difficult samples is an important yet under-explored topic. Two general and often independent lines of work exist, one focuses on addressing noisy labels, and…
Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the…
Long-tailed datasets, where head classes comprise much more training samples than tail classes, cause recognition models to get biased towards the head classes. Weighted loss is one of the most popular ways of mitigating this issue, and a…
Adversarial examples are a type of attack on machine learning (ML) systems which cause misclassification of inputs. Achieving robustness against adversarial examples is crucial to apply ML in the real world. While most prior work on…
High-quality data is a key aspect of modern machine learning. However, labels generated by humans suffer from issues like label noise and class ambiguities. We raise the question of whether hard labels are sufficient to represent the…
Motivated by the goals of dataset pruning and defect identification, a growing body of methods have been developed to score individual examples within a dataset. These methods, which we call "example difficulty scores", are typically used…
Existing work on understanding deep learning often employs measures that compress all data-dependent information into a few numbers. In this work, we adopt a perspective based on the role of individual examples. We introduce a measure of…
Malware classification is a difficult problem, to which machine learning methods have been applied for decades. Yet progress has often been slow, in part due to a number of unique difficulties with the task that occur through all stages of…
Datasets typically contain inaccuracies due to human error and societal biases, and these inaccuracies can affect the outcomes of models trained on such datasets. We present a technique for certifying whether linear regression models are…
In this paper, we consider the instance segmentation task on a long-tailed dataset, which contains label noise, i.e., some of the annotations are incorrect. There are two main reasons making this case realistic. First, datasets collected…
A key requirement for supervised machine learning is labeled training data, which is created by annotating unlabeled data with the appropriate class. Because this process can in many cases not be done by machines, labeling needs to be…
In the AutoML domain, test accuracy is heralded as the quintessential metric for evaluating model efficacy, underpinning a wide array of applications from neural architecture search to hyperparameter optimization. However, the reliability…