Related papers: Pearson-Matthews correlation coefficients for bina…
Several performance measures are used to evaluate binary and multiclass classification tasks. But individual observations may often have distinct weights, and none of these measures are sensitive to such varying weights. We propose a new…
Classification problems are essential statistical tasks that form the foundation of decision-making across various fields, including patient prognosis and treatment strategies for critical conditions. Consequently, evaluating the…
Evaluating classifications is crucial in statistics and machine learning, as it influences decision-making across various fields, such as patient prognosis and therapy in critical conditions. The Matthews correlation coefficient (MCC) is…
Pearson's correlation is an important summary measure of the amount of dependence between two variables. It is natural to want to generalise the concept of correlation as a single number that measures the inter-relatedness of three or more…
Not a matter of serious contention, Pearson's correlation coefficient is still the most important statistical association measure. Restricted to just two variables, this measure sometimes doesn't live up to users' needs and expectations.…
Context: There is considerable diversity in the range and design of computational experiments to assess classifiers for software defect prediction. This is particularly so, regarding the choice of classifier performance metrics.…
Inferring linear relationships lies at the heart of many empirical investigations. A measure of linear dependence should correctly evaluate the strength of the relationship as well as qualify whether it is meaningful for the population.…
In the last few years, many different performance measures have been introduced to overcome the weakness of the most natural metric, the Accuracy. Among them, Matthews Correlation Coefficient has recently gained popularity among researchers…
Metrics for measuring the comparability of corpora or texts need to be developed and evaluated systematically. Applications based on a corpus, such as training Statistical MT systems in specialised narrow domains, require finding a…
Classification tasks in machine learning involving more than two classes are known by the name of "multi-class classification". Performance indicators are very useful when the aim is to evaluate and compare different classification models…
Similarity distance measure between two trajectories is an essential tool to understand patterns in motion, for example, in Human-Robot Interaction or Imitation Learning. The problem has been faced in many fields, from Signal Processing,…
The maximal correlation coefficient is a well-established generalization of the Pearson correlation coefficient for measuring non-linear dependence between random variables. It is appealing from a theoretical standpoint, satisfying…
Besides the classical distinction of correlation and dependence, many dependence measures bear further pitfalls in their application and interpretation. The aim of this paper is to raise and recall awareness of some of these limitations by…
The adequate use of information measured in a continuous manner along a period of time represents a methodological challenge. In the last decades, most of traditional statistical procedures have been extended for accommodating these…
A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced…
Multivariate correlation analysis plays an important role in various fields such as statistics, economics, and big data analytics. In this paper, we propose a pair of measures, the unsigned correlation coefficient (UCC) and the unsigned…
Relations between categorical variables can be analyzed conveniently by multiple correspondence analysis (MCA). %It is well suited to discover relations that may exist between categories of different variables. The graphical representation…
Pearson's $\rho$ is the most used measure of statistical dependence. It gives a complete characterization of dependence in the Gaussian case, and it also works well in some non-Gaussian situations. It is well known, however, that it has a…
Multilabel classification is a relatively recent subfield of machine learning. Unlike to the classical approach, where instances are labeled with only one category, in multilabel classification, an arbitrary number of categories is chosen…
We show that established performance metrics in binary classification, such as the F-score, the Jaccard similarity coefficient or Matthews' correlation coefficient (MCC), are not robust to class imbalance in the sense that if the proportion…