Related papers: Evaluation Gaps in Machine Learning Practice

Evaluating Supervised Machine Learning Models: Principles, Pitfalls, and Metric Selection

The evaluation of supervised machine learning models is a critical stage in the development of reliable predictive systems. Despite the widespread availability of machine learning libraries and automated workflows, model assessment is often…

Machine Learning · Computer Science 2026-04-16 Xuanyan Liu, Ignacio Cabrera Martin, Marcello Trovati, Xiaolong Xu, Nikolaos Polatidis

Absolute Evaluation Measures for Machine Learning: A Survey

Machine Learning is a diverse field applied across various domains such as computer science, social sciences, medicine, chemistry, and finance. This diversity results in varied evaluation approaches, making it difficult to compare models…

Machine Learning · Computer Science 2025-07-08 Silvia Beddar-Wiesing, Alice Moallemy-Oureh, Marie Kempkes, Josephine M. Thomas

A Conceptual Framework for Ethical Evaluation of Machine Learning Systems

Research in Responsible AI has developed a range of principles and practices to ensure that machine learning systems are used in a manner that is ethical and aligned with human values. However, a critical yet often neglected aspect of…

Computers and Society · Computer Science 2024-08-21 Neha R. Gupta, Jessica Hullman, Hari Subramonyam

Impact of Missing Values in Machine Learning: A Comprehensive Analysis

Machine learning (ML) has become a ubiquitous tool across various domains of data mining and big data analysis. The efficacy of ML models depends heavily on high-quality datasets, which are often complicated by the presence of missing…

Machine Learning · Computer Science 2024-10-14 Abu Fuad Ahmad, Md Shohel Sayeed, Khaznah Alshammari, Istiaque Ahmed

On the Importance of Application-Grounded Experimental Design for Evaluating Explainable ML Methods

Most existing evaluations of explainable machine learning (ML) methods rely on simplifying assumptions or proxies that do not reflect real-world use cases; the handful of more robust evaluations on real-world settings have shortcomings in…

Machine Learning · Computer Science 2023-02-23 Kasun Amarasinghe, Kit T. Rodolfa, Sérgio Jesus, Valerie Chen, Vladimir Balayan, Pedro Saleiro, Pedro Bizarro, Ameet Talwalkar, Rayid Ghani

On Misbehaviour and Fault Tolerance in Machine Learning Systems

Machine learning (ML) provides us with numerous opportunities, allowing ML systems to adapt to new situations and contexts. At the same time, this adaptability raises uncertainties concerning the run-time product quality or dependability,…

Software Engineering · Computer Science 2022-10-18 Lalli Myllyaho, Mikko Raatikainen, Tomi Männistö, Jukka K. Nurminen, Tommi Mikkonen

Bad practices in evaluation methodology relevant to class-imbalanced problems

For research to go in the right direction, it is essential to be able to compare and quantify performance of different algorithms focused on the same problem. Choosing a suitable evaluation metric requires deep understanding of the pursued…

Machine Learning · Computer Science 2018-12-05 Jan Brabec, Lukas Machlica

On the Robustness of Fairness Practices: A Causal Framework for Systematic Evaluation

Machine learning (ML) algorithms are increasingly deployed to make critical decisions in socioeconomic applications such as finance, criminal justice, and autonomous driving. However, due to their data-driven and pattern-seeking nature, ML…

Software Engineering · Computer Science 2026-01-08 Verya Monjezi, Ashish Kumar, Ashutosh Trivedi, Gang Tan, Saeid Tizpaz-Niari

Explainable Machine Learning for Public Policy: Use Cases, Gaps, and Research Directions

Explainability is highly desired in Machine Learning (ML) systems supporting high-stakes policy decisions in areas such as health, criminal justice, education, and employment. While the field of explainable ML has expanded in recent years,…

Machine Learning · Computer Science 2023-02-21 Kasun Amarasinghe, Kit Rodolfa, Hemank Lamba, Rayid Ghani

Assessing Perceived Fairness from Machine Learning Developer's Perspective

Fairness in machine learning (ML) applications is an important practice for developers in research and industry. In ML applications, unfairness is triggered due to bias in the data, curation process, erroneous assumptions, and implicit bias…

Machine Learning · Computer Science 2023-04-10 Anoop Mishra, Deepak Khazanchi

Monitoring and Observability of Machine Learning Systems: Current Practices and Gaps

Production machine learning (ML) systems fail silently -- not with crashes, but through wrong decisions. While observability is recognized as critical for ML operations, there is a lack of empirical evidence of what practitioners actually…

Software Engineering · Computer Science 2025-10-29 Joran Leest, Ilias Gerostathopoulos, Patricia Lago, Claudia Raibulet

Beyond Next Word Prediction: Developing Comprehensive Evaluation Frameworks for measuring LLM performance on real world applications

While Large Language Models (LLMs) are fundamentally next-token prediction systems, their practical applications extend far beyond this basic function. From natural language processing and text generation to conversational assistants and…

Computation and Language · Computer Science 2025-03-10 Vishakha Agrawal, Archie Chaudhury, Shreya Agrawal

Improving the Validity and Practical Usefulness of AI/ML Evaluations Using an Estimands Framework

Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a…

Machine Learning · Computer Science 2024-06-18 Olivier Binette, Jerome P. Reiter

Perspective of Software Engineering Researchers on Machine Learning Practices Regarding Research, Review, and Education

Context: Machine Learning (ML) significantly impacts Software Engineering (SE), but studies mainly focus on practitioners, neglecting researchers. This overlooks practices and challenges in teaching, researching, or reviewing ML…

Software Engineering · Computer Science 2024-12-02 Anamaria Mojica-Hanke, David Nader Palacio, Denys Poshyvanyk, Mario Linares-Vásquez, Steffen Herbold

Rethinking Machine Learning Model Evaluation in Pathology

Machine Learning has been applied to pathology images in research and clinical practice with promising outcomes. However, standard ML models often lack the rigorous evaluation required for clinical decisions. Machine learning techniques for…

Image and Video Processing · Electrical Eng. & Systems 2022-04-19 Syed Ashar Javed, Dinkar Juyal, Zahil Shanis, Shreya Chakraborty, Harsha Pokkalla, Aaditya Prakash

Approaches to Improving the Accuracy of Machine Learning Models in Requirements Elicitation Techniques Selection

Selecting techniques is a crucial element of the business analysis approach planning in IT projects. Particular attention is paid to the choice of techniques for requirements elicitation. One of the promising methods for selecting…

Software Engineering · Computer Science 2023-08-22 Denys Gobov, Olga Solovei

Rethinking and Recomputing the Value of Machine Learning Models

In this paper, we argue that the prevailing approach to training and evaluating machine learning models often fails to consider their real-world application within organizational or societal contexts, where they are intended to create…

Machine Learning · Computer Science 2025-04-24 Burcu Sayin, Jie Yang, Xinyue Chen, Andrea Passerini, Fabio Casati

Insights into Performance Fitness and Error Metrics for Machine Learning

Machine learning (ML) is the field of training machines to achieve a high level of cognition and perform human-like analysis. Since ML is a data-driven approach, it seemingly fits into our daily lives and operations as well as complex and…

Machine Learning · Computer Science 2021-11-25 M. Z. Naser, Amir Alavi

Contextual Fairness-Aware Practices in ML: A Cost-Effective Empirical Evaluation

As machine learning (ML) systems become central to critical decision-making, concerns over fairness and potential biases have increased. To address this, the software engineering (SE) field has introduced bias mitigation techniques aimed at…

Software Engineering · Computer Science 2025-03-21 Alessandra Parziale, Gianmario Voria, Giammaria Giordano, Gemma Catolino, Gregorio Robles, Fabio Palomba

Characterizing and Detecting Mismatch in Machine-Learning-Enabled Systems

Increasing availability of machine learning (ML) frameworks and tools, as well as their promise to improve solutions to data-driven decision problems, has resulted in popularity of using ML techniques in software systems. However,…

Software Engineering · Computer Science 2021-03-29 Grace A. Lewis, Stephany Bellomo, Ipek Ozkaya