Related papers: Testing Framework for Black-box AI Models

Automated Testing of AI Models

The last decade has seen tremendous progress in AI technology and applications. With such widespread adoption, ensuring the reliability of the AI models is crucial. In the past, we took the first step of creating a testing framework called…

Artificial Intelligence · Computer Science 2021-10-08 Swagatam Haldar, Deepak Vijaykeerthy, Diptikalyan Saha

Outline of an Independent Systematic Blackbox Test for ML-based Systems

This article proposes a test procedure that can be used to test ML models and ML-based systems independently of the actual training process. In this way, the typical quality statements such as accuracy and precision of these models and…

Machine Learning · Computer Science 2024-06-21 Hans-Werner Wiesbrock, Jürgen Großmann

A Conceptual Framework for AI Capability Evaluations

As AI systems advance and integrate into society, well-designed and transparent evaluations are becoming essential tools in AI governance, informing decisions by providing evidence about system capabilities and risks. Yet there remains a…

Artificial Intelligence · Computer Science 2025-06-24 María Victoria Carro, Denise Alejandra Mester, Francisca Gauna Selasco, Luca Nicolás Forziati Gangi, Matheo Sandleris Musa, Lola Ramos Pereyra, Mario Leiva, Juan Gustavo Corvalan, María Vanina Martinez, Gerardo Simari

Model Learning: A Survey on Foundation, Tools and Applications

The quality and correct functioning of software components embedded in electronic systems are of utmost concern especially for safety and mission-critical systems. Model-based testing and formal verification techniques can be employed to…

Formal Languages and Automata Theory · Computer Science 2019-01-08 Shahbaz Ali, Hailong Sun, Yongwang Zhao

Evaluation Framework for AI Systems in "the Wild"

Generative AI (GenAI) models have become vital across industries, yet current evaluation methods have not adapted to their widespread use. Traditional evaluations often rely on benchmarks and fixed datasets, frequently failing to reflect…

Computation and Language · Computer Science 2025-04-29 Sarah Jabbour, Trenton Chang, Anindya Das Antar, Joseph Peper, Insu Jang, Jiachen Liu, Jae-Won Chung, Shiqi He, Michael Wellman, Bryan Goodman, Elizabeth Bondi-Kelly, Kevin Samy, Rada Mihalcea, Mosharaf Chowdhury, David Jurgens, Lu Wang

AI-Compass: A Comprehensive and Effective Multi-module Testing Tool for AI Systems

AI systems, in particular with deep learning techniques, have demonstrated superior performance for various real-world applications. Given the need for tailored optimization in specific scenarios, as well as the concerns related to the…

Artificial Intelligence · Computer Science 2024-11-12 Zhiyu Zhu, Zhibo Jin, Hongsheng Hu, Minhui Xue, Ruoxi Sun, Seyit Camtepe, Praveen Gauravaram, Huaming Chen

Black-Box Verification for GUI Applications

In black-box testing of GUI applications (a form of system testing), a dynamic analysis of the GUI application is used to infer a black-box model; the black-box model is then used to derive test cases for the test of the GUI application. In…

Software Engineering · Computer Science 2012-10-18 Stephan Arlt, Evren Ermis, Sergio Feo-Arenis, Andreas Podelski

Combining closed-loop test generation and execution by means of model checking

Model checking is an established technique to formally verify automation systems which are required to be trusted. However, for sufficiently complex systems model checking becomes computationally infeasible. On the other hand, testing,…

Software Engineering · Computer Science 2019-07-30 Igor Buzhinsky, Valeriy Vyatkin

The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials

Artificial intelligence (AI) holds great promise for supporting clinical trials, from patient recruitment and endpoint assessment to treatment response prediction. However, deploying AI without safeguards poses significant risks,…

Machine Learning · Computer Science 2025-10-09 Yao Chen, David Ohlssen, Aimee Readie, Gregory Ligozio, Ruvie Martin, Thibaud Coroller

SETA: Statistical Fault Attribution for Compound AI Systems

Modern AI systems increasingly comprise multiple interconnected neural networks to tackle complex inference tasks. Testing such systems for robustness and safety entails significant challenges. Current state-of-the-art robustness testing…

Artificial Intelligence · Computer Science 2026-01-28 Sayak Chowdhury, Meenakshi D'Souza

AI Evaluation: past, present and future

Artificial intelligence develops techniques and systems whose performance must be evaluated on a regular basis in order to certify and foster progress in the discipline. We will describe and critically assess the different ways AI systems…

Artificial Intelligence · Computer Science 2016-08-23 Jose Hernandez-Orallo

Breaking Barriers in Software Testing: The Power of AI-Driven Automation

Software testing remains critical for ensuring reliability, yet traditional approaches are slow, costly, and prone to gaps in coverage. This paper presents an AI-driven framework that automates test case generation and validation using…

Software Engineering · Computer Science 2025-08-25 Saba Naqvi, Mohammad Baqar

Measurement to Meaning: A Validity-Centered Framework for AI Evaluation

While the capabilities and utility of AI systems have advanced, rigorous norms for evaluating these systems have lagged. Grand claims, such as models achieving general reasoning capabilities, are supported with model performance on narrow…

Computers and Society · Computer Science 2025-06-27 Olawale Salaudeen, Anka Reuel, Ahmed Ahmed, Suhana Bedi, Zachary Robertson, Sudharsan Sundar, Ben Domingue, Angelina Wang, Sanmi Koyejo

Test and Evaluation Framework for Multi-Agent Systems of Autonomous Intelligent Agents

Test and evaluation is a necessary process for ensuring that engineered systems perform as intended under a variety of conditions, both expected and unexpected. In this work, we consider the unique challenges of developing a unifying test…

Systems and Control · Electrical Eng. & Systems 2022-01-21 Erin Lanus, Ivan Hernandez, Adam Dachowicz, Laura Freeman, Melanie Grande, Andrew Lang, Jitesh H. Panchal, Anthony Patrick, Scott Welch

Fairness in the Eyes of the Data: Certifying Machine-Learning Models

We present a framework that makes it possible to certify the fairness degree of a model based on an interactive and privacy-preserving test. The framework verifies any trained model, regardless of its training process and architecture. Thus, it allows…

Artificial Intelligence · Computer Science 2021-06-28 Shahar Segal, Yossi Adi, Benny Pinkas, Carsten Baum, Chaya Ganesh, Joseph Keshet

Black-Box Edge AI Model Selection with Conformal Latency and Accuracy Guarantees

Edge artificial intelligence (AI) will be a central part of 6G, with powerful edge servers supporting devices in performing machine learning (ML) inference. However, it is challenging to deliver the latency and accuracy guarantees required…

Information Theory · Computer Science 2025-06-16 Anders E. Kalør, Tomoaki Ohtsuki

Uncertainty-Driven Black-Box Test Data Generation

We can never be certain that a software system is correct simply by testing it, but with every additional successful test we become less uncertain about its correctness. In the absence of source code or elaborate specifications and models,…

Software Engineering · Computer Science 2016-08-11 Neil Walkinshaw, Gordon Fraser

Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models

As AI models scale to billions of parameters and operate with increasing autonomy, ensuring their safe, reliable operation demands engineering-grade security and assurance frameworks. This paper presents an enterprise-level, risk-aware,…

Cryptography and Security · Computer Science 2025-05-13 Krti Tallam

Data Synthesis for Testing Black-Box Machine Learning Models

The increasing usage of machine learning models raises the question of the reliability of these models. The current practice of testing with limited data is often insufficient. In this paper, we provide a framework for automated test data…

Machine Learning · Computer Science 2021-11-04 Diptikalyan Saha, Aniya Aggarwal, Sandeep Hans

Test Design and Review Argumentation in AI-Assisted Test Generation

AI assistants can increasingly generate and evolve test cases. The challenge is no longer merely to produce them, but also to help engineers understand why a generated artefact exists and what supports it. Existing work has focused on…

Software Engineering · Computer Science 2026-04-27 Eduard Paul Enoiu, Robert Feldt