Related papers: Data Analysis for Proficiency Testing

Treatment of bimodality in proficiency test of pH in bioethanol matrix

The pH value in bioethanol is a quality control parameter related to its acidity and to the corrosiveness of vehicle engines when it is used as fuel. In order to verify the comparability and reliability of the measurement of pH in…

Applications · Statistics 2015-05-08 G. F. Sarmanho, P. P. Borges, I. C. S. Fraga, L. H. C. Leal

Multivariate Comparison of Classification Algorithms

Statistical tests that compare classification algorithms are univariate and use a single performance measure, e.g., misclassification error, $F$ measure, AUC, and so on. In multivariate tests, comparison is done using multiple measures…

Machine Learning · Statistics 2014-09-17 Olcay Taner Yildiz, Ethem Alpaydin

Evaluation of student proficiency through a multidimensional finite mixture IRT model

In certain academic systems, a student can enroll for an exam immediately after the end of the teaching period or can postpone it to any later examination session, so that the grade is missing until the exam is attempted. We propose an…

Methodology · Statistics 2016-09-22 Silvia Bacci, Francesco Bartolucci, Leonardo Grilli, Carla Rampichini

Efficient and powerful equivalency test on combined mean and variance with application to diagnostic device comparison studies

In medical device comparison studies, an equivalency test is commonly used to demonstrate that two measurement methods agree up to a pre-specified performance goal based on paired repeated measures. Such an equivalency test often involves…

Methodology · Statistics 2019-08-22 Yun Bai, Zengri Wang, Theodore Lystig, Baolin Wu

A Factor-Adjusted Multiple Testing Procedure with Application to Mutual Fund Selection

In this article, we propose a factor-adjusted multiple testing (FAT) procedure based on factor-adjusted p-values in a linear factor model involving some observable and unobservable factors, for the purpose of selecting skilled funds in…

Methodology · Statistics 2019-03-04 Wei Lan, Lilun Du

Predictive Multiplicity in Probabilistic Classification

Machine learning models are often used to inform real world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given…

Machine Learning · Computer Science 2023-06-27 Jamelle Watson-Daniels, David C. Parkes, Berk Ustun

Using predictive multiplicity to measure individual performance within the AI Act

When building AI systems for decision support, one often encounters the phenomenon of predictive multiplicity: a single best model does not exist; instead, one can construct many models with similar overall accuracy that differ in their…

Machine Learning · Computer Science 2026-02-13 Karolin Frohnapfel, Mara Seyfert, Sebastian Bordt, Ulrike von Luxburg, Kristof Meding

Automated, Targeted Testing of Property-Based Testing Predicates

Context: This work is based on property-based testing (PBT). PBT is an increasingly important form of software testing. Furthermore, it serves as a concrete gateway into the abstract area of formal methods. Specifically, we focus on…

Programming Languages · Computer Science 2021-11-23 Tim Nelson, Elijah Rivera, Sam Soucie, Thomas Del Vecchio, John Wrenn, Shriram Krishnamurthi

More Powerful Multiple Testing in Randomized Experiments with Non-Compliance

Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups). Although simple…

Methodology · Statistics 2016-05-25 Joseph J. Lee, Laura Forastiere, Luke Miratrix, Natesh S. Pillai

Detection and Mitigation of Algorithmic Bias via Predictive Rate Parity

Predictive parity (PP), also known as sufficiency, is a core definition of algorithmic fairness essentially stating that model outputs must have the same interpretation of expected outcomes regardless of group. Testing and satisfying PP is…

Methodology · Statistics 2023-06-01 Cyrus DiCiccio, Brian Hsu, YinYin Yu, Preetam Nandy, Kinjal Basu

Implicit assessment of language learning during practice as accurate as explicit testing

Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions,…

Artificial Intelligence · Computer Science 2024-09-25 Jue Hou, Anisia Katinskaia, Anh-Duc Vu, Roman Yangarber

Statistical Inference for Covariate-Adjusted and Interpretable Generalized Factor Model with Application to Testing Fairness

Latent variable models are popularly used to measure latent factors (e.g., abilities and personalities) from large-scale assessment data. Beyond understanding these latent factors, the covariate effect on responses controlling for latent…

Methodology · Statistics 2026-01-12 Jing Ouyang, Chengyu Cui, Kean Ming Tan, Gongjun Xu

Set-based differential covariance testing for high-throughput data

The problem of detecting changes in covariance for a single pair of features has been studied in some detail, but may be limited in importance or general applicability. In contrast, testing equality of covariance matrices of a set of…

Methodology · Statistics 2017-12-12 Yi-Hui Zhou

Posterior Predictive P-values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Variables

In randomized experiments with noncompliance, tests may focus on compliers rather than on the overall sample. Rubin (1998) put forth such a method, and argued that testing for the complier average causal effect and averaging permutation…

Methodology · Statistics 2016-02-23 Laura Forastiere, Fabrizia Mealli, Luke Miratrix

Multiple Comparison Procedures for Simultaneous Inference in Functional MANOVA

Functional data analysis is becoming increasingly popular to study data from real-valued random functions. Nevertheless, there is a lack of multiple testing procedures for such data. These are particularly important in factorial designs to…

Methodology · Statistics 2024-06-04 Merle Munko, Marc Ditzhaus, Markus Pauly, Łukasz Smaga

Evaluating AI Vocational Skills Through Professional Testing

Using a novel professional certification survey, the study focuses on assessing the vocational skills of two highly cited AI models, GPT-3 and Turbo-GPT3.5. The approach emphasizes the importance of practical readiness over academic…

Machine Learning · Computer Science 2023-12-19 David Noever, Matt Ciolino

Testing the Consistency of Performance Scores Reported for Binary Classification Problems

Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and…

Machine Learning · Computer Science 2023-10-20 Attila Fazekas, György Kovács

On modeling nonhomogeneous Poisson process for stochastic simulation input analysis

A validated simulation model primarily requires performing an appropriate input analysis mainly by determining the behavior of real-world processes using probability distributions. In many practical cases, probability distributions of the…

Applications · Statistics 2014-09-01 Issac Shams, Saeede Ajorlou, Kai Yang

A Statistical Analysis for Per-Instance Evaluation of Stochastic Optimizers: Avoiding Unreliable Conclusions

A key trait of stochastic optimizers is that multiple runs of the same optimizer in attempting to solve the same problem can produce different results. As a result, their performance is evaluated over several repeats, or runs, on the…

Machine Learning · Computer Science 2026-05-18 Moslem Noori, Elisabetta Valiante, Thomas Van Vaerenbergh, Masoud Mohseni, Ignacio Rozada

A Practitioner's Guide to Multiple Testing Error Rates

It is quite common in modern research for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework does not scale with the number of hypotheses in the sense that naively performing many hypothesis…

Methodology · Statistics 2013-06-26 Jonathan Rosenblatt