Related papers: Data Analysis for Proficiency Testing

200 papers

The pH value in bioethanol is a quality control parameter related to its acidity and to the corrosiveness of vehicle engines when it is used as fuel. In order to verify the comparability and reliability of the measurement of pH in…

Applications · Statistics 2015-05-08 G. F. Sarmanho , P. P. Borges , I. C. S. Fraga , L. H. C. Leal

Statistical tests that compare classification algorithms are univariate and use a single performance measure, e.g., misclassification error, $F$ measure, AUC, and so on. In multivariate tests, comparison is done using multiple measures…

Machine Learning · Statistics 2014-09-17 Olcay Taner Yildiz , Ethem Alpaydin

In certain academic systems, a student can enroll for an exam immediately after the end of the teaching period or can postpone it to any later examination session, so that the grade remains missing until the exam is attempted. We propose an…

Methodology · Statistics 2016-09-22 Silvia Bacci , Francesco Bartolucci , Leonardo Grilli , Carla Rampichini

In medical device comparison studies, an equivalency test is commonly used to demonstrate that two measurement methods agree up to a pre-specified performance goal based on paired repeated measures. Such an equivalency test often involves…

Methodology · Statistics 2019-08-22 Yun Bai , Zengri Wang , Theodore Lystig , Baolin Wu

In this article, we propose a factor-adjusted multiple testing (FAT) procedure based on factor-adjusted p-values in a linear factor model involving some observable and unobservable factors, for the purpose of selecting skilled funds in…

Methodology · Statistics 2019-03-04 Wei Lan , Lilun Du

Machine learning models are often used to inform real world risk assessment tasks: predicting consumer default risk, predicting whether a person suffers from a serious illness, or predicting a person's risk to appear in court. Given…

Machine Learning · Computer Science 2023-06-27 Jamelle Watson-Daniels , David C. Parkes , Berk Ustun

When building AI systems for decision support, one often encounters the phenomenon of predictive multiplicity: a single best model does not exist; instead, one can construct many models with similar overall accuracy that differ in their…

Machine Learning · Computer Science 2026-02-13 Karolin Frohnapfel , Mara Seyfert , Sebastian Bordt , Ulrike von Luxburg , Kristof Meding

Context: This work is based on property-based testing (PBT). PBT is an increasingly important form of software testing. Furthermore, it serves as a concrete gateway into the abstract area of formal methods. Specifically, we focus on…

Programming Languages · Computer Science 2021-11-23 Tim Nelson , Elijah Rivera , Sam Soucie , Thomas Del Vecchio , John Wrenn , Shriram Krishnamurthi

Two common concerns raised in analyses of randomized experiments are (i) appropriately handling issues of non-compliance, and (ii) appropriately adjusting for multiple tests (e.g., on multiple outcomes or subgroups). Although simple…

Methodology · Statistics 2016-05-25 Joseph J. Lee , Laura Forastiere , Luke Miratrix , Natesh S. Pillai

Predictive parity (PP), also known as sufficiency, is a core definition of algorithmic fairness essentially stating that model outputs must have the same interpretation of expected outcomes regardless of group. Testing and satisfying PP is…

Methodology · Statistics 2023-06-01 Cyrus DiCiccio , Brian Hsu , YinYin Yu , Preetam Nandy , Kinjal Basu

Assessment of proficiency of the learner is an essential part of Intelligent Tutoring Systems (ITS). We use Item Response Theory (IRT) in computer-aided language learning for assessment of student ability in two contexts: in test sessions,…

Artificial Intelligence · Computer Science 2024-09-25 Jue Hou , Anisia Katinskaia , Anh-Duc Vu , Roman Yangarber

Latent variable models are popularly used to measure latent factors (e.g., abilities and personalities) from large-scale assessment data. Beyond understanding these latent factors, the covariate effect on responses controlling for latent…

Methodology · Statistics 2026-01-12 Jing Ouyang , Chengyu Cui , Kean Ming Tan , Gongjun Xu

The problem of detecting changes in covariance for a single pair of features has been studied in some detail, but may be limited in importance or general applicability. In contrast, testing equality of covariance matrices of a set of…

Methodology · Statistics 2017-12-12 Yi-Hui Zhou

In randomized experiments with noncompliance, tests may focus on compliers rather than on the overall sample. Rubin (1998) put forth such a method, and argued that testing for the complier average causal effect and averaging permutation…

Methodology · Statistics 2016-02-23 Laura Forastiere , Fabrizia Mealli , Luke Miratrix

Functional data analysis is becoming increasingly popular for studying data from real-valued random functions. Nevertheless, there is a lack of multiple testing procedures for such data. These are particularly important in factorial designs to…

Methodology · Statistics 2024-06-04 Merle Munko , Marc Ditzhaus , Markus Pauly , Łukasz Smaga

Using a novel professional certification survey, the study focuses on assessing the vocational skills of two highly cited AI models, GPT-3 and Turbo-GPT3.5. The approach emphasizes the importance of practical readiness over academic…

Machine Learning · Computer Science 2023-12-19 David Noever , Matt Ciolino

Binary classification is a fundamental task in machine learning, with applications spanning various scientific domains. Whether scientists are conducting fundamental research or refining practical applications, they typically assess and…

Machine Learning · Computer Science 2023-10-20 Attila Fazekas , György Kovács

A validated simulation model primarily requires performing an appropriate input analysis mainly by determining the behavior of real-world processes using probability distributions. In many practical cases, probability distributions of the…

Applications · Statistics 2014-09-01 Issac Shams , Saeede Ajorlou , Kai Yang

A key trait of stochastic optimizers is that multiple runs of the same optimizer in attempting to solve the same problem can produce different results. As a result, their performance is evaluated over several repeats, or runs, on the…

Machine Learning · Computer Science 2026-05-18 Moslem Noori , Elisabetta Valiante , Thomas Van Vaerenbergh , Masoud Mohseni , Ignacio Rozada

It is quite common in modern research for a researcher to test many hypotheses. The statistical (frequentist) hypothesis testing framework does not scale with the number of hypotheses, in the sense that naively performing many hypothesis…

Methodology · Statistics 2013-06-26 Jonathan Rosenblatt