Related papers: Humanly Certifying Superhuman Classifiers

200 papers

If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this…

Machine Learning · Computer Science 2023-10-20 Lukas Fluri, Daniel Paleka, Florian Tramèr

Supervised machine learning utilizes large datasets, often with ground truth labels annotated by humans. While some data points are easy to classify, others are hard to classify, which reduces the inter-annotator agreement. This causes…

Human-Computer Interaction · Computer Science 2023-02-14 Andrea Papenmeier, Dagmar Kern, Daniel Hienert, Yvonne Kammerer, Christin Seifert

Human-annotated labels and explanations are critical for training explainable NLP models. However, unlike human-annotated labels whose quality is easier to calibrate (e.g., with a majority vote), human-crafted free-form explanations can be…

Computation and Language · Computer Science 2023-05-23 Bingsheng Yao, Prithviraj Sen, Lucian Popa, James Hendler, Dakuo Wang

Humans are the final decision makers in critical tasks that involve ethical and legal concerns, ranging from recidivism prediction, to medical diagnosis, to fighting against fake news. Although machine learning models can sometimes achieve…

Artificial Intelligence · Computer Science 2019-01-10 Vivian Lai, Chenhao Tan

This work offers a novel view on the use of human input as labels, acknowledging that humans may err. We build a behavioral profile for human annotators which is used as a feature representation of the provided input. We show that by…

Databases · Computer Science 2022-05-09 Roee Shraga

While Artificial Intelligence has successfully outperformed humans in complex combinatorial games (such as chess and checkers), humans have retained their supremacy in social interactions that require intuition and adaptation, such as…

Computers and Society · Computer Science 2014-04-22 Fatimah Ishowo-Oloko, Jacob Crandall, Manuel Cebrian, Sherief Abdallah, Iyad Rahwan

In Machine Translation (MT) evaluation, metric performance is assessed based on agreement with human judgments. In recent years, automatic metrics have demonstrated increasingly high levels of agreement with humans. To gain a clearer…

Computation and Language · Computer Science 2025-06-25 Lorenzo Proietti, Stefano Perrella, Roberto Navigli

An increasing number of domains are providing us with detailed trace data on human decisions in settings where we can evaluate the quality of these decisions via an algorithm. Motivated by this development, an emerging line of work has…

Artificial Intelligence · Computer Science 2016-06-17 Ashton Anderson, Jon Kleinberg, Sendhil Mullainathan

Human feedback is critical for aligning AI systems to human values. As AI capabilities improve and AI is used to tackle more challenging tasks, verifying quality and safety becomes increasingly challenging. This paper explores how we can…

Artificial Intelligence · Computer Science 2025-10-31 Rishub Jain, Sophie Bridgers, Lili Janzer, Rory Greig, Tian Huey Teh, Vladimir Mikulik

Prior studies have shown that distinguishing text generated by Large Language Models (LLMs) from human-written text is highly challenging for humans, and often no better than random guessing. To verify the generalizability of this finding…

Much of machine learning research focuses on predictive accuracy: given a task, create a machine learning model (or algorithm) that maximizes accuracy. In many settings, however, the final prediction or decision of a system is under the…

Computers and Society · Computer Science 2022-06-02 Kate Donahue, Alexandra Chouldechova, Krishnaram Kenthapadi

Humans are routinely asked to evaluate the performance of other individuals, separating success from failure and affecting outcomes from science to education and sports. Yet, in many contexts, the metrics driving the human evaluation…

Physics and Society · Physics 2017-12-07 Luca Pappalardo, Paolo Cintia, Dino Pedreschi, Fosca Giannotti, Albert-Laszlo Barabasi

Despite growing interest in using large language models (LLMs) to automate annotation, their effectiveness in complex, nuanced, and multi-dimensional labelling tasks remains relatively underexplored. This study focuses on annotation for the…

Information Retrieval · Computer Science 2025-07-02 Leila Tavakoli, Hamed Zamani

Recent advancements in deep reinforcement learning have brought forth an impressive display of highly skilled artificial agents capable of complex intelligent behavior. In video games, these artificial agents are increasingly deployed as…

Machine Learning · Statistics 2022-03-14 Ian Colbert, Mehdi Saeedi

Work on "learning with rationales" shows that humans providing explanations to a machine learning system can improve the system's predictive accuracy. However, this work has not been connected to work in "explainable AI" which concerns…

Computation and Language · Computer Science 2019-06-03 Julia Strout, Ye Zhang, Raymond J. Mooney

Recent benchmark studies have claimed that AI has approached or even surpassed human-level performance on various cognitive tasks. However, this position paper argues that current AI evaluation paradigms are insufficient for assessing…

Supervised systems require human labels for training. But, are humans themselves always impartial during the annotation process? We examine this question in the context of automated assessment of human behavioral tasks. Specifically, we…

Human-Computer Interaction has been shown to lead to improvements in machine learning systems by boosting model performance, accelerating learning and building user confidence. In this work, we aim to alleviate the expectation that human…

Machine Learning · Computer Science 2024-03-29 Jonathan Erskine, Matt Clifford, Alexander Hepburn, Raúl Santos-Rodríguez

Classic evaluation methods for believable agents are time-consuming because they involve many humans judging the agents. They are well suited to validating work on new believable behaviour models. However, during the implementation, numerous…

Artificial Intelligence · Computer Science 2010-09-03 Fabien Tencé, Cédric Buche

As machine learning and algorithmic decision making systems are increasingly being leveraged in high-stakes human-in-the-loop settings, there is a pressing need to understand the rationale of their predictions. Researchers have responded to…

Machine Learning · Computer Science 2020-12-07 Jonathan Dinu, Jeffrey Bigham, J. Zico Kolter