Related papers: Multicriteria interpretability driven Deep Learnin…

MonoNet: Towards Interpretable Models by Learning Monotonic Features

Being able to interpret, or explain, the predictions made by a machine learning model is of fundamental importance. This is especially true when there is interest in deploying data-driven models to make high-stakes decisions, e.g. in…

Machine Learning · Computer Science 2019-10-01 An-phi Nguyen, María Rodríguez Martínez

Explaining Language Models' Predictions with High-Impact Concepts

The emergence of large-scale pretrained language models has posed unprecedented challenges in deriving explanations of why the model has made some predictions. Stemmed from the compositional nature of languages, spurious correlations have…

Computation and Language · Computer Science 2023-05-04 Ruochen Zhao, Shafiq Joty, Yongjie Wang, Tan Wang

Interpretation of Time-Series Deep Models: A Survey

Deep learning models developed for time-series associated tasks have become more widely researched nowadays. However, due to the unintuitive nature of time-series data, the interpretability problem -- where we understand what is under the…

Machine Learning · Computer Science 2023-05-25 Ziqi Zhao, Yucheng Shi, Shushan Wu, Fan Yang, Wenzhan Song, Ninghao Liu

A Categorisation of Post-hoc Explanations for Predictive Models

The ubiquity of machine learning based predictive models in modern society naturally leads people to ask how trustworthy those models are? In predictive modeling, it is quite common to induce a trade-off between accuracy and…

Machine Learning · Computer Science 2019-04-05 John Mitros, Brian Mac Namee

A Framework to Learn with Interpretation

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive…

Machine Learning · Computer Science 2022-02-24 Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

Interpretable Artificial Intelligence through the Lens of Feature Interaction

Interpretation of deep learning models is a very challenging problem because of their large number of parameters, complex connections between nodes, and unintelligible feature representations. Despite this, many view interpretability as a…

Machine Learning · Computer Science 2021-03-05 Michael Tsang, James Enouen, Yan Liu

An Interpretable Loan Credit Evaluation Method Based on Rule Representation Learner

The interpretability of model has become one of the obstacles to its wide application in the high-stake fields. The usual way to obtain interpretability is to build a black-box first and then explain it using the post-hoc methods. However,…

Machine Learning · Computer Science 2023-04-04 Zihao Chen, Xiaomeng Wang, Yuanjiang Huang, Tao Jia

A Comprehensive Survey on Self-Interpretable Neural Networks

Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides…

Machine Learning · Computer Science 2025-11-21 Yang Ji, Ying Sun, Yuting Zhang, Zhigaoyuan Wang, Yuanxin Zhuang, Zheng Gong, Dazhong Shen, Chuan Qin, Hengshu Zhu, Hui Xiong

Interpretability Needs a New Paradigm

Interpretability is the study of explaining models in understandable terms to humans. At present, interpretability is divided into two paradigms: the intrinsic paradigm, which believes that only models designed to be explained can be…

Machine Learning · Computer Science 2024-11-14 Andreas Madsen, Himabindu Lakkaraju, Siva Reddy, Sarath Chandar

The Mythos of Model Interpretability

Supervised machine learning models boast remarkable predictive capabilities. But can you trust your model? Will it work in deployment? What else can it tell you about the world? We want models to be not only good, but interpretable. And yet…

Machine Learning · Computer Science 2017-03-07 Zachary C. Lipton

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…

Machine Learning · Computer Science 2022-07-18 Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou

A Theory of Diagnostic Interpretation in Supervised Classification

Interpretable deep learning is a fundamental building block towards safer AI, especially when the deployment possibilities of deep learning-based computer-aided medical diagnostic systems are so eminent. However, without a computational…

Machine Learning · Computer Science 2018-06-27 Anirban Mukhopadhyay

Interpreting Black Box Models via Hypothesis Testing

In science and medicine, model interpretations may be reported as discoveries of natural phenomena or used to guide patient treatments. In such high-stakes tasks, false discoveries may lead investigators astray. These applications would…

Machine Learning · Statistics 2020-08-18 Collin Burns, Jesse Thomason, Wesley Tansey

A Survey on Interpretable Reinforcement Learning

Although deep reinforcement learning has become a promising machine learning approach for sequential decision-making problems, it is still not mature enough for high-stake domains such as autonomous driving or medical applications. In such…

Machine Learning · Computer Science 2022-02-25 Claire Glanois, Paul Weng, Matthieu Zimmer, Dong Li, Tianpei Yang, Jianye Hao, Wulong Liu

Quantifying Model Complexity via Functional Decomposition for Better Post-Hoc Interpretability

Post-hoc model-agnostic interpretation methods such as partial dependence plots can be employed to interpret complex machine learning models. While these interpretation methods can be applied regardless of model complexity, they can produce…

Machine Learning · Statistics 2022-01-24 Christoph Molnar, Giuseppe Casalicchio, Bernd Bischl

Rigorous Interpretation Is a Form of Evaluation

Current machine learning models are evaluated through behavioral snapshots, with benchmark accuracies, win rates and outcome-based metrics. Model explanations and evaluations, however, are fundamentally intertwined: understanding why a…

Computers and Society · Computer Science 2026-05-08 Isabelle Lee, Emmy Liu, Cathy Jiao, Brihi Joshi, Dani Yogatama, Fazl Barez, Michael Saxon

Post-hoc Interpretability for Neural NLP: A Survey

Neural networks for NLP are becoming increasingly complex and widespread, and there is a growing concern if these models are responsible to use. Explaining models helps to address the safety and ethical concerns and is essential for…

Computation and Language · Computer Science 2023-11-29 Andreas Madsen, Siva Reddy, Sarath Chandar

Evaluation of post-hoc interpretability methods in time-series classification

Post-hoc interpretability methods are critical tools to explain neural-network results. Several post-hoc methods have emerged in recent years, but when applied to a given task, they produce different results, raising the question of which…

Machine Learning · Computer Science 2024-12-09 Hugues Turbé, Mina Bjelogrlic, Christian Lovis, Gianmarco Mengaldo

Uncovering Unique Concept Vectors through Latent Space Decomposition

Interpreting the inner workings of deep learning models is crucial for establishing trust and ensuring model safety. Concept-based explanations have emerged as a superior approach that is more interpretable than feature attribution…

Machine Learning · Computer Science 2023-07-17 Mara Graziani, Laura O'Mahony, An-Phi Nguyen, Henning Müller, Vincent Andrearczyk

An interpretable neural network model through piecewise linear approximation

Most existing interpretable methods explain a black-box model in a post-hoc manner, which uses simpler models or data analysis techniques to interpret the predictions after the model is learned. However, they (a) may derive contradictory…

Machine Learning · Computer Science 2020-01-22 Mengzhuo Guo, Qingpeng Zhang, Xiuwu Liao, Daniel Dajun Zeng