Related papers: Concept Gradient: Concept-based Interpretation Wit…

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)

The interpretation of deep learning models is a challenge due to their size, complexity, and often opaque internal state. In addition, many systems, such as image classifiers, operate on low-level features rather than high-level concepts.…

Machine Learning · Statistics 2019-04-05 Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viegas, Rory Sayres

Navigating Neural Space: Revisiting Concept Activation Vectors to Overcome Directional Divergence

With a growing interest in understanding neural network prediction strategies, Concept Activation Vectors (CAVs) have emerged as a popular tool for modeling human-understandable concepts in the latent space. Commonly, CAVs are computed by…

Computer Vision and Pattern Recognition · Computer Science 2025-05-08 Frederik Pahde, Maximilian Dreyer, Leander Weber, Moritz Weckbecker, Christopher J. Anders, Thomas Wiegand, Wojciech Samek, Sebastian Lapuschkin

GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability

Concept Activation Vectors (CAVs) provide a powerful approach for interpreting deep neural networks by quantifying their sensitivity to human-defined concepts. However, when computed independently at different layers, CAVs often exhibit…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang

Robust Semantic Interpretability: Revisiting Concept Activation Vectors

Interpretability methods for image classification assess model trustworthiness by attempting to expose whether the model is systematically biased or attending to the same cues as a human would. Saliency methods for feature attribution…

Machine Learning · Statistics 2021-04-08 Jacob Pfau, Albert T. Young, Jerome Wei, Maria L. Wei, Michael J. Keiser

Exploiting Text-Image Latent Spaces for the Description of Visual Concepts

Concept Activation Vectors (CAVs) offer insights into neural network decision-making by linking human-friendly concepts to the model's internal feature extraction process. However, when a new set of CAVs is discovered, they must still be…

Computer Vision and Pattern Recognition · Computer Science 2024-10-24 Laines Schmalwasser, Jakob Gawlikowski, Joachim Denzler, Julia Niebling

Concept activation vectors: a unifying view and adversarial attacks

Concept Activation Vectors (CAVs) are a tool from explainable AI, offering a promising approach for understanding how human-understandable concepts are encoded in a model's latent spaces. They are computed from hidden-layer activations of…

Machine Learning · Statistics 2026-01-28 Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek

FastCAV: Efficient Computation of Concept Activation Vectors for Explaining Deep Neural Networks

Concepts such as objects, patterns, and shapes are how humans understand the world. Building on this intuition, concept-based explainability methods aim to study representations learned by deep neural networks in relation to…

Machine Learning · Computer Science 2025-05-26 Laines Schmalwasser, Niklas Penzel, Joachim Denzler, Julia Niebling

LG-CAV: Train Any Concept Activation Vector with Language Guidance

The concept activation vector (CAV) has attracted broad research interest in explainable AI by elegantly attributing model predictions to specific concepts. However, training a CAV often necessitates a large number of high-quality images,…

Computer Vision and Pattern Recognition · Computer Science 2024-10-15 Qihan Huang, Jie Song, Mengqi Xue, Haofei Zhang, Bingde Hu, Huiqiong Wang, Hao Jiang, Xingen Wang, Mingli Song

Explaining Explainability: Recommendations for Effective Use of Concept Activation Vectors

Concept-based explanations translate the internal representations of deep learning models into a language that humans are familiar with: concepts. One popular method for finding concepts is Concept Activation Vectors (CAVs), which are…

Machine Learning · Computer Science 2025-02-14 Angus Nicolson, Lisa Schut, J. Alison Noble, Yarin Gal

Human-Centered Concept Explanations for Neural Networks

Understanding complex machine learning models such as deep neural networks with explanations is crucial in various applications. Many explanations stem from the model perspective, and may not necessarily effectively communicate why the…

Machine Learning · Computer Science 2022-02-28 Chih-Kuan Yeh, Been Kim, Pradeep Ravikumar

Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors

Convolutional neural network (CNN) models for computer vision are powerful but lack explainability in their most basic form. This deficiency remains a key challenge when applying CNNs in important domains. Recent work on explanations…

Computer Vision and Pattern Recognition · Computer Science 2021-06-18 Ruihan Zhang, Prashan Madumal, Tim Miller, Krista A. Ehinger, Benjamin I. P. Rubinstein

Concept Distillation: Leveraging Human-Centered Explanations for Model Improvement

Humans use abstract concepts for understanding instead of hard features. Recent interpretability research has focused on human-centered concept explanations of neural networks. Concept Activation Vectors (CAVs) estimate a model's…

Machine Learning · Computer Science 2023-11-28 Avani Gupta, Saurabh Saini, P J Narayanan

On The Variability of Concept Activation Vectors

One of the most pressing challenges in artificial intelligence is to make models more transparent to their users. Recently, explainable artificial intelligence has come up with numerous methods to tackle this challenge. A promising avenue is…

Machine Learning · Computer Science 2025-09-30 Julia Wenkmann, Damien Garreau

Exploring Concept Contribution Spatially: Hidden Layer Interpretation with Spatial Activation Concept Vector

To interpret deep learning models, one mainstream approach is to explore the concepts learned by networks. Testing with Concept Activation Vector (TCAV) presents a powerful tool to quantify the contribution of query concepts (represented by…

Computer Vision and Pattern Recognition · Computer Science 2022-05-25 Andong Wang, Wei-Ning Lee

Interpretability for Multimodal Emotion Recognition using Concept Activation Vectors

Multimodal Emotion Recognition refers to the classification of input video sequences into emotion labels based on multiple input modalities (usually video, audio and text). In recent years, deep neural networks have shown remarkable…

Machine Learning · Computer Science 2024-10-28 Ashish Ramayee Asokan, Nidarshan Kumar, Anirudh Venkata Ragam, Shylaja S Sharath

Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Convolutional Neural Networks (CNNs) have seen significant performance improvements in recent years. However, due to their size and complexity, they function as black-boxes, leading to transparency concerns. State-of-the-art saliency…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Antonio De Santis, Riccardo Campi, Matteo Bianchi, Marco Brambilla

TextCAVs: Debugging vision models using text

Concept-based interpretability methods are a popular form of explanation for deep learning models which provide explanations in the form of high-level human interpretable concepts. These methods typically find concept activation vectors…

Machine Learning · Computer Science 2024-08-19 Angus Nicolson, Yarin Gal, J. Alison Noble

Concept-based Explanations using Non-negative Concept Activation Vectors and Decision Tree for CNN Models

This paper evaluates whether training a decision tree based on concepts extracted from a concept-based explainer can increase interpretability for Convolutional Neural Network (CNN) models and boost the fidelity and performance of the…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Gayda Mutahar, Tim Miller

Learning Interpretable Concept-Based Models with Human Feedback

Machine learning models that first learn a representation of a domain in terms of human-understandable concepts, then use it to make predictions, have been proposed to facilitate interpretation and interaction with models trained on…

Machine Learning · Computer Science 2020-12-08 Isaac Lage, Finale Doshi-Velez

CAT: Interpretable Concept-based Taylor Additive Models

As an emerging interpretable technique, Generalized Additive Models (GAMs) adopt neural networks to individually learn non-linear functions for each feature, which are then combined through a linear model for final predictions. Although…

Machine Learning · Computer Science 2024-08-01 Viet Duong, Qiong Wu, Zhengyi Zhou, Hongjue Zhao, Chenxiang Luo, Eric Zavesky, Huaxiu Yao, Huajie Shao