Related papers: Concept Bottleneck Large Language Models

Crafting Large Language Models for Enhanced Interpretability

We introduce the Concept Bottleneck Large Language Model (CB-LLM), a pioneering approach to creating inherently interpretable Large Language Models (LLMs). Unlike traditional black-box LLMs that rely on post-hoc interpretation methods with…

Computation and Language · Computer Science 2024-07-08 Chung-En Sun, Tuomas Oikarinen, Tsui-Wei Weng

Bayesian Concept Bottleneck Models with LLM Priors

Concept Bottleneck Models (CBMs) have been proposed as a compromise between white-box and black-box models, aiming to achieve interpretability without sacrificing accuracy. The standard training procedure for CBMs is to predefine a…

Machine Learning · Computer Science 2025-12-05 Jean Feng, Avni Kothari, Luke Zier, Chandan Singh, Yan Shuo Tan

Linearly-Interpretable Concept Embedding Models for Text Analysis

Despite their success, Large-Language Models (LLMs) still face criticism due to their lack of interpretability. Traditional post-hoc interpretation methods, based on attention and gradient-based analysis, offer limited insights as they only…

Computation and Language · Computer Science 2025-07-17 Francesco De Santis, Philippe Bich, Gabriele Ciravegna, Pietro Barbiero, Danilo Giordano, Tania Cerquitelli

Language Guided Concept Bottleneck Models for Interpretable Continual Learning

Continual learning (CL) aims to enable learning systems to acquire new knowledge constantly without forgetting previously learned information. CL faces the challenge of mitigating catastrophic forgetting while maintaining interpretability…

Computer Vision and Pattern Recognition · Computer Science 2025-04-01 Lu Yu, Haoyu Han, Zhe Tao, Hantao Yao, Changsheng Xu

VLG-CBM: Training Concept Bottleneck Models with Vision-Language Guidance

Concept Bottleneck Models (CBMs) provide interpretable prediction by introducing an intermediate Concept Bottleneck Layer (CBL), which encodes human-understandable concepts to explain models' decision. Recent works proposed to utilize Large…

Computer Vision and Pattern Recognition · Computer Science 2025-01-17 Divyansh Srivastava, Ge Yan, Tsui-Wei Weng

Concept Bottleneck Language Models For protein design

We introduce Concept Bottleneck Protein Language Models (CB-pLM), a generative masked language model with a layer where each neuron corresponds to an interpretable concept. Our architecture offers three key benefits: i) Control: We can…

Machine Learning · Computer Science 2024-12-12 Aya Abdelsalam Ismail, Tuomas Oikarinen, Amy Wang, Julius Adebayo, Samuel Stanton, Taylor Joren, Joseph Kleinhenz, Allen Goodman, Héctor Corrada Bravo, Kyunghyun Cho, Nathan C. Frey

Concept Layers: Enhancing Interpretability and Intervenability via LLM Conceptualization

The opaque nature of Large Language Models (LLMs) has led to significant research efforts aimed at enhancing their interpretability, primarily through post-hoc methods. More recent in-hoc approaches, such as Concept Bottleneck Models…

Machine Learning · Computer Science 2025-02-20 Or Raphael Bidusa, Shaul Markovitch

Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck

Black-box deep neural networks excel in text classification, yet their application in high-stakes domains is hindered by their lack of interpretability. To address this, we propose Text Bottleneck Models (TBM), an intrinsically…

Computation and Language · Computer Science 2024-04-04 Josh Magnus Ludan, Qing Lyu, Yue Yang, Liam Dugan, Mark Yatskar, Chris Callison-Burch

Concept Bottleneck Models Without Predefined Concepts

There has been considerable recent interest in interpretable concept-based models such as Concept Bottleneck Models (CBMs), which first predict human-interpretable concepts and then map them to output classes. To reduce reliance on…

Machine Learning · Computer Science 2024-07-08 Simon Schrodi, Julian Schur, Max Argus, Thomas Brox

LogicCBMs: Logic-Enhanced Concept-Based Learning

Concept Bottleneck Models (CBMs) provide a basis for semantic abstractions within a neural network architecture. Such models have primarily been seen through the lens of interpretability so far, wherein they offer transparency by inferring…

Computer Vision and Pattern Recognition · Computer Science 2025-12-09 Deepika SN Vemuri, Gautham Bellamkonda, Aditya Pola, Vineeth N Balasubramanian

Towards Achieving Concept Completeness for Textual Concept Bottleneck Models

Textual Concept Bottleneck Models (TCBMs) are interpretable-by-design models for text classification that predict a set of salient concepts before making the final prediction. This paper proposes Complete Textual Concept Bottleneck Model…

Computation and Language · Computer Science 2025-05-29 Milan Bhan, Yann Choho, Pierre Moreau, Jean-Noel Vittaut, Nicolas Chesneau, Marie-Jeanne Lesot

Uncertainty-aware Language Guidance for Concept Bottleneck Models

Concept Bottleneck Models (CBMs) provide inherent interpretability by first mapping input samples to high-level semantic concepts, followed by a combination of these concepts for the final classification. However, the annotation of…

Machine Learning · Computer Science 2026-03-02 Yangyi Li, Mengdi Huai

Learning Concept Bottleneck Models from Mechanistic Explanations

Concept Bottleneck Models (CBMs) aim for ante-hoc interpretability by learning a bottleneck layer that predicts interpretable concepts before the decision. State-of-the-art approaches typically select which concepts to learn via human…

Machine Learning · Computer Science 2026-03-10 Antonio De Santis, Schrasing Tong, Marco Brambilla, Lalana Kagal

Towards Fine-Grained and Verifiable Concept Bottleneck Models

Concept Bottleneck Models (CBMs) offer interpretable alternatives to black-box predictors by introducing human-relatable concepts before the final output. However, existing CBMs struggle to verify whether predicted concepts correspond to…

Machine Learning · Computer Science 2026-05-15 Yingying Fang, Haijie Xu, Shuang Wu, Mariathasan Anish, Guang Yang

Towards Faithful Multimodal Concept Bottleneck Models

Concept Bottleneck Models (CBMs) are interpretable models that route predictions through a layer of human-interpretable concepts. While widely studied in vision and, more recently, in NLP, CBMs remain largely unexplored in multimodal…

Computer Vision and Pattern Recognition · Computer Science 2026-03-16 Pierre Moreau, Emeline Pineau Ferrand, Yann Choho, Benjamin Wong, Annabelle Blangero, Milan Bhan

Relational Concept Bottleneck Models

The design of interpretable deep learning models working in relational domains poses an open challenge: interpretable deep learning methods, such as Concept Bottleneck Models (CBMs), are not designed to solve relational problems, while…

Machine Learning · Computer Science 2024-10-28 Pietro Barbiero, Francesco Giannini, Gabriele Ciravegna, Michelangelo Diligenti, Giuseppe Marra

Label-Free Concept Bottleneck Models

Concept bottleneck models (CBM) are a popular way of creating more interpretable neural networks by having hidden layer neurons correspond to human-understandable concepts. However, existing CBMs and their variants have two crucial…

Machine Learning · Computer Science 2023-06-06 Tuomas Oikarinen, Subhro Das, Lam M. Nguyen, Tsui-Wei Weng

Chat-CBM: Towards Interactive Concept Bottleneck Models with Frozen Large Language Models

Concept Bottleneck Models (CBMs) provide inherent interpretability by first predicting a set of human-understandable concepts and then mapping them to labels through a simple classifier. While users can intervene in the concept space to…

Computer Vision and Pattern Recognition · Computer Science 2025-09-23 Hangzhou He, Lei Zhu, Kaiwen Li, Xinliang Zhang, Jiakui Hu, Ourui Fu, Zhengjian Yao, Yanye Lu

Hierarchical, Interpretable, Label-Free Concept Bottleneck Model

Concept Bottleneck Models (CBMs) introduce interpretability to black-box deep learning models by predicting labels through human-understandable concepts. However, unlike humans, who identify objects at different levels of abstraction using…

Computer Vision and Pattern Recognition · Computer Science 2026-04-06 Haodong Xie, Yujun Cai, Rahul Singh Maharjan, Yiwei Wang, Federico Tavella, Angelo Cangelosi

CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

The main challenges limiting the adoption of deep learning-based solutions in medical workflows are the availability of annotated data and the lack of interpretability of such systems. Concept Bottleneck Models (CBMs) tackle the latter by…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Cristiano Patrício, Isabel Rio-Torto, Jaime S. Cardoso, Luís F. Teixeira, João C. Neves