Related papers: The Interpretable Dictionary in Sparse Coding

Interpretable Models in ANNs

Artificial neural networks are often very complex and too deep for a human to understand. As a result, they are usually referred to as black boxes. For a lot of real-world problems, the underlying pattern itself is very complicated, such…

Machine Learning · Computer Science 2020-11-26 Yang Li

Interpretable Neural Embeddings with Sparse Self-Representation

Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…

Computation and Language · Computer Science 2023-06-27 Minxue Xia, Hao Zhu

Revisiting Sparse Convolutional Model for Visual Recognition

Despite strong empirical performance for image classification, deep neural networks are often regarded as "black boxes" and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be…

Computer Vision and Pattern Recognition · Computer Science 2022-10-25 Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma

Transcoders Beat Sparse Autoencoders for Interpretability

Sparse autoencoders (SAEs) extract human-interpretable features from deep neural networks by transforming their activations into a sparse, higher dimensional latent space, and then reconstructing the activations from these latents.…

Machine Learning · Computer Science 2025-02-13 Gonçalo Paulo, Stepan Shabalin, Nora Belrose

Feature CAM: Interpretable AI in Image Classification

Deep Neural Networks have often been called the black box because of the complex, deep architecture and non-transparency presented by the inner layers. There is a lack of trust to use Artificial Intelligence in critical and high-precision…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Frincy Clement, Ji Yang, Irene Cheng

Beyond Black Boxes: Enhancing Interpretability of Transformers Trained on Neural Data

Transformer models have become state-of-the-art in decoding stimuli and behavior from neural activity, significantly advancing neuroscience research. Yet greater transparency in their decision-making processes would substantially enhance…

Quantitative Methods · Quantitative Biology 2025-06-18 Laurence Freeman, Philip Shamash, Vinam Arora, Caswell Barry, Tiago Branco, Eva Dyer

On the Compression of Natural Language Models

Deep neural networks are effective feature extractors but they are prohibitively large for deployment scenarios. Due to the huge number of parameters, interpretability of parameters in different layers is not straight-forward. This is why…

Computation and Language · Computer Science 2021-12-23 Saeed Damadi

Interpretability of artificial neural network models in artificial Intelligence vs. neuroscience

Computationally explicit hypotheses of brain function derived from machine learning (ML)-based models have recently revolutionized neuroscience. Despite the unprecedented ability of these artificial neural networks (ANNs) to capture…

Neurons and Cognition · Quantitative Biology 2023-12-12 Kohitij Kar, Simon Kornblith, Evelina Fedorenko

Binary Sparse Coding for Interpretability

Sparse autoencoders (SAEs) are used to decompose neural network activations into sparsely activating features, but many SAE features are only interpretable at high activation strengths. To address this issue we propose to use binary sparse…

Machine Learning · Computer Science 2025-10-01 Lucia Quirke, Stepan Shabalin, Nora Belrose

Sparse Neural Additive Model: Interpretable Deep Learning with Feature Selection via Group Sparsity

Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art…

Machine Learning · Statistics 2022-02-28 Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett

Active Learning on Neural Networks through Interactive Generation of Digit Patterns and Visual Representation

Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying…

Human-Computer Interaction · Computer Science 2023-10-04 Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji

From superposition to sparse codes: interpretable representations in neural networks

Understanding how information is represented in neural networks is a fundamental challenge in both neuroscience and artificial intelligence. Despite their nonlinear architectures, recent evidence suggests that neural networks encode…

Machine Learning · Computer Science 2025-03-04 David Klindt, Charles O'Neill, Patrik Reizinger, Harald Maurer, Nina Miolane

Sparse Autoencoders for Sequential Recommendation Models: Interpretation and Flexible Control

Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their…

Information Retrieval · Computer Science 2026-02-18 Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev

Towards Interpretable Framework for Neural Audio Codecs via Sparse Autoencoders: A Case Study on Accent Information

Neural Audio Codecs (NACs) are widely adopted in modern speech systems, yet how they encode linguistic and paralinguistic information remains unclear. Improving the interpretability of NAC representations is critical for understanding and…

Sound · Computer Science 2026-03-20 Shih-Heng Wang, Tiantian Feng, Aditya Kommineni, Thanathai Lertpetchpun, Bowen Yi, Xuan Shi, Shrikanth Narayanan

Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control

Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary…

Machine Learning · Computer Science 2024-05-21 Aleksandar Makelov, George Lange, Neel Nanda

Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models

This paper introduces an efficient and robust method for discovering interpretable circuits in large language models using discrete sparse autoencoders. Our approach addresses key limitations of existing techniques, namely computational…

Computation and Language · Computer Science 2024-05-22 Charles O'Neill, Thang Bui

Statistical tuning of artificial neural network

Neural networks are often regarded as "black boxes" due to their complex functions and numerous parameters, which poses significant challenges for interpretability. This study addresses these challenges by introducing methods to enhance the…

Machine Learning · Statistics 2024-09-26 Mohamad Yamen AL Mohamad, Hossein Bevrani, Ali Akbar Haydari

Adaptive compressed sensing - a new class of self-organizing coding models for neuroscience

Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have successfully reproduced response properties found in primary visual cortex (Olshausen & Field, 1996). However, conventional sparse coding…

Neurons and Cognition · Quantitative Biology 2011-05-25 William K. Coulter, Christopher J. Hillar, Friedrich T. Sommer

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance are black-box models whose hidden representations…

Computer Vision and Pattern Recognition · Computer Science 2020-04-29 Patrick Esser, Robin Rombach, Björn Ommer

Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding

Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently…

Computation and Language · Computer Science 2021-09-29 Adly Templeton