Related papers: The Interpretable Dictionary in Sparse Coding

200 papers

Artificial neural networks are often very complex and too deep for a human to understand. As a result, they are usually referred to as black boxes. For a lot of real-world problems, the underlying pattern itself is very complicated, such…

Machine Learning · Computer Science 2020-11-26 Yang Li

Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…

Computation and Language · Computer Science 2023-06-27 Minxue Xia, Hao Zhu

Despite strong empirical performance for image classification, deep neural networks are often regarded as "black boxes" and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be…

Computer Vision and Pattern Recognition · Computer Science 2022-10-25 Xili Dai, Mingyang Li, Pengyuan Zhai, Shengbang Tong, Xingjian Gao, Shao-Lun Huang, Zhihui Zhu, Chong You, Yi Ma

Sparse autoencoders (SAEs) extract human-interpretable features from deep neural networks by transforming their activations into a sparse, higher dimensional latent space, and then reconstructing the activations from these latents.…

Machine Learning · Computer Science 2025-02-13 Gonçalo Paulo, Stepan Shabalin, Nora Belrose

Deep Neural Networks have often been called the black box because of the complex, deep architecture and non-transparency presented by the inner layers. There is a lack of trust to use Artificial Intelligence in critical and high-precision…

Computer Vision and Pattern Recognition · Computer Science 2024-03-12 Frincy Clement, Ji Yang, Irene Cheng

Transformer models have become state-of-the-art in decoding stimuli and behavior from neural activity, significantly advancing neuroscience research. Yet greater transparency in their decision-making processes would substantially enhance…

Quantitative Methods · Quantitative Biology 2025-06-18 Laurence Freeman, Philip Shamash, Vinam Arora, Caswell Barry, Tiago Branco, Eva Dyer

Deep neural networks are effective feature extractors but they are prohibitively large for deployment scenarios. Due to the huge number of parameters, interpretability of parameters in different layers is not straightforward. This is why…

Computation and Language · Computer Science 2021-12-23 Saeed Damadi

Computationally explicit hypotheses of brain function derived from machine learning (ML)-based models have recently revolutionized neuroscience. Despite the unprecedented ability of these artificial neural networks (ANNs) to capture…

Neurons and Cognition · Quantitative Biology 2023-12-12 Kohitij Kar, Simon Kornblith, Evelina Fedorenko

Sparse autoencoders (SAEs) are used to decompose neural network activations into sparsely activating features, but many SAE features are only interpretable at high activation strengths. To address this issue we propose to use binary sparse…

Machine Learning · Computer Science 2025-10-01 Lucia Quirke, Stepan Shabalin, Nora Belrose

Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art…

Machine Learning · Statistics 2022-02-28 Shiyun Xu, Zhiqi Bu, Pratik Chaudhari, Ian J. Barnett

Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying…

Human-Computer Interaction · Computer Science 2023-10-04 Dong H. Jeong, Jin-Hee Cho, Feng Chen, Audun Josang, Soo-Yeon Ji

Understanding how information is represented in neural networks is a fundamental challenge in both neuroscience and artificial intelligence. Despite their nonlinear architectures, recent evidence suggests that neural networks encode…

Machine Learning · Computer Science 2025-03-04 David Klindt, Charles O'Neill, Patrik Reizinger, Harald Maurer, Nina Miolane

Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their…

Information Retrieval · Computer Science 2026-02-18 Anton Klenitskiy, Konstantin Polev, Daria Denisova, Alexey Vasilev, Dmitry Simakov, Gleb Gusev

Neural Audio Codecs (NACs) are widely adopted in modern speech systems, yet how they encode linguistic and paralinguistic information remains unclear. Improving the interpretability of NAC representations is critical for understanding and…

Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary…

Machine Learning · Computer Science 2024-05-21 Aleksandar Makelov, George Lange, Neel Nanda

This paper introduces an efficient and robust method for discovering interpretable circuits in large language models using discrete sparse autoencoders. Our approach addresses key limitations of existing techniques, namely computational…

Computation and Language · Computer Science 2024-05-22 Charles O'Neill, Thang Bui

Neural networks are often regarded as "black boxes" due to their complex functions and numerous parameters, which poses significant challenges for interpretability. This study addresses these challenges by introducing methods to enhance the…

Machine Learning · Statistics 2024-09-26 Mohamad Yamen AL Mohamad, Hossein Bevrani, Ali Akbar Haydari

Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have successfully reproduced response properties found in primary visual cortex (Olshausen and Field, 1996). However, conventional sparse coding…

Neurons and Cognition · Quantitative Biology 2011-05-25 William K. Coulter, Christopher J. Hillar, Friedrich T. Sommer

Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance are black-box models whose hidden representations…

Computer Vision and Pattern Recognition · Computer Science 2020-04-29 Patrick Esser, Robin Rombach, Björn Ommer

Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently…

Computation and Language · Computer Science 2021-09-29 Adly Templeton