Related papers: The Interpretable Dictionary in Sparse Coding
Artificial neural networks are often very complex and too deep for a human to understand. As a result, they are usually referred to as black boxes. For a lot of real-world problems, the underlying pattern itself is very complicated, such…
Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…
Despite strong empirical performance for image classification, deep neural networks are often regarded as ``black boxes'' and they are difficult to interpret. On the other hand, sparse convolutional models, which assume that a signal can be…
Sparse autoencoders (SAEs) extract human-interpretable features from deep neural networks by transforming their activations into a sparse, higher dimensional latent space, and then reconstructing the activations from these latents.…
Deep Neural Networks have often been called the black box because of the complex, deep architecture and non-transparency presented by the inner layers. There is a lack of trust to use Artificial Intelligence in critical and high-precision…
Transformer models have become state-of-the-art in decoding stimuli and behavior from neural activity, significantly advancing neuroscience research. Yet greater transparency in their decision-making processes would substantially enhance…
Deep neural networks are effective feature extractors but they are prohibitively large for deployment scenarios. Due to the huge number of parameters, interpretability of parameters in different layers is not straight-forward. This is why…
Computationally explicit hypotheses of brain function derived from machine learning (ML)-based models have recently revolutionized neuroscience. Despite the unprecedented ability of these artificial neural networks (ANNs) to capture…
Sparse autoencoders (SAEs) are used to decompose neural network activations into sparsely activating features, but many SAE features are only interpretable at high activation strengths. To address this issue we propose to use binary sparse…
Interpretable machine learning has demonstrated impressive performance while preserving explainability. In particular, neural additive models (NAM) offer the interpretability to the black-box deep learning and achieve state-of-the-art…
Artificial neural networks (ANNs) have been broadly utilized to analyze various data and solve different domain problems. However, neural networks (NNs) have been considered a black box operation for years because their underlying…
Understanding how information is represented in neural networks is a fundamental challenge in both neuroscience and artificial intelligence. Despite their nonlinear architectures, recent evidence suggests that neural networks encode…
Many current state-of-the-art models for sequential recommendations are based on transformer architectures. Interpretation and explanation of such black box models is an important research question, as a better understanding of their…
Neural Audio Codecs (NACs) are widely adopted in modern speech systems, yet how they encode linguistic and paralinguistic information remains unclear. Improving the interpretability of NAC representations is critical for understanding and…
Disentangling model activations into meaningful features is a central problem in interpretability. However, the absence of ground-truth for these features in realistic scenarios makes validating recent approaches, such as sparse dictionary…
This paper introduces an efficient and robust method for discovering interpretable circuits in large language models using discrete sparse autoencoders. Our approach addresses key limitations of existing techniques, namely computational…
Neural networks are often regarded as "black boxes" due to their complex functions and numerous parameters, which poses significant challenges for interpretability. This study addresses these challenges by introducing methods to enhance the…
Sparse coding networks, which utilize unsupervised learning to maximize coding efficiency, have successfully reproduced response properties found in primary visual cortex \cite{AN:OlshausenField96}. However, conventional sparse coding…
Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance are black-box models whose hidden representations…
Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently…