Related papers: Generating Interpretable Networks using Hypernetwo…

Interpretable Neural Network Decoupling

The remarkable performance of convolutional neural networks (CNNs) is entangled with their huge number of uninterpretable parameters, which has become the bottleneck limiting the exploitation of their full potential. Towards network…

Computer Vision and Pattern Recognition · Computer Science 2020-08-26 Yuchao Li, Rongrong Ji, Shaohui Lin, Baochang Zhang, Chenqian Yan, Yongjian Wu, Feiyue Huang, Ling Shao

Interpretable Deep Learning: Interpretation, Interpretability, Trustworthiness, and Beyond

Deep neural networks have been well-known for their superb handling of various machine learning and artificial intelligence tasks. However, due to their over-parameterized black-box nature, it is often difficult to understand the prediction…

Machine Learning · Computer Science 2022-07-18 Xuhong Li, Haoyi Xiong, Xingjian Li, Xuanyu Wu, Xiao Zhang, Ji Liu, Jiang Bian, Dejing Dou

A Brief Review of Hypernetworks in Deep Learning

Hypernetworks, or hypernets for short, are neural networks that generate weights for another neural network, known as the target network. They have emerged as a powerful deep learning technique that allows for greater flexibility,…

Machine Learning · Computer Science 2025-01-03 Vinod Kumar Chauhan, Jiandong Zhou, Ping Lu, Soheila Molaei, David A. Clifton

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance are black-box models whose hidden representations…

Computer Vision and Pattern Recognition · Computer Science 2020-04-29 Patrick Esser, Robin Rombach, Björn Ommer

Learning Interpretable Differentiable Logic Networks

The ubiquity of neural networks (NNs) in real-world applications, from healthcare to natural language processing, underscores their immense utility in capturing complex relationships within high-dimensional data. However, NNs come with…

Machine Learning · Computer Science 2024-07-08 Chang Yue, Niraj K. Jha

From Neurons to Neutrons: A Case Study in Interpretability

Mechanistic Interpretability (MI) promises a path toward fully understanding how neural networks make their predictions. Prior work demonstrates that even when trained to perform simple arithmetic, models can implement a variety of…

Machine Learning · Computer Science 2024-05-28 Ouail Kitouni, Niklas Nolte, Víctor Samuel Pérez-Díaz, Sokratis Trifinopoulos, Mike Williams

HyperNetworks

This work explores hypernetworks: an approach of using one network, also known as a hypernetwork, to generate the weights for another network. Hypernetworks provide an abstraction that is similar to what is found in nature: the…

Machine Learning · Computer Science 2016-12-02 David Ha, Andrew Dai, Quoc V. Le

A game method for improving the interpretability of convolution neural network

Real artificial intelligence always has been focused on by many machine learning researchers, especially in the area of deep learning. However deep neural network is hard to be understood and explained, and sometimes, even metaphysics. The…

Machine Learning · Computer Science 2019-10-22 Jinwei Zhao, Qizhou Wang, Fuqiang Zhang, Wanli Qiu, Yufei Wang, Yu Liu, Guo Xie, Weigang Ma, Bin Wang, Xinhong Hei

Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis

Graph neural networks (GNNs) are highly effective on a variety of graph-related tasks; however, they lack interpretability and transparency. Current explainability approaches are typically local and treat GNNs as black-boxes. They do not…

Machine Learning · Computer Science 2023-03-10 Han Xuanyuan, Pietro Barbiero, Dobrik Georgiev, Lucie Charlotte Magister, Pietro Lió

A Comprehensive Survey on Self-Interpretable Neural Networks

Neural networks have achieved remarkable success across various fields. However, the lack of interpretability limits their practical use, particularly in critical decision-making scenarios. Post-hoc interpretability, which provides…

Machine Learning · Computer Science 2025-11-21 Yang Ji, Ying Sun, Yuting Zhang, Zhigaoyuan Wang, Yuanxin Zhuang, Zheng Gong, Dazhong Shen, Chuan Qin, Hengshu Zhu, Hui Xiong

Generating Neural Networks with Neural Networks

Hypernetworks are neural networks that generate weights for another neural network. We formulate the hypernetwork training objective as a compromise between accuracy and diversity, where the diversity takes into account trivial symmetry…

Machine Learning · Statistics 2018-04-10 Lior Deutsch

Physically Interpretable Neural Networks for the Geosciences: Applications to Earth System Variability

Neural networks have become increasingly prevalent within the geosciences, although a common limitation of their usage has been a lack of methods to interpret what the networks learn and how they make decisions. As such, neural networks…

Atmospheric and Oceanic Physics · Physics 2020-10-28 Benjamin A. Toms, Elizabeth A. Barnes, Imme Ebert-Uphoff

Learning Transformer Programs

Recent research in mechanistic interpretability has attempted to reverse-engineer Transformer models by carefully inspecting network weights and activations. However, these approaches require considerable manual effort and still fall short…

Machine Learning · Computer Science 2023-11-01 Dan Friedman, Alexander Wettig, Danqi Chen

A Survey on Neural Network Interpretability

Along with the great success of deep neural networks, there is also growing concern about their black-box nature. The interpretability issue affects people's trust on deep learning systems. It is also related to many ethical problems, e.g.,…

Machine Learning · Computer Science 2022-02-01 Yu Zhang, Peter Tiňo, Aleš Leonardis, Ke Tang

Open Problems in Mechanistic Interpretability

Mechanistic interpretability aims to understand the computational mechanisms underlying neural networks' capabilities in order to accomplish concrete scientific and engineering goals. Progress in this field thus promises to provide greater…

Machine Learning · Computer Science 2025-01-29 Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, David Bau, Eric Todd, Atticus Geiger, Mor Geva, Jesse Hoogland, Daniel Murfet, Tom McGrath

Understanding polysemanticity in neural networks through coding theory

Despite substantial efforts, neural network interpretability remains an elusive goal, with previous research failing to provide succinct explanations of most single neurons' impact on the network output. This limitation is due to the…

Machine Learning · Computer Science 2024-02-01 Simon C. Marshall, Jan H. Kirchner

A mechanistically interpretable neural network for regulatory genomics

Deep neural networks excel in mapping genomic DNA sequences to associated readouts (e.g., protein-DNA binding). Beyond prediction, the goal of these networks is to reveal to scientists the underlying motifs (and their syntax) which drive…

Genomics · Quantitative Biology 2024-10-10 Alex M. Tseng, Gokcen Eraslan, Tommaso Biancalani, Gabriele Scalia

Decoupling Deep Learning for Interpretable Image Recognition

The interpretability of neural networks has recently received extensive attention. Previous prototype-based explainable networks involved prototype activation in both reasoning and interpretation processes, requiring specific explainable…

Computer Vision and Pattern Recognition · Computer Science 2022-11-22 Yitao Peng, Yihang Liu, Longzhen Yang, Lianghua He

Interpreting Neural Networks to Improve Politeness Comprehension

We present an interpretable neural network approach to predicting and understanding politeness in natural language requests. Our models are based on simple convolutional neural networks directly on raw text, avoiding any manual…

Computation and Language · Computer Science 2016-10-11 Malika Aubakirova, Mohit Bansal

Making Neural Networks Interpretable with Attribution: Application to Implicit Signals Prediction

Explaining recommendations enables users to understand whether recommended items are relevant to their needs and has been shown to increase their trust in the system. More generally, if designing explainable machine learning models is key…

Machine Learning · Computer Science 2020-08-27 Darius Afchar, Romain Hennequin