Related papers: Using Interpretation Methods for Model Enhancement

200 papers

Neuron interpretation has gained traction in the field of interpretability and has provided fine-grained insights into what a model learns and how language knowledge is distributed among its different components. However, the lack of…

Computation and Language · Computer Science · 2023-11-07 · Yimin Fan, Fahim Dalvi, Nadir Durrani, Hassan Sajjad

Concurrent with the rapid progress in the development of neural-network-based models in areas like natural language processing and computer vision, the need for creating explanations for the predictions of these black-box models has risen…

Computation and Language · Computer Science · 2025-08-18 · Marc Brinner, Sina Zarriess

Neural network models have achieved state-of-the-art performance in a wide range of natural language processing (NLP) tasks. However, a long-standing criticism against neural network models is the lack of interpretability, which not only…

Computation and Language · Computer Science · 2021-10-26 · Xiaofei Sun, Diyi Yang, Xiaoya Li, Tianwei Zhang, Yuxian Meng, Han Qiu, Guoyin Wang, Eduard Hovy, Jiwei Li

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability about the predictive…

Machine Learning · Computer Science · 2022-02-24 · Jayneel Parekh, Pavlo Mozharovskyi, Florence d'Alché-Buc

With the continued development of Convolutional Neural Networks (CNNs), there is a growing concern regarding the representations that they encode internally. Analyzing these internal representations is referred to as model interpretation. While…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-18 · Hamed Behzadi-Khormouji, José Oramas

Recently, generating natural language explanations has shown very promising results, not only in offering interpretable explanations but also in providing additional information and supervision for prediction. However, existing approaches…

Computation and Language · Computer Science · 2022-05-30 · Wangchunshu Zhou, Jinyi Hu, Hanlin Zhang, Xiaodan Liang, Maosong Sun, Chenyan Xiong, Jian Tang

Training a model with access to human explanations can improve data efficiency and model performance on in- and out-of-domain data. Adding to these empirical findings, similarity with the process of human learning makes learning from…

Computation and Language · Computer Science · 2022-04-20 · Mareike Hartmann, Daniel Sonntag

State-of-the-art machine learning algorithms are highly optimized to provide the best possible prediction, naturally resulting in complex models. While these models often outperform simpler, more interpretable models by order of…

Machine Learning · Statistics · 2016-11-24 · Yotam Hechtlinger

The ability to interpret machine learning models has become increasingly important now that machine learning is used to inform consequential decisions. We propose an approach called model extraction for interpreting complex, black-box…

Machine Learning · Computer Science · 2018-03-14 · Osbert Bastani, Carolyn Kim, Hamsa Bastani

We ask whether neural network interpretation methods can be fooled via adversarial model manipulation, which is defined as a model fine-tuning step that aims to radically alter the explanations without hurting the accuracy of the…

Machine Learning · Computer Science · 2019-11-04 · Juyeon Heo, Sunghwan Joo, Taesup Moon

Model Interpretation aims at the extraction of insights from the internals of a trained model. A common approach to address this task is the characterization of relevant features internally encoded in the model that are critical for its…

Machine Learning · Computer Science · 2024-10-07 · Hamed Behzadi-Khormouji, José Oramas

Many NLP applications require models to be interpretable. However, many successful neural architectures, including transformers, still lack effective interpretation methods. A possible solution could rely on building explanations from…

Computation and Language · Computer Science · 2024-04-04 · Federico Ruggeri, Marco Lippi, Paolo Torroni

Neural NLP models are increasingly accurate but are imperfect and opaque: they break in counterintuitive ways and leave end users puzzled at their behavior. Model interpretation methods ameliorate this opacity by providing explanations for…

Computation and Language · Computer Science · 2019-09-23 · Eric Wallace, Jens Tuyls, Junlin Wang, Sanjay Subramanian, Matt Gardner, Sameer Singh

Current machine learning models are evaluated through behavioral snapshots, with benchmark accuracies, win rates and outcome-based metrics. Model explanations and evaluations, however, are fundamentally intertwined: understanding why a…

Computers and Society · Computer Science · 2026-05-08 · Isabelle Lee, Emmy Liu, Cathy Jiao, Brihi Joshi, Dani Yogatama, Fazl Barez, Michael Saxon

For optimization models to be used in practice, it is crucial that users trust the results. A key factor in this aspect is the interpretability of the solution process. A previous framework for inherently interpretable optimization models…

Optimization and Control · Mathematics · 2026-02-13 · Marc Goerigk, Michael Hartisch, Sebastian Merten, Kartikey Sharma

Recent large language models (LLMs) have demonstrated remarkable prediction performance for a growing array of tasks. However, their proliferation into high-stakes domains (e.g., medicine) and compute-limited settings has created a…

Artificial Intelligence · Computer Science · 2023-12-05 · Chandan Singh, Armin Askari, Rich Caruana, Jianfeng Gao

We take inspiration from the study of human explanation to inform the design and evaluation of interpretability methods in machine learning. First, we survey the literature on human explanation in philosophy, cognitive science, and the…

Artificial Intelligence · Computer Science · 2021-09-21 · David Alvarez-Melis, Harmanpreet Kaur, Hal Daumé, Hanna Wallach, Jennifer Wortman Vaughan

There has been increasing interest in evaluations of language models for a variety of risks and characteristics. Evaluations relying on natural language understanding for grading can often be performed at scale by using other language…

Computation and Language · Computer Science · 2023-12-11 · Simon Lermen, Ondřej Kvapil

Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned…

Machine Learning · Statistics · 2019-11-15 · W. James Murdoch, Chandan Singh, Karl Kumbier, Reza Abbasi-Asl, Bin Yu

The last decade has seen huge progress in the development of advanced machine learning models; however, those models are powerless unless human users can interpret them. Here we show how the mind's construction of concepts and meaning can…

Machine Learning · Statistics · 2016-07-04 · Nick Condry