Related papers: Intermediate Entity-based Sparse Interpretable Rep…

Interpretable Neural Embeddings with Sparse Self-Representation

Interpretability benefits the theoretical understanding of representations. Existing word embeddings are generally dense representations. Hence, the meaning of latent dimensions is difficult to interpret. This makes word embeddings like a…

Computation and Language · Computer Science · 2023-06-27 · Minxue Xia, Hao Zhu

Interpretable Entity Representations through Large-Scale Typing

In standard methodology for natural language processing, entities in text are typically embedded in dense vector spaces with pre-trained models. The embeddings produced this way are effective when fed into downstream models, but they…

Computation and Language · Computer Science · 2020-10-14 · Yasumasa Onoe, Greg Durrett

Biomedical Interpretable Entity Representations

Pre-trained language models induce dense entity representations that offer strong performance on entity-centric NLP tasks, but such representations are not immediately interpretable. This can be a barrier to model uptake in important…

Computation and Language · Computer Science · 2021-06-18 · Diego Garcia-Olano, Yasumasa Onoe, Ioana Baldini, Joydeep Ghosh, Byron C. Wallace, Kush R. Varshney

IERL: Interpretable Ensemble Representation Learning -- Combining CrowdSourced Knowledge and Distributed Semantic Representations

Large Language Models (LLMs) encode meanings of words in the form of distributed semantics. Distributed semantics capture common statistical patterns among language tokens (words, phrases, and sentences) from large amounts of data. LLMs…

Computation and Language · Computer Science · 2023-06-27 · Yuxin Zi, Kaushik Roy, Vignesh Narayanan, Manas Gaur, Amit Sheth

Learning and Evaluating Sparse Interpretable Sentence Embeddings

Previous research on word embeddings has shown that sparse representations, which can be either learned on top of existing dense embeddings or obtained through model constraints during training time, have the benefit of increased…

Computation and Language · Computer Science · 2018-09-26 · Valentin Trifonov, Octavian-Eugen Ganea, Anna Potapenko, Thomas Hofmann

Lightly-supervised Representation Learning with Global Interpretability

We propose a lightly-supervised approach for information extraction, in particular named entity classification, which combines the benefits of traditional bootstrapping, i.e., use of limited annotations and interpretability of extraction…

Computation and Language · Computer Science · 2018-05-30 · Marco A. Valenzuela-Escárcega, Ajay Nagesh, Mihai Surdeanu

Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge

Distributional models provide a convenient way to model semantics using dense embedding spaces derived from unsupervised learning algorithms. However, the dimensions of dense embedding spaces are not designed to resemble human semantic…

Computation and Language · Computer Science · 2018-11-15 · Steven Derby, Paul Miller, Brian Murphy, Barry Devereux

Enhancing Pre-trained Representation Classifiability can Boost its Interpretability

The visual representation of a pre-trained model prioritizes classifiability on downstream tasks, while the widespread application of pre-trained visual models has posed new requirements for representation interpretability. However,…

Computer Vision and Pattern Recognition · Computer Science · 2025-10-29 · Shufan Shen, Zhaobo Qi, Junshu Sun, Qingming Huang, Qi Tian, Shuhui Wang

LDIR: Low-Dimensional Dense and Interpretable Text Embeddings with Relative Representations

Semantic text representation is a fundamental task in the field of natural language processing. Existing text embeddings (e.g., SimCSE and LLM2Vec) have demonstrated excellent performance, but the values of each dimension are difficult to…

Computation and Language · Computer Science · 2025-05-19 · Yile Wang, Zhanyu Shen, Hui Huang

LLM Interpretability with Identifiable Temporal-Instantaneous Representation

Despite Large Language Models' remarkable capabilities, understanding their internal representations remains challenging. Mechanistic interpretability tools such as sparse autoencoders (SAEs) were developed to extract interpretable features…

Machine Learning · Computer Science · 2026-01-06 · Xiangchen Song, Jiaqi Sun, Zijian Li, Yujia Zheng, Kun Zhang

Evaluating Sparse Interpretable Word Embeddings for Biomedical Domain

Word embeddings have found their way into a wide range of natural language processing tasks including those in the biomedical domain. While these vector representations successfully capture semantic and syntactic word relations, hidden…

Computation and Language · Computer Science · 2020-05-12 · Mohammad Amin Samadi, Mohammad Sadegh Akhondzadeh, Sayed Jalal Zahabi, Mohammad Hossein Manshaei, Zeinab Maleki, Payman Adibi

Decoding Dense Embeddings: Sparse Autoencoders for Interpreting and Discretizing Dense Retrieval

Despite their strong performance, Dense Passage Retrieval (DPR) models suffer from a lack of interpretability. In this work, we propose a novel interpretability framework that leverages Sparse Autoencoders (SAEs) to decompose previously…

Information Retrieval · Computer Science · 2025-08-28 · Seongwan Park, Taeklim Kim, Youngjoong Ko

Sparse Interpretable Deep Learning with LIES Networks for Symbolic Regression

Symbolic regression (SR) aims to discover closed-form mathematical expressions that accurately describe data, offering interpretability and analytical insight beyond standard black-box models. Existing SR methods often rely on…

Machine Learning · Computer Science · 2025-06-17 · Mansooreh Montazerin, Majd Al Aawar, Antonio Ortega, Ajitesh Srivastava

Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit

Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often rely on costly LLM-based techniques (e.g.…

Artificial Intelligence · Computer Science · 2025-12-12 · Nick Jiang, Xiaoqing Sun, Lisa Dunlap, Lewis Smith, Neel Nanda

A Disentangling Invertible Interpretation Network for Explaining Latent Representations

Neural networks have greatly boosted performance in computer vision by learning powerful representations of input data. The drawback of end-to-end training for maximal overall performance is that it yields black-box models whose hidden representations…

Computer Vision and Pattern Recognition · Computer Science · 2020-04-29 · Patrick Esser, Robin Rombach, Björn Ommer

Interpretable Multi-task Learning with Shared Variable Embeddings

This paper proposes a general interpretable predictive system with shared information. The system is able to perform predictions in a multi-task setting where distinct tasks are not bound to have the same input/output structure. Embeddings…

Machine Learning · Computer Science · 2024-07-02 · Maciej Żelaszczyk, Jacek Mańdziuk

Word Equations: Inherently Interpretable Sparse Word Embeddings through Sparse Coding

Word embeddings are a powerful natural language processing technique, but they are extremely difficult to interpret. To enable interpretable NLP models, we create vectors where each dimension is inherently interpretable. By inherently…

Computation and Language · Computer Science · 2021-09-29 · Adly Templeton

iBERT: Interpretable Embeddings via Sense Decomposition

We present iBERT (interpretable-BERT), an encoder to produce inherently interpretable and controllable embeddings, designed to modularize and expose the discriminative cues present in language, such as semantic or stylistic structure. Each…

Computation and Language · Computer Science · 2026-01-27 · Vishal Anand, Milad Alshomary, Kathleen McKeown

Multi-Scale Grouped Prototypes for Interpretable Semantic Segmentation

Prototypical part learning is emerging as a promising approach for making semantic segmentation interpretable. The model selects real patches seen during training as prototypes and constructs the dense prediction map based on the similarity…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-29 · Hugo Porta, Emanuele Dalsasso, Diego Marcos, Devis Tuia

Interpretable Multi-dataset Evaluation for Named Entity Recognition

With the proliferation of models for natural language processing tasks, it is even harder to understand the differences between models and their relative merits. Simply looking at differences between holistic metrics such as accuracy, BLEU,…

Computation and Language · Computer Science · 2020-12-10 · Jinlan Fu, Pengfei Liu, Graham Neubig