Related papers: Structuring Autoencoders

Sparse Autoencoders, Again?

Is there really much more to say about sparse autoencoders (SAEs)? Autoencoders in general, and SAEs in particular, represent deep architectures that are capable of modeling low-dimensional latent structure in data. Such structure could…

Machine Learning · Computer Science 2025-06-09 Yin Lu, Xuening Zhu, Tong He, David Wipf

Can sparse autoencoders make sense of gene expression latent variable models?

Sparse autoencoders (SAEs) have lately been used to uncover interpretable latent features in large language models. By projecting dense embeddings into a much higher-dimensional and sparse space, learned features become disentangled and…

Machine Learning · Computer Science 2025-07-30 Viktoria Schuster

Latent space configuration for improved generalization in supervised autoencoder neural networks

Autoencoders (AE) are a simple yet powerful class of neural networks that compress data by projecting input into a low-dimensional latent space (LS). Whereas the LS is formed according to the loss function minimization during training, its…

Computer Vision and Pattern Recognition · Computer Science 2025-08-28 Nikita Gabdullin

An Auto-Encoder Strategy for Adaptive Image Segmentation

Deep neural networks are powerful tools for biomedical image segmentation. These models are often trained with heavy supervision, relying on pairs of images and corresponding voxel-level labels. However, obtaining segmentations of…

Image and Video Processing · Electrical Eng. & Systems 2020-04-30 Evan M. Yu, Juan Eugenio Iglesias, Adrian V. Dalca, Mert R. Sabuncu

Sparse Autoencoders Make Audio Foundation Models more Explainable

Audio pretrained models are widely employed to solve various tasks in speech processing, sound event detection, or music information retrieval. However, the representations learned by these models are unclear, and their analysis mainly…

Sound · Computer Science 2025-12-18 Théo Mariotte, Martin Lebourdais, Antonio Almudévar, Marie Tahon, Alfonso Ortega, Nicolas Dugué

Features Emerge as Discrete States: The First Application of SAEs to 3D Representations

Sparse Autoencoders (SAEs) are a powerful dictionary learning technique for decomposing neural network activations, translating the hidden state into human ideas with high semantic value despite no external intervention or guidance.…

Machine Learning · Computer Science 2025-12-17 Albert Miao, Chenliang Zhou, Jiawei Zhou, Cengiz Oztireli

An image compression and encryption scheme based on deep learning

Stacked Auto-Encoder (SAE) is a deep learning algorithm for unsupervised learning. It has multiple layers that project the vector representation of input data into a lower-dimensional vector space. These projection vectors are dense…

Computer Vision and Pattern Recognition · Computer Science 2016-10-11 Fei Hu, Changjiu Pu, Haowei Gao, Mengzi Tang, Li Li

Learning Latent Subspaces in Variational Autoencoders

Variational autoencoders (VAEs) are widely used deep generative models capable of learning unsupervised latent representations of data. Such representations are often difficult to interpret or control. We consider the problem of…

Machine Learning · Computer Science 2018-12-18 Jack Klys, Jake Snell, Richard Zemel

Transcoders Beat Sparse Autoencoders for Interpretability

Sparse autoencoders (SAEs) extract human-interpretable features from deep neural networks by transforming their activations into a sparse, higher dimensional latent space, and then reconstructing the activations from these latents.…

Machine Learning · Computer Science 2025-02-13 Gonçalo Paulo, Stepan Shabalin, Nora Belrose

Analyzing (In)Abilities of SAEs via Formal Languages

Autoencoders have been used for finding interpretable and disentangled features underlying neural network representations in both image and text domains. While the efficacy and pitfalls of such methods are well-studied in vision, there is a…

Machine Learning · Computer Science 2025-02-06 Abhinav Menon, Manish Shrivastava, David Krueger, Ekdeep Singh Lubana

Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry

Sparse Autoencoders (SAEs) are widely used to interpret neural networks by identifying meaningful concepts from their representations. However, do SAEs truly uncover all concepts a model relies on, or are they inherently biased toward…

Machine Learning · Computer Science 2025-12-03 Sai Sumedh R. Hindupur, Ekdeep Singh Lubana, Thomas Fel, Demba Ba

Sparse Autoencoders Reveal Interpretable Structure in Small Gene Language Models

Sparse autoencoders (SAEs) have recently emerged as a powerful tool for interpreting the internal representations of large language models (LLMs), revealing latent features with semantic meaning. This interpretability has also…

Other Quantitative Biology · Quantitative Biology 2025-07-11 Haoxiang Guan, Jiyan He, Jie Zhang

Probing the Representational Power of Sparse Autoencoders in Vision Models

Sparse Autoencoders (SAEs) have emerged as a popular tool for interpreting the hidden states of large language models (LLMs). By learning to reconstruct activations from a sparse bottleneck layer, SAEs discover interpretable features from…

Computer Vision and Pattern Recognition · Computer Science 2025-09-19 Matthew Lyle Olson, Musashi Hinck, Neale Ratzlaff, Changbai Li, Phillip Howard, Vasudev Lal, Shao-Yen Tseng

Latent Variables on Spheres for Autoencoders in High Dimensions

The Variational Auto-Encoder (VAE) has been widely applied as a fundamental generative model in machine learning. For complex samples such as images of objects or scenes, however, the VAE suffers from the dimensional dilemma between reconstruction…

Machine Learning · Computer Science 2020-02-18 Deli Zhao, Jiapeng Zhu, Bo Zhang

Do Sparse Autoencoders Capture Concept Manifolds?

Sparse autoencoders (SAEs) are widely used to extract interpretable features from neural network representations, often under the implicit assumption that concepts correspond to independent linear directions. However, a growing body of…

Machine Learning · Computer Science 2026-05-01 Usha Bhalla, Thomas Fel, Can Rager, Sheridan Feucht, Tal Haklay, Daniel Wurgaft, Siddharth Boppana, Matthew Kowal, Vasudev Shyam, Jack Merullo, Atticus Geiger, Ekdeep Singh Lubana

SMIXAE: Towards Unsupervised Manifold Discovery in Language Models

Sparse autoencoders (SAEs) have been used widely to decompose and interpret neural network activations, especially those of transformer language models. One key issue with SAEs is their inability to directly model multidimensional features.…

Machine Learning · Computer Science 2026-05-12 Collin Francel

Interpretable Embeddings with Sparse Autoencoders: A Data Analysis Toolkit

Analyzing large-scale text corpora is a core challenge in machine learning, crucial for tasks like identifying undesirable model behaviors or biases in training data. Current methods often rely on costly LLM-based techniques (e.g.…

Artificial Intelligence · Computer Science 2025-12-12 Nick Jiang, Xiaoqing Sun, Lisa Dunlap, Lewis Smith, Neel Nanda

Towards Open-Ended Visual Scientific Discovery with Sparse Autoencoders

Scientific archives now contain hundreds of petabytes of data across genomics, ecology, climate, and molecular biology that could reveal undiscovered patterns if systematically analyzed at scale. Large-scale, weakly-supervised datasets in…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Samuel Stevens, Jacob Beattie, Tanya Berger-Wolf, Yu Su

Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability

Translating the internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. While recent dictionary learning methods such as Sparse Autoencoders (SAEs) provide a promising…

Computation and Language · Computer Science 2026-02-27 Usha Bhalla, Alex Oesterling, Claudio Mayrink Verdun, Himabindu Lakkaraju, Flavio P. Calmon

Efficient Dictionary Learning with Switch Sparse Autoencoders

Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to…

Machine Learning · Computer Science 2025-06-04 Anish Mudide, Joshua Engels, Eric J. Michaud, Max Tegmark, Christian Schroeder de Witt