Related papers: Switched linear encoding with rectified linear aut…

Discriminative Recurrent Sparse Auto-Encoders

We present the discriminative recurrent sparse auto-encoder model, comprising a recurrent encoder of rectified linear units, unrolled for a fixed number of iterations, and connected to two linear decoders that reconstruct the input and…

Machine Learning · Computer Science 2013-03-20 Jason Tyler Rolfe, Yann LeCun

k-Sparse Autoencoders

Recently, it has been observed that when representations are learnt in a way that encourages sparsity, improved performance is obtained on classification tasks. These methods involve combinations of activation functions, sampling steps and…

Machine Learning · Computer Science 2014-03-25 Alireza Makhzani, Brendan Frey

Binary autoencoder with random binary weights

Here is presented an analysis of an autoencoder with binary activations $\{0, 1\}$ and binary $\{0, 1\}$ random weights. Such a setup puts this model at the intersection of different fields: neuroscience, information theory, sparse coding,…

Machine Learning · Computer Science 2020-05-01 Viacheslav Osaulenko

Sparse Autoencoders Enable Scalable and Reliable Circuit Identification in Language Models

This paper introduces an efficient and robust method for discovering interpretable circuits in large language models using discrete sparse autoencoders. Our approach addresses key limitations of existing techniques, namely computational…

Computation and Language · Computer Science 2024-05-22 Charles O'Neill, Thang Bui

On Optimality Conditions for Auto-Encoder Signal Recovery

Auto-Encoders are unsupervised models that aim to learn patterns from observed data by minimizing a reconstruction cost. The useful representations learned are often found to be sparse and distributed. On the other hand, compressed sensing…

Machine Learning · Statistics 2017-07-14 Devansh Arpit, Yingbo Zhou, Hung Q. Ngo, Nils Napp, Venu Govindaraju

Rank Ordered Autoencoders

A new method for the unsupervised learning of sparse representations using autoencoders is proposed and implemented by ordering the output of the hidden units by their activation value and progressively reconstructing the input in this…

Machine Learning · Computer Science 2016-05-09 Paul Bertens

Scaling and evaluating sparse autoencoders

Sparse autoencoders provide a promising unsupervised approach for extracting interpretable features from a language model by reconstructing activations from a sparse bottleneck layer. Since language models learn many concepts, autoencoders…

Machine Learning · Computer Science 2024-06-07 Leo Gao, Tom Dupré la Tour, Henk Tillman, Gabriel Goh, Rajan Troll, Alec Radford, Ilya Sutskever, Jan Leike, Jeffrey Wu

A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features

In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables, each of which follows a new type of distribution that we call rectified Gaussian. These rectified Gaussian…

Machine Learning · Statistics 2016-03-01 Tim Salimans

Are We Using Autoencoders in a Wrong Way?

Autoencoders are certainly among the most studied and used Deep Learning models: the idea behind them is to train a model in order to reconstruct the same input data. The peculiarity of these models is to compress the information through a…

Machine Learning · Computer Science 2023-09-06 Gabriele Martino, Davide Moroni, Massimo Martinelli

Do Sparse Autoencoders Identify Reasoning Features in Language Models?

We study how reliably sparse autoencoders (SAEs) support claims about reasoning-related internal features in large language models. We first give a stylized analysis showing that sparsity-regularized decoding can preferentially retain…

Machine Learning · Computer Science 2026-05-19 George Ma, Zhongyuan Liang, Irene Y. Chen, Somayeh Sojoudi

Zero-bias autoencoders and the benefits of co-adapting features

Regularized training of an autoencoder typically results in hidden unit biases that take on large negative values. We show that negative biases are a natural result of using a hidden layer whose responsibility is to both represent the input…

Machine Learning · Statistics 2015-04-09 Kishore Konda, Roland Memisevic, David Krueger

A linear approach for sparse coding by a two-layer neural network

Many approaches to transform classification problems from non-linear to linear by feature transformation have been recently presented in the literature. These notably include sparse coding methods and deep neural networks. However, many of…

Machine Learning · Computer Science 2015-07-08 Alessandro Montalto, Giovanni Tessitore, Roberto Prevete

Improving Dictionary Learning with Gated Sparse Autoencoders

Recent work has found that sparse autoencoders (SAEs) are an effective technique for unsupervised discovery of interpretable features in language models' (LMs) activations, by finding sparse, linear reconstructions of LM activations. We…

Machine Learning · Computer Science 2024-05-01 Senthooran Rajamanoharan, Arthur Conmy, Lewis Smith, Tom Lieberum, Vikrant Varma, János Kramár, Rohin Shah, Neel Nanda

Sparse Autoencoders Find Highly Interpretable Features in Language Models

One of the roadblocks to a better understanding of neural networks' internals is polysemanticity, where neurons appear to activate in multiple, semantically distinct contexts. Polysemanticity prevents us from identifying concise,…

Machine Learning · Computer Science 2023-10-05 Hoagy Cunningham, Aidan Ewart, Logan Riggs, Robert Huben, Lee Sharkey

Sparse Autoencoders Make Audio Foundation Models more Explainable

Audio pretrained models are widely employed to solve various tasks in speech processing, sound event detection, or music information retrieval. However, the representations learned by these models are unclear, and their analysis mainly…

Sound · Computer Science 2025-12-18 Théo Mariotte, Martin Lebourdais, Antonio Almudévar, Marie Tahon, Alfonso Ortega, Nicolas Dugué

Deep Kernelized Autoencoders

In this paper we introduce the deep kernelized autoencoder, a neural network model that allows an explicit approximation of (i) the mapping from an input space to an arbitrary, user-specified kernel space and (ii) the back-projection from…

Machine Learning · Statistics 2017-02-09 Michael Kampffmeyer, Sigurd Løkse, Filippo Maria Bianchi, Robert Jenssen, Lorenzo Livi

Sparse Autoencoders, Again?

Is there really much more to say about sparse autoencoders (SAEs)? Autoencoders in general, and SAEs in particular, represent deep architectures that are capable of modeling low-dimensional latent structure in data. Such structure could…

Machine Learning · Computer Science 2025-06-09 Yin Lu, Xuening Zhu, Tong He, David Wipf

Logical Interpretations of Autoencoders

The unification of low-level perception and high-level reasoning is a long-standing problem in artificial intelligence, which has the potential to not only bring the areas of logic and learning closer together but also demonstrate how…

Artificial Intelligence · Computer Science 2019-11-27 Anton Fuxjaeger, Vaishak Belle

Can sparse autoencoders make sense of gene expression latent variable models?

Sparse autoencoders (SAEs) have lately been used to uncover interpretable latent features in large language models. By projecting dense embeddings into a much higher-dimensional and sparse space, learned features become disentangled and…

Machine Learning · Computer Science 2025-07-30 Viktoria Schuster

Unsupervised learning with sparse space-and-time autoencoders

We use spatially-sparse two, three and four dimensional convolutional autoencoder networks to model sparse structures in 2D space, 3D space, and 3+1=4 dimensional space-time. We evaluate the resulting latent spaces by testing their…

Computer Vision and Pattern Recognition · Computer Science 2018-11-27 Benjamin Graham