
Related papers: Enabling hyperparameter optimization in sequential…

200 papers

In this paper, we propose Structuring AutoEncoders (SAE). SAEs are neural networks that learn a low-dimensional representation of data that is additionally enriched with a desired structure in this low-dimensional space. While traditional…

Machine Learning · Computer Science · 2019-08-20 · Marco Rudolph, Bastian Wandt, Bodo Rosenhahn

Spiking Neural Networks (SNN). SNNs are based on a more biologically inspired approach than usual artificial neural networks. Such models are characterized by complex dynamics between neurons and spikes. These are very sensitive to the…

Neural and Evolutionary Computing · Computer Science · 2024-09-06 · Thomas Firmin, Pierre Boulet, El-Ghazali Talbi

Sparse Autoencoders (SAEs) have become an important tool in mechanistic interpretability, helping to analyze internal representations in both Large Language Models (LLMs) and Vision Transformers (ViTs). By decomposing polysemantic…

Machine Learning · Computer Science · 2026-05-11 · Jakub Stępień, Marcin Mazur, Jacek Tabor, Przemysław Spurek

Sparse Autoencoders (SAEs) are a promising approach for extracting neural network representations by learning a sparse and overcomplete decomposition of the network's internal activations. However, SAEs are traditionally trained considering…

Machine Learning · Computer Science · 2025-04-02 · Jeffrey Olmo, Jared Wilson, Max Forsey, Bryce Hepner, Thomas Vin Howe, David Wingate

Is there really much more to say about sparse autoencoders (SAEs)? Autoencoders in general, and SAEs in particular, represent deep architectures that are capable of modeling low-dimensional latent structure in data. Such structure could…

Machine Learning · Computer Science · 2025-06-09 · Yin Lu, Xuening Zhu, Tong He, David Wipf

A recent line of work has shown promise in using sparse autoencoders (SAEs) to uncover interpretable features in neural network representations. However, the simple linear-nonlinear encoding mechanism in SAEs limits their ability to perform…

Machine Learning · Computer Science · 2025-01-31 · Charles O'Neill, Alim Gumran, David Klindt

Recently, sparse autoencoders (SAEs) have emerged as a promising technique for interpreting activations in foundation models by disentangling features into a sparse set of concepts. However, identifying the optimal level of sparsity for…

Machine Learning · Computer Science · 2026-04-17 · Dongsheng Wang, Jinsen Zhang, Dawei Su, Hui Huang

Sparse autoencoders (SAEs) are one of the main methods to interpret the inner workings of deep neural networks (DNNs), decomposing activations into higher-dimensional features. However, they exhibit critical shortcomings where a large…

Machine Learning · Computer Science · 2026-05-19 · Michał Brzozowski, Neo Christopher Chung

Motivated by the hypothesis that neural network representations encode abstract, interpretable features as linearly accessible, approximately orthogonal directions, sparse autoencoders (SAEs) have become a popular tool in interpretability.…

Machine Learning · Computer Science · 2025-11-05 · Valérie Costa, Thomas Fel, Ekdeep Singh Lubana, Bahareh Tolooshams, Demba Ba

Sparse autoencoders (SAEs) have lately been used to uncover interpretable latent features in large language models. By projecting dense embeddings into a much higher-dimensional and sparse space, learned features become disentangled and…

Machine Learning · Computer Science · 2025-07-30 · Viktoria Schuster

Sparse autoencoders (SAEs) have emerged as a powerful tool for interpreting language model activations by decomposing them into sparse, interpretable features. A popular approach is the TopK SAE, which uses a fixed number of the most active…

Machine Learning · Computer Science · 2024-12-10 · Bart Bussmann, Patrick Leask, Neel Nanda

Artificial neural networks that can recover latent dynamics from recorded neural activity may provide a powerful avenue for identifying and interpreting the dynamical motifs underlying biological computation. Given that neural variance…

Neurons and Cognition · Quantitative Biology · 2023-07-03 · Andrew R. Sedler, Christopher Versteeg, Chethan Pandarinath

Sparse autoencoders (SAEs) are a popular method for interpreting concepts represented in large language model (LLM) activations. However, there is a lack of evidence regarding the validity of their interpretations due to the lack of a…

Machine Learning · Computer Science · 2025-02-25 · Subhash Kantamneni, Joshua Engels, Senthooran Rajamanoharan, Max Tegmark, Neel Nanda

Sparse autoencoders (SAEs) are a recent technique for decomposing neural network activations into human-interpretable features. However, in order for SAEs to identify all features represented in frontier models, it will be necessary to…

Machine Learning · Computer Science · 2025-06-04 · Anish Mudide, Joshua Engels, Eric J. Michaud, Max Tegmark, Christian Schroeder de Witt

Sparse Autoencoders (SAEs) have been proposed as an unsupervised approach to learn a decomposition of a model's latent space. This enables useful applications such as steering (influencing the output of a model towards a desired concept)…

Machine Learning · Computer Science · 2025-12-23 · Dana Arad, Aaron Mueller, Yonatan Belinkov

Sparse Autoencoders (SAEs) have proven to be powerful tools for interpreting neural networks by decomposing hidden representations into disentangled, interpretable features via sparsity constraints. However, conventional SAEs are…

Sparse autoencoders (SAEs) model the activations of a neural network as linear combinations of sparsely occurring directions of variation (latents). The ability of SAEs to reconstruct activations follows scaling laws w.r.t. the number of…

Machine Learning · Computer Science · 2025-09-05 · Eric J. Michaud, Liv Gorton, Tom McGrath

Sparse autoencoders (SAEs) have recently emerged as a powerful tool for language model steering. Prior work has explored top-k SAE latents for steering, but we observe that many dimensions among the top-k latents capture non-semantic…

Computation and Language · Computer Science · 2025-10-03 · Jiaqing Xie

While the activations of neurons in deep neural networks usually do not have a simple human-understandable interpretation, sparse autoencoders (SAEs) can be used to transform these activations into a higher-dimensional latent space which…

Machine Learning · Computer Science · 2025-08-07 · Gonçalo Paulo, Alex Mallen, Caden Juang, Nora Belrose

Sparse autoencoders (SAEs) have been applied to large language models and protein language models, but not systematically to electronic health record (EHR) foundation models. We train TopK SAEs on FlatASCEND, a 14.5-million-parameter…

Machine Learning · Computer Science · 2026-05-07 · Chris Sainsbury, Feng Dong, Andreas Karwath