Related papers: Structural Entropy Guided Probabilistic Coding

Structured Probabilistic Coding

This paper presents a new supervised representation learning framework, namely structured probabilistic coding (SPC), to learn compact and informative representations from input related to the target task. SPC is an encoder-only…

Computation and Language · Computer Science 2024-05-03 Dou Hu, Lingwei Wei, Yaxin Liu, Wei Zhou, Songlin Hu

Structural Embedding Projection for Contextual Large Language Model Inference

Structured embedding transformations offer a promising approach for enhancing the efficiency and coherence of language model inference. The introduction of Structural Embedding Projection (SEP) provides a mechanism for refining token…

Computation and Language · Computer Science 2025-08-11 Vincent Enoasmo, Cedric Featherstonehaugh, Xavier Konstantinopoulos, Zacharias Huntington

SEEC: Segmentation-Assisted Multi-Entropy Models for Learned Lossless Image Compression

Recently, learned image compression has attracted considerable attention due to its superior performance over traditional methods. However, most existing approaches employ a single entropy model to estimate the probability distribution of…

Computer Vision and Pattern Recognition · Computer Science 2026-03-31 Chunhang Zheng, Zichang Ren, Dou Li

Neuro-Symbolic Entropy Regularization

In structured prediction, the goal is to jointly predict many output variables that together encode a structured object -- a path in a graph, an entity-relation triple, or an ordering of objects. Such a large output space makes learning…

Machine Learning · Computer Science 2022-01-28 Kareem Ahmed, Eric Wang, Kai-Wei Chang, Guy Van den Broeck

Structural-Entropy-Based Sample Selection for Efficient and Effective Learning

Sample selection improves the efficiency and effectiveness of machine learning models by providing informative and representative samples. Typically, samples can be modeled as a sample graph, where nodes are samples and edges represent…

Machine Learning · Computer Science 2025-03-04 Tianchi Xie, Jiangning Zhu, Guozu Ma, Minzhi Lin, Wei Chen, Weikai Yang, Shixia Liu

Unveiling the Potential of Probabilistic Embeddings in Self-Supervised Learning

In recent years, self-supervised learning has played a pivotal role in advancing machine learning by allowing models to acquire meaningful representations from unlabeled data. An intriguing research avenue involves developing…

Machine Learning · Computer Science 2023-10-30 Denis Janiak, Jakub Binkowski, Piotr Bielak, Tomasz Kajdanowicz

SECodec: Structural Entropy-based Compressive Speech Representation Codec for Speech Language Models

With the rapid advancement of large language models (LLMs), discrete speech representations have become crucial for integrating speech into LLMs. Existing methods for speech representation discretization rely on a predefined codebook size…

Sound · Computer Science 2025-01-03 Linqin Wang, Yaping Liu, Zhengtao Yu, Shengxiang Gao, Cunli Mao, Yuxin Huang, Wenjun Wang, Ling Dong

Probabilistic Subspace Manifolds for Contextual Inference in Large Language Models

Representing token embeddings as probability distributions over learned manifolds allows for more flexible contextual inference, reducing representational rigidity while enhancing semantic granularity. Comparative evaluations demonstrate…

Computation and Language · Computer Science 2025-04-25 Christopher Nightingale, Dominic Lavington, Jonathan Thistlethwaite, Sebastian Penhaligon, Thomas Belinski, David Boldo

Structural Entropy Guided Graph Hierarchical Pooling

Following the success of convolution on non-Euclidean space, the corresponding pooling approaches have also been validated on various tasks regarding graphs. However, because of the fixed compression quota and stepwise pooling design, these…

Machine Learning · Computer Science 2022-06-29 Junran Wu, Xueyuan Chen, Ke Xu, Shangzhe Li

Probabilistic Embeddings with Laplacian Graph Priors

We introduce probabilistic embeddings using Laplacian priors (PELP). The proposed model enables incorporating graph side-information into static word embeddings. We theoretically show that the model unifies several previously proposed…

Computation and Language · Computer Science 2022-04-06 Väinö Yrjänäinen, Måns Magnusson

Uncertainty-driven Embedding Convolution

Text embeddings are essential components in modern NLP pipelines. Although numerous embedding models have been proposed, no single model consistently dominates across domains and tasks. This variability motivates the use of ensemble…

Machine Learning · Computer Science 2026-02-13 Sungjun Lim, Kangjun Noh, Youngjun Choi, Heeyoung Lee, Kyungwoo Song

Learning to Embed Distributions via Maximum Kernel Entropy

Empirical data can often be considered as samples from a set of probability distributions. Kernel methods have emerged as a natural approach for learning to classify these distributions. Although numerous kernels between distributions have…

Machine Learning · Computer Science 2024-12-02 Oleksii Kachaiev, Stefano Recanatesi

Neural Code Comprehension: A Learnable Representation of Code Semantics

With the recent success of embeddings in natural language processing, research has been conducted into applying similar methods to code analysis. Most works attempt to process the code directly or use a syntactic tree representation,…

Machine Learning · Computer Science 2018-11-30 Tal Ben-Nun, Alice Shoshana Jakobovits, Torsten Hoefler

Why Self-Supervised Encoders Want to Be Normal

Self-supervised learning has achieved remarkable empirical success in learning robust representations without explicit labels, most recently demonstrated within the framework of Joint-Embedding Predictive Architectures (JEPA). However, a…

Information Theory · Computer Science 2026-05-05 Yuval Domb

Latent Structure Modulation in Large Language Models Through Stochastic Concept Embedding Transitions

Stochastic embedding transitions introduce a probabilistic mechanism for adjusting token representations dynamically during inference, mitigating the constraints imposed through static or deterministic embeddings. A transition framework was…

Computation and Language · Computer Science 2025-08-11 Stefan Whitaker, Colin Sisate, Marcel Windsor, Nikolai Fairweather, Tarquin Goldborough, Oskar Lindenfeld

Entropy Guided Spectrum Based Bug Localization Using Statistical Language Model

Locating bugs is challenging but is one of the most important activities in the software development and maintenance phases, because there are no certain rules to identify all types of bugs. Existing automatic bug localization tools use various…

Software Engineering · Computer Science 2018-02-21 Saikat Chakraborty, Yujian Li, Matt Irvine, Ripon Saha, Baishakhi Ray

Structural Learning of Probabilistic Sentential Decision Diagrams under Partial Closed-World Assumption

Probabilistic sentential decision diagrams are a class of structured-decomposable probabilistic circuits especially designed to embed logical constraints. To adapt the classical LearnSPN scheme to learn the structure of these models, we…

Artificial Intelligence · Computer Science 2021-07-27 Alessandro Antonucci, Alessandro Facchini, Lilith Mattei

Tractable Regularization of Probabilistic Circuits

Probabilistic Circuits (PCs) are a promising avenue for probabilistic modeling. They combine advantages of probabilistic graphical models (PGMs) with those of neural networks (NNs). Crucially, however, they are tractable probabilistic…

Machine Learning · Computer Science 2021-06-07 Anji Liu, Guy Van den Broeck

Structural Embedding of Syntactic Trees for Machine Comprehension

Deep neural networks for machine comprehension typically utilize only word or character embeddings without explicitly taking advantage of structured linguistic information such as constituency trees and dependency trees. In this paper, we…

Computation and Language · Computer Science 2017-09-04 Rui Liu, Junjie Hu, Wei Wei, Zi Yang, Eric Nyberg

Entropy-Aligned Decoding of LMs for Better Writing and Reasoning

Language models (LMs) are trained on billions of tokens in an attempt to recover the true language distribution. Still, vanilla random sampling from LMs yields low-quality generations. Decoding algorithms attempt to restrict the LM…

Machine Learning · Computer Science 2026-01-06 Kareem Ahmed, Sameer Singh