Related papers: Minimal Random Code Learning with Mean-KL Paramete…
While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements. Consequently, model size reduction has become an…
Graph-based multi-agent reinforcement learning (MARL) enables coordinated behavior under partial observability by modeling agents as nodes and communication links as edges. While recent methods excel at learning sparse coordination…
Multiple kernel learning (MKL), structured sparsity, and multi-task learning have recently received considerable attention. In this paper, we show how different MKL algorithms can be understood as applications of either regularization on…
Neural network weights are increasingly a bottleneck for deployment, yet most compression pipelines treat layers independently and overlook cross-layer redundancy induced by function-preserving symmetries. We propose Motion-Compensated…
Learning and compression are driven by the common aim of identifying and exploiting statistical regularities in data, which opens the door for fertile collaboration between these areas. A promising group of compression techniques for…
The prohibitive sizes of Large Language Models (LLMs) today make it difficult to deploy them on memory-constrained edge devices. This work introduces $\rm CALDERA$ -- a new post-training LLM compression algorithm that harnesses the inherent…
The minimum weight matching (MWM) and maximum likelihood decoding (MLD) are two widely used and distinct decoding strategies for quantum error correction. For a given syndrome, the MWM decoder finds the most probable physical error…
Near-field propagation in extremely large aperture arrays requires joint angle-range estimation. In hybrid architectures, only $N_\mathrm{RF}\ll M$ compressed snapshots are available per slot, making the $N_\mathrm{RF}\times N_\mathrm{RF}$…
We demonstrate the use of a variational method to determine a quantitative lower bound on the rate of convergence of Markov Chain Monte Carlo (MCMC) algorithms as a function of the target density and proposal density. The bound relies on…
Many common types of data can be represented as functions that map coordinates to signal values, such as pixel locations to RGB values in the case of an image. Based on this view, data can be compressed by overfitting a compact neural…
The robustness of the kernel recursive least square (KRLS) algorithm has recently been improved by combining them with more robust information-theoretic learning criteria, such as minimum error entropy (MEE) and generalized MEE (GMEE),…
We introduce a new neural network-based continual learning algorithm, dubbed as Uncertainty-regularized Continual Learning (UCL), which builds on traditional Bayesian online learning framework with variational inference. We focus on two…
Supervised learning is often affected by a covariate shift in which the marginal distributions of instances (covariates $x$) of training and testing samples $\mathrm{p}_\text{tr}(x)$ and $\mathrm{p}_\text{te}(x)$ are different but the label…
Compressing large-scale neural networks is essential for deploying models on resource-constrained devices. Most existing methods adopt weight pruning or low-bit quantization individually, often resulting in suboptimal compression rates to…
LLM training is resource-intensive. Quantized training improves computational and memory efficiency but introduces quantization noise, which can hinder convergence and degrade model accuracy. Stochastic Rounding (SR) has emerged as a…
Variational Bayesian Inference is a popular methodology for approximating posterior distributions over Bayesian neural network weights. Recent work developing this class of methods has explored ever richer parameterizations of the…
The principle of Maximal Coding Rate Reduction (MCR$^2$) has recently been proposed as a training objective for learning discriminative low-dimensional structures intrinsic to high-dimensional data to allow for more robust training than…
We propose a novel method to learn intractable distributions from their samples. The main idea is to use a parametric distribution model, such as a Gaussian Mixture Model (GMM), to approximate intractable distributions by minimizing the…
This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that effectively bridges critical gaps in existing methods, including long-range…
Although quantum machine learning has shown great promise, the practical application of quantum computers remains constrained in the noisy intermediate-scale quantum era. To take advantage of quantum machine learning, we investigate the…