Related papers: Compressing Tabular Data via Latent Variable Estim…

Few-Shot Non-Parametric Learning with Deep Latent Variable Model

Most real-world problems that machine learning algorithms are expected to solve face the situation with 1) unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric…

Machine Learning · Computer Science 2022-09-20 Zhiying Jiang, Yiqin Dai, Ji Xin, Ming Li, Jimmy Lin

Optimal Compression for Minimizing Classification Error Probability: an Information-Theoretic Approach

We formulate the problem of performing optimal data compression under the constraints that compressed data can be used for accurate classification in machine learning. We show that this translates to a problem of minimizing the mutual…

Signal Processing · Electrical Eng. & Systems 2022-11-04 Jingchao Gao, Ao Tang, Weiyu Xu

Improving Matrix-vector Multiplication via Lossless Grammar-Compressed Matrices

As nowadays Machine Learning (ML) techniques are generating huge data collections, the problem of how to efficiently engineer their storage and operations is becoming of paramount importance. In this article we propose a new lossless…

Data Structures and Algorithms · Computer Science 2022-03-31 Paolo Ferragina, Travis Gagie, Dominik Köppl, Giovanni Manzini, Gonzalo Navarro, Manuel Striani, Francesco Tosoni

Weighted Tensor Decomposition for Learning Latent Variables with Partial Data

Tensor decomposition methods are popular tools for learning latent variables given only lower-order moments of the data. However, the standard assumption is that we have sufficient data to estimate these moments to high accuracy. In this…

Machine Learning · Statistics 2019-03-13 Omer Gottesman, Weiwei Pan, Finale Doshi-Velez

Lossy Compression for Lossless Prediction

Most data is automatically collected and only ever "seen" by algorithms. Yet, data compressors preserve perceptual fidelity rather than just the information needed by algorithms performing downstream tasks. In this paper, we characterize…

Machine Learning · Computer Science 2022-01-31 Yann Dubois, Benjamin Bloem-Reddy, Karen Ullrich, Chris J. Maddison

Low-rank Characteristic Tensor Density Estimation Part II: Compression and Latent Density Estimation

Learning generative probabilistic models is a core problem in machine learning, which presents significant challenges due to the curse of dimensionality. This paper proposes a joint dimensionality reduction and non-parametric density…

Machine Learning · Statistics 2022-06-22 Magda Amiridi, Nikos Kargas, Nicholas D. Sidiropoulos

Black-Box Statistical Prediction of Lossy Compression Ratios for Scientific Data

Lossy compressors are increasingly adopted in scientific research, tackling volumes of data from experiments or parallel numerical simulations and facilitating data storage and movement. In contrast with the notion of entropy in lossless…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-16 Robert Underwood, Julie Bessac, David Krasowska, Jon C. Calhoun, Sheng Di, Franck Cappello

Distributed Compressive Sensing: A Deep Learning Approach

Various studies that address the compressed sensing problem with Multiple Measurement Vectors (MMVs) have recently been carried out. These studies assume the vectors of the different channels to be jointly sparse. In this paper, we relax this…

Machine Learning · Computer Science 2016-11-14 Hamid Palangi, Rabab Ward, Li Deng

Statistical and computational rates in high rank tensor estimation

Higher-order tensor datasets arise commonly in recommendation systems, neuroimaging, and social networks. Here we develop provable methods for estimating a possibly high rank signal tensor from noisy observations. We consider a generative…

Methodology · Statistics 2023-04-11 Chanwoo Lee, Miaoyan Wang

End-to-end optimized image compression with competition of prior distributions

Convolutional autoencoders are now at the forefront of image compression research. To improve their entropy coding, encoder output is typically analyzed with a second autoencoder to generate per-variable parametrized prior probability…

Image and Video Processing · Electrical Eng. & Systems 2021-11-18 Benoit Brummer, Christophe De Vleeschouwer

Asymptotic Accuracy of Distribution-Based Estimation for Latent Variables

Hierarchical statistical models are widely employed in information science and data engineering. The models consist of two types of variables: observable variables that represent the given data and latent variables for the unobservable…

Machine Learning · Statistics 2014-02-21 Keisuke Yamazaki

Selective compression learning of latent representations for variable-rate image compression

Recently, many neural network-based image compression methods have shown promising results superior to the existing tool-based conventional codecs. However, most of them are often trained as separate models for different target bit rates,…

Image and Video Processing · Electrical Eng. & Systems 2022-11-09 Jooyoung Lee, Seyoon Jeong, Munchurl Kim

Extractive Summary as Discrete Latent Variables

In this paper, we compare various methods to compress a text using a neural model. We find that extracting tokens as latent variables significantly outperforms the state-of-the-art discrete latent variable models such as VQ-VAE.…

Computation and Language · Computer Science 2019-01-28 Aran Komatsuzaki

Efficient Compression of Sparse Accelerator Data Using Implicit Neural Representations and Importance Sampling

High-energy, large-scale particle colliders in nuclear and high-energy physics generate data at extraordinary rates, reaching up to 1 terabyte and several petabytes per second, respectively. The development of real-time, high-throughput…

Artificial Intelligence · Computer Science 2024-12-03 Xihaier Luo, Samuel Lurvey, Yi Huang, Yihui Ren, Jin Huang, Byung-Jun Yoon

Significant improvement of lossy compression rate and speed of HPC data using perceptron parallelized compression

The escalating surge in data generation presents formidable challenges to information technology, necessitating advancements in storage, retrieval, and utilization. With the proliferation of artificial intelligence and big data, the "Data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-09-19 Xinzhe Chen, Jianjiang Li

Partition and Code: learning how to compress graphs

Can we use machine learning to compress graph data? The absence of ordering in graphs poses a significant challenge to conventional compression algorithms, limiting their attainable gains as well as their ability to discover relevant…

Machine Learning · Computer Science 2023-09-26 Giorgos Bouritsas, Andreas Loukas, Nikolaos Karalias, Michael M. Bronstein

The Knowledge Within: Methods for Data-Free Model Compression

Recently, an extensive amount of research has been focused on compressing and accelerating Deep Neural Networks (DNN). So far, high compression rate algorithms require part of the training dataset for a low precision calibration, or a…

Machine Learning · Computer Science 2020-04-08 Matan Haroush, Itay Hubara, Elad Hoffer, Daniel Soudry

AlphaZip: Neural Network-Enhanced Lossless Text Compression

Data compression continues to evolve, with traditional information theory methods being widely used for compressing text, images, and videos. Recently, there has been growing interest in leveraging Generative AI for predictive compression…

Information Theory · Computer Science 2024-09-24 Swathi Shree Narashiman, Nitin Chandrachoodan

Predicting LLM Compression Degradation from Spectral Statistics

Matrix-level low-rank compression is a promising way to reduce the cost of large language models, but running compression and evaluating the resulting models on language tasks can be prohibitively expensive. Can compression-induced…

Machine Learning · Computer Science 2026-04-21 Mingxue Xu

Post-Processing of High-Dimensional Data

Scientific computations or measurements may result in huge volumes of data. Often these can be thought of representing a real-valued function on a high-dimensional domain, and can be conceptually arranged in the format of a tensor of high…

Numerical Analysis · Mathematics 2019-09-24 Mike Espig, Wolfgang Hackbusch, Alexander Litvinenko, Hermann G. Matthies, Elmar Zander