Related papers: Bilateral Distribution Compression: Reducing Both …

Towards Bayesian Data Compression

In order to handle large data sets omnipresent in modern science, efficient compression algorithms are necessary. Here, a Bayesian data compression (BDC) algorithm that adapts to the specific measurement situation is derived in the context…

Data Analysis, Statistics and Probability · Physics 2021-03-01 Johannes Harth-Kitzerow, Reimar Leike, Philipp Arras, Torsten A. Enßlin

Distributed Bayesian Matrix Decomposition for Big Data Mining and Clustering

Matrix decomposition is one of the fundamental tools to discover knowledge from big data generated by modern applications. However, it is still inefficient or infeasible to process very big data using such a method in a single machine.…

Machine Learning · Computer Science 2020-02-11 Chihao Zhang, Yang Yang, Wei Zhang, Shihua Zhang

Decomposed Distribution Matching in Dataset Condensation

Dataset Condensation (DC) aims to reduce deep neural networks training efforts by synthesizing a small dataset such that it will be as effective as the original large dataset. Conventionally, DC relies on a costly bi-level optimization…

Computer Vision and Pattern Recognition · Computer Science 2024-12-09 Sahar Rahimi Malakshan, Mohammad Saeed Ebrahimi Saadabadi, Ali Dabouei, Nasser M. Nasrabadi

Multisize Dataset Condensation

While dataset condensation effectively enhances training efficiency, its application in on-device scenarios brings unique challenges. 1) Due to the fluctuating computational resources of these devices, there's a demand for a flexible…

Computer Vision and Pattern Recognition · Computer Science 2024-04-16 Yang He, Lingao Xiao, Joey Tianyi Zhou, Ivor Tsang

Preserved central model for faster bidirectional compression in distributed settings

We develop a new approach to tackle communication constraints in a distributed learning problem with a central server. We propose and analyze a new algorithm that performs bidirectional compression and achieves the same convergence rate as…

Machine Learning · Computer Science 2022-06-17 Constantin Philippenko, Aymeric Dieuleveut

Compressing Binary Decision Diagrams

The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and…

Artificial Intelligence · Computer Science 2008-12-18 Esben Rune Hansen, S. Srinivasa Rao, Peter Tiedemann

A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm

Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational…

Information Theory · Computer Science 2023-07-19 Nikhil Krishnan, Dror Baron

Bayesian Compressed Regression

As an alternative to variable selection or shrinkage in high dimensional regression, we propose to randomly compress the predictors prior to analysis. This dramatically reduces storage and computational bottlenecks, performing well when the…

Machine Learning · Statistics 2013-03-26 Rajarshi Guhaniyogi, David B. Dunson

Dataset Condensation with Latent Quantile Matching

Dataset condensation (DC) methods aim to learn a smaller synthesized dataset with informative data records to accelerate the training of machine learning models. Current distribution matching (DM) based DC methods learn a synthesized…

Machine Learning · Computer Science 2024-06-17 Wei Wei, Tom De Schepper, Kevin Mets

Multimodal Distribution Matching for Vision-Language Dataset Distillation

Dataset distillation compresses large training sets into compact synthetic datasets while preserving downstream performance. As modern systems increasingly operate on paired vision-language inputs, multimodal distillation must preserve…

Computer Vision and Pattern Recognition · Computer Science 2026-05-25 Jongoh Jeong, Hoyong Kwon, Minseok Kim, Kuk-Jin Yoon

Improved Distribution Matching for Dataset Condensation

Dataset Condensation aims to condense a large dataset into a smaller one while maintaining its ability to train a well-performing model, thus reducing the storage cost and training effort in deep learning applications. However, conventional…

Machine Learning · Computer Science 2023-07-20 Ganlong Zhao, Guanbin Li, Yipeng Qin, Yizhou Yu

M3D: Dataset Condensation by Minimizing Maximum Mean Discrepancy

Training state-of-the-art (SOTA) deep models often requires extensive data, resulting in substantial training and storage costs. To address these challenges, dataset condensation has been developed to learn a small synthetic set that…

Computer Vision and Pattern Recognition · Computer Science 2024-02-27 Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, Shiming Ge

ImagebindDC: Compressing Multi-modal Data with Imagebind-based Condensation

Data condensation techniques aim to synthesize a compact dataset from a larger one to enable efficient model training, yet while successful in unimodal settings, they often fail in multimodal scenarios where preserving intricate inter-modal…

Computer Vision and Pattern Recognition · Computer Science 2025-11-12 Yue Min, Shaobo Wang, Jiaze Li, Tianle Niu, Junxin Fan, Yongliang Miao, Lijin Yang, Linfeng Zhang

Robust Multiple Description Neural Video Codec with Masked Transformer for Dynamic and Noisy Networks

Multiple Description Coding (MDC) is a promising error-resilient source coding method that is particularly suitable for dynamic networks with multiple (yet noisy and unreliable) paths. However, conventional MDC video codecs suffer from…

Computer Vision and Pattern Recognition · Computer Science 2024-12-12 Xinyue Hu, Wei Ye, Jiaxiang Tang, Eman Ramadan, Zhi-Li Zhang

Distributed Image Compression with Multimodal Side Information at Extremely Low Bitrates

Distributed Image Compression (DIC) is crucial for multi-view transmission, especially when operating at extremely low bitrates (< 0.1 bpp). Its core challenge is effectively utilizing side information to achieve high-quality reconstruction…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Guojun Xu, Mingyang Zhang, Jianwen Xiang, Cheng Tan, Yanchao Yang, Junwei Zhou

Towards Principled Dataset Distillation: A Spectral Distribution Perspective

Dataset distillation (DD) aims to compress large-scale datasets into compact synthetic counterparts for efficient model training. However, existing DD methods exhibit substantial performance degradation on long-tailed datasets. We identify…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Ruixi Wu, Shaobo Wang, Jiahuan Chen, Zhiyuan Liu, Yicun Yang, Zhaorun Chen, Zekai Li, Kaixin Li, Xinming Wang, Hongzhu Yi, Kai Wang, Linfeng Zhang

Bi-Directional Deep Contextual Video Compression

Deep video compression has made remarkable progress in recent years, with the majority of advancements concentrated on P-frame coding. Although efforts to enhance B-frame coding are ongoing, their compression performance is still far behind…

Computer Vision and Pattern Recognition · Computer Science 2025-01-22 Xihua Sheng, Li Li, Dong Liu, Shiqi Wang

Implicit Neural Multiple Description for DNA-based data storage

DNA exhibits remarkable potential as a data storage solution due to its impressive storage density and long-term stability, stemming from its inherent biomolecular structure. However, developing this novel medium comes with its own set of…

Image and Video Processing · Electrical Eng. & Systems 2023-09-14 Trung Hieu Le, Xavier Pic, Jeremy Mateos, Marc Antonini

Scalable Bayesian Clustering for Integrative Analysis of Multi-View Data

In the era of Big Data, scalable and accurate clustering algorithms for high-dimensional data are essential. We present new Bayesian Distance Clustering (BDC) models and inference algorithms with improved scalability while maintaining the…

Methodology · Statistics 2024-09-02 Rafael Cabral, Maria de Iorio, Andrew Harris

Less is More: Efficient Time Series Dataset Condensation via Two-fold Modal Matching--Extended Version

The expanding instrumentation of processes throughout society with sensors yields a proliferation of time series data that may in turn enable important applications, e.g., related to transportation infrastructures or power grids.…

Databases · Computer Science 2024-10-29 Hao Miao, Ziqiao Liu, Yan Zhao, Chenjuan Guo, Bin Yang, Kai Zheng, Christian S. Jensen