Related papers: Tiny, Hardware-Independent, Compression-based Clas…

Neural Normalized Compression Distance and the Disconnect Between Compression and Classification

It is generally well understood that predictive classification and compression are intrinsically related concepts in information theory. Indeed, many deep learning methods are explained as learning a kind of compression, and that better…

Machine Learning · Computer Science 2024-10-22 John Hurwitz, Charles Nicholas, Edward Raff

Generalized Compression Dictionary Distance as Universal Similarity Measure

We present a new similarity measure based on information theoretic measures which is superior to Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity. We show that…

Machine Learning · Statistics 2014-10-22 Andrey Bogomolov, Bruno Lepri, Fabio Pianesi

Constrained Clustering and Multiple Kernel Learning without Pairwise Constraint Relaxation

Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of…

Machine Learning · Computer Science 2022-03-24 Benedikt Boecking, Vincent Jeanselme, Artur Dubrawski

Normalized Information Distance

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…

Information Retrieval · Computer Science 2008-09-16 Paul M. B. Vitanyi, Frank J. Balbach, Rudi L. Cilibrasi, Ming Li

A Universal Non-Parametric Approach For Improved Molecular Sequence Analysis

In the field of biological research, it is essential to comprehend the characteristics and functions of molecular sequences. The classification of molecular sequences has seen widespread use of neural network-based techniques. Despite their…

Machine Learning · Computer Science 2024-02-14 Sarwan Ali, Tamkanat E Ali, Prakash Chourasia, Murray Patterson

On Normalized Compression Distance and Large Malware

Normalized Compression Distance (NCD) is a popular tool that uses compression algorithms to cluster and classify data in a wide range of applications. Existing discussions of NCD's theoretical merit rely on certain theoretical properties of…

Cryptography and Security · Computer Science 2015-09-03 Rebecca Schuller Borbely

Normalized Compression Distance of Multisets with Applications

Normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity measure between a pair of finite objects based on compression. However, it is not sufficient for all applications. We propose an NCD of…

Computer Vision and Pattern Recognition · Computer Science 2016-01-28 Andrew R. Cohen, Paul M. B. Vitanyi

A compressive multi-kernel method for privacy-preserving machine learning

As the analytic tools become more powerful, and more data are generated on a daily basis, the issue of data privacy arises. This leads to the study of the design of privacy-preserving machine learning algorithms. Given two objectives,…

Machine Learning · Computer Science 2021-06-22 Thee Chanyaswad, J. Morris Chang, S. Y. Kung

Compression Boosts Differentially Private Federated Learning

Federated Learning allows distributed entities to train a common model collaboratively without sharing their own data. Although it prevents data collection and aggregation by exchanging only parameter updates, it remains vulnerable to…

Machine Learning · Computer Science 2020-11-12 Raouf Kerkouche, Gergely Ács, Claude Castelluccia, Pierre Genevès

Privacy-Aware Compression for Federated Data Analysis

Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework…

Machine Learning · Computer Science 2022-06-10 Kamalika Chaudhuri, Chuan Guo, Mike Rabbat

Adversarial Network Compression

Neural network compression has recently received much attention due to the computational requirements of modern deep models. In this work, our objective is to transfer knowledge from a deep and accurate model to a smaller one. Our…

Computer Vision and Pattern Recognition · Computer Science 2018-11-15 Vasileios Belagiannis, Azade Farshad, Fabio Galasso

Lossy Compression of Noisy Data for Private and Data-Efficient Learning

Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing…

Information Theory · Computer Science 2023-03-23 Berivan Isik, Tsachy Weissman

What Happens on the Edge, Stays on the Edge: Toward Compressive Deep Learning

Machine learning at the edge offers great benefits such as increased privacy and security, low latency, and more autonomy. However, a major challenge is that many devices, in particular edge devices, have very limited memory, weak…

Machine Learning · Computer Science 2019-09-05 Yang Li, Thomas Strohmer

Person Re-identification with Metric Learning using Privileged Information

Despite the promising progress made in recent years, person re-identification remains a challenging task due to complex variations in human appearances from different camera views. This paper presents a logistic discriminant metric learning…

Computer Vision and Pattern Recognition · Computer Science 2019-04-11 Xun Yang, Meng Wang, Dacheng Tao

Distributed Compression in the Era of Machine Learning: A Review of Recent Advances

Many applications from camera arrays to sensor networks require efficient compression and processing of correlated data, which in general is collected in a distributed fashion. While information-theoretic foundations of distributed…

Information Theory · Computer Science 2024-02-14 Ezgi Ozyilkan, Elza Erkip

Privacy-Preserved Big Data Analysis Based on Asymmetric Imputation Kernels and Multiside Similarities

This study presents an efficient approach for incomplete data classification, where the entries of samples are missing or masked due to privacy preservation. To deal with these incomplete data, a new kernel function with asymmetric…

Machine Learning · Computer Science 2016-11-22 Bo-Wei Chen

Alignment-free comparison of next-generation sequencing data using compression-based distance measures

Enormous volumes of short reads data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to…

Genomics · Quantitative Biology 2020-03-25 Ngoc Hieu Tran, Xin Chen

Scalable Kernel-Based Distances for Statistical Inference and Integration

Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the…

Machine Learning · Statistics 2026-02-26 Masha Naslidnyk

Distance-based Analysis of Machine Learning Prediction Reliability for Datasets in Materials Science and Other Fields

Despite successful use in a wide variety of disciplines for data analysis and prediction, machine learning (ML) methods suffer from a lack of understanding of the reliability of predictions due to the lack of transparency and black-box…

Materials Science · Physics 2023-04-04 Evan Askanazi, Ilya Grinberg

Distance Shrinkage and Euclidean Embedding via Regularized Kernel Estimation

Although recovering a Euclidean distance matrix from noisy observations is a common problem in practice, how well this could be done remains largely unknown. To fill in this void, we study a simple distance matrix estimate based upon the…

Machine Learning · Statistics 2014-09-18 Luwan Zhang, Grace Wahba, Ming Yuan