Related papers: Tiny, Hardware-Independent, Compression-based Clas…
It is generally well understood that predictive classification and compression are intrinsically related concepts in information theory. Indeed, many deep learning methods are explained as learning a kind of compression, and that better…
We present a new similarity measure based on information theoretic measures which is superior than Normalized Compression Distance for clustering problems and inherits the useful properties of conditional Kolmogorov complexity. We show that…
Clustering under pairwise constraints is an important knowledge discovery tool that enables the learning of appropriate kernels or distance metrics to improve clustering performance. These pairwise constraints, which come in the form of…
The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to…
In the field of biological research, it is essential to comprehend the characteristics and functions of molecular sequences. The classification of molecular sequences has seen widespread use of neural network-based techniques. Despite their…
Normalized Compression Distance (NCD) is a popular tool that uses compression algorithms to cluster and classify data in a wide range of applications. Existing discussions of NCD's theoretical merit rely on certain theoretical properties of…
Normalized compression distance (NCD) is a parameter-free, feature-free, alignment-free, similarity measure between a pair of finite objects based on compression. However, it is not sufficient for all applications. We propose an NCD of…
As the analytic tools become more powerful, and more data are generated on a daily basis, the issue of data privacy arises. This leads to the study of the design of privacy-preserving machine learning algorithms. Given two objectives,…
Federated Learning allows distributed entities to train a common model collaboratively without sharing their own data. Although it prevents data collection and aggregation by exchanging only parameter updates, it remains vulnerable to…
Federated data analytics is a framework for distributed data analysis where a server compiles noisy responses from a group of distributed low-bandwidth user devices to estimate aggregate statistics. Two major challenges in this framework…
Neural network compression has recently received much attention due to the computational requirements of modern deep models. In this work, our objective is to transfer knowledge from a deep and accurate model to a smaller one. Our…
Storage-efficient privacy-preserving learning is crucial due to increasing amounts of sensitive user data required for modern learning tasks. We propose a framework for reducing the storage cost of user data while at the same time providing…
Machine learning at the edge offers great benefits such as increased privacy and security, low latency, and more autonomy. However, a major challenge is that many devices, in particular edge devices, have very limited memory, weak…
Despite the promising progress made in recent years, person re-identification remains a challenging task due to complex variations in human appearances from different camera views. This paper presents a logistic discriminant metric learning…
Many applications from camera arrays to sensor networks require efficient compression and processing of correlated data, which in general is collected in a distributed fashion. While information-theoretic foundations of distributed…
This study presents an efficient approach for incomplete data classification, where the entries of samples are missing or masked due to privacy preservation. To deal with these incomplete data, a new kernel function with asymmetric…
Enormous volumes of short reads data from next-generation sequencing (NGS) technologies have posed new challenges to the area of genomic sequence comparison. The multiple sequence alignment approach is hardly applicable to NGS data due to…
Representing, comparing, and measuring the distance between probability distributions is a key task in computational statistics and machine learning. The choice of representation and the associated distance determine properties of the…
Despite successful use in a wide variety of disciplines for data analysis and prediction, machine learning (ML) methods suffer from a lack of understanding of the reliability of predictions due to the lack of transparency and black-box…
Although recovering an Euclidean distance matrix from noisy observations is a common problem in practice, how well this could be done remains largely unknown. To fill in this void, we study a simple distance matrix estimate based upon the…