Related papers: Sequential Universal Modeling for Non-Binary Seque…
We propose and study a family of universal sequential probability assignments on individual sequences, based on the incremental parsing procedure of the Lempel-Ziv (LZ78) compression algorithm. We show that the normalized log loss under any…
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are…
We consider nonparametric or universal sequential hypothesis testing problem when the distribution under the null hypothesis is fully known but the alternate hypothesis corresponds to some other unknown distribution. These algorithms are…
Consider the case where consecutive blocks of N letters of a semi-infinite individual sequence X over a finite-alphabet are being compressed into binary sequences by some one-to-one mapping. No a-priori information about X is available at…
Universal compression of patterns of sequences generated by independently identically distributed (i.i.d.) sources with unknown, possibly large, alphabets is investigated. A pattern is a sequence of indices that contains all consecutive…
We address the problem of nonparametric estimation of characteristics for stationary and ergodic time series. We consider finite-alphabet time series and real-valued ones and the following four problems: i) estimation of the (limiting)…
In this paper an approach to modelling nonstationary binary sequences, i.e., predicting the probability of upcoming symbols, is presented. After studying the prediction model we evaluate its performance in two non-artificial test cases.…
Universal compression algorithms have been studied in the past for sequential change detection, where they have been used to estimate the post-change distribution in the modified version of the Cumulative Sum (CUSUM) Test. In this paper, we…
For a collection of distributions over a countable support set, the worst case universal compression formulation by Shtarkov attempts to assign a universal distribution over the support set. The formulation aims to ensure that the universal…
Motivated by the prevalent data science applications of processing large-scale graph data such as social networks and biological networks, this paper investigates lossless compression of data in the form of a labeled graph. Particularly, we…
Solomonoff's uncomputable universal prediction scheme $\xi$ allows to predict the next symbol $x_k$ of a sequence $x_1...x_{k-1}$ for any Turing computable, but otherwise unknown, probabilistic environment $\mu$. This scheme will be…
This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…
In this paper, we propose a unified compression algorithm for distributed nonconvex opitmization with both the locally- and globally-bounded communication compressors, including 1-bit compressors, saturating quantizers, and the…
We study sequential prediction of real-valued, arbitrary and unknown sequences under the squared error loss as well as the best parametric predictor out of a large, continuous class of predictors. Inspired by recent results from…
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher…
We study sequential probability assignment in the Gaussian setting, where the goal is to predict, or equivalently compress, a sequence of real-valued observations almost as well as the best Gaussian distribution with mean constrained to a…
We study universal compression of sequences generated by monotonic distributions. We show that for a monotonic distribution over an alphabet of size $k$, each probability parameter costs essentially $0.5 \log (n/k^3)$ bits, where $n$ is the…
In the field of biological research, it is essential to comprehend the characteristics and functions of molecular sequences. The classification of molecular sequences has seen widespread use of neural network-based techniques. Despite their…
This paper considers distributed nonconvex optimization with the cost functions being distributed over agents. Noting that information compression is a key tool to reduce the heavy communication load for distributed algorithms as agents…
The paper introduces a new lossless, highly robust compression algorithm that similar with LZW algorithm, yet the algorithm discards dictionary processing and uses irregular sequences with massive, random information instead. Then the paper…