Related papers: Lempel-Zip Complexity Reference
Kolmogorov complexity measures the algorithmic complexity of a finite binary string $\sigma$ in terms of the length of the shortest description $\sigma^*$ of $\sigma$. Traditionally, the length of a string is taken to measure the amount of…
Lempel-Ziv (LZ77 or, briefly, LZ) is one of the most effective and widely-used compressors for repetitive texts. However, the existing efficient methods computing the exact LZ parsing have to use linear or close to linear space to index the…
We propose a novel, lightweight supervised dictionary learning framework for text classification based on data compression and representation. This two-phase algorithm initially employs the Lempel-Ziv-Welch (LZW) algorithm to construct a…
The Kolmogorov complexity of x, denoted C(x), is the length of the shortest program that generates x. For such a simple definition, Kolmogorov complexity has a rich and deep theory, as well as applications to a wide variety of topics…
We introduce a method for analyzing the complexity of natural language processing tasks, and for predicting the difficulty of new NLP tasks. Our complexity measures are derived from the Kolmogorov complexity of a class of automata --- {\it…
The LZ-End parsing [Kreft & Navarro, 2011] of an input string yields compression competitive with the popular Lempel-Ziv 77 scheme, but also allows for efficient random access. Kempa and Kosolobov showed that the parsing can be computed in…
We derive upper and lower bounds on the overall compression ratio of the 1978 Lempel-Ziv (LZ78) algorithm, applied independently to $k$-blocks of a finite individual sequence. Both bounds are given in terms of normalized empirical entropies…
The Sliding Window Lempel-Ziv (SWLZ) algorithm that makes use of recurrence times and match lengths has been studied from various perspectives in information theory literature. In this paper, we undertake a finer study of these quantities…
Link prediction in graphs is an important task in the fields of network science and machine learning. We investigate a flexible means of regularization for link prediction based on an approximation of the Kolmogorov complexity of graphs…
Kolmogorov complexity theory quantifies the algorithmic information content of a string. It is defined as the length of the shortest program that describes the string. We present a programming language that can be used to…
Random sequences attain the highest entropy rate. The entropy rate of an ergodic source can be estimated using the Lempel-Ziv complexity measure, yet the exact entropy rate value is reached only in the infinite limit. We prove…
The notion of Kolmogorov complexity (=the minimal length of a program that generates some object) is often useful as a kind of language that allows us to reformulate some notions and therefore provide new intuition. In this survey we…
The Kolmogorov complexity of the word w is equal to the length of the shortest concatenation of program Z and its input x with which the word w is computed by the universal Turing machine U. The question introduced in this paper is the…
We present an algorithm which computes the Lempel-Ziv factorization of a word $W$ of length $n$ on an alphabet $\Sigma$ of size $\sigma$ online in the following sense: it reads $W$ starting from the left, and, after reading each $r =…
Given a reference computer, Kolmogorov complexity is a well defined function on all binary strings. In the standard approach, however, only the asymptotic properties of such functions are considered because they do not depend on the…
Kolmogorov complexity of a finite binary word reflects both algorithmic structure and the empirical distribution of symbols appearing in the word. Words with symbol frequencies far from one half have smaller combinatorial richness and…
Relative Lempel-Ziv (RLZ) is a popular algorithm for compressing databases of genomes from individuals of the same species when fast random access is desired. With Kuruppu et al.'s (SPIRE 2010) original implementation, a reference genome is…
We raise the question of approximating the compressibility of a string with respect to a fixed compression scheme, in sublinear time. We study this question in detail for two popular lossless compression schemes: run-length encoding (RLE)…
Any positive word comprising a random sequence of tokens from a finite alphabet can be reduced (without change of length) using the relations of an appropriately sized Braid group. Surprisingly, the Braid relations dramatically reduce the…
There is no single universally accepted definition of "Complexity". There are several perspectives on complexity and what constitutes complex behaviour or complex systems, as opposed to regular, predictable behaviour and simple systems. In…