Related papers: A New Technique for Text Data Compression
This paper describes a new method of data encoding which may be used in various modern digital, computer and telecommunication systems and devices. The method permits the compression of data for storage or transmission, allowing the exact…
Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…
This article describes a technique of using a trigonometric function and combinatorial calculations to code or transform any finite sequence of binary numbers (0s and 1s) of any length to a unique set of three Real numbers. In reverse,…
Dimension reduction and data quantization are two important methods for reducing data complexity. In the paper, we study the methodology of first reducing data dimension by random projection and then quantizing the projections to ternary or…
The paper introduces a new technique for compressing Binary Decision Diagrams in those cases where random access is not required. Using this technique, compression and decompression can be done in linear time in the size of the BDD and…
This paper presents a novel technique for embedding textual data into images using quinary combinations of pixel intensities in RGB space. Existing methods predominantly rely on least and most significant bit (LSB & MSB) manipulation, Pixel…
This paper proposes a novel entropy encoding technique for lossless data compression. Representing a message string by its lexicographic index in the permutations of its symbols results in a compressed version matching Shannon entropy of…
Mathematically, ternary coding is more efficient than binary coding. It is little used in computation because technology for binary processing is already established and the implementation of ternary coding is more complicated, but remains…
In this work, we propose extreme compression techniques like binarization, ternarization for Neural Decoders such as TurboAE. These methods reduce memory and computation by a factor of 64 with a performance better than the quantized (with…
This article shows that any type of binary data can be defined as a collection from codewords of variable length. This feature helps us to define an Injective and surjective function from the suggested codewords to the required codewords.…
The paper presents a binarization scheme that converts non-binary data into a set of binary strings. At present, there are many binarization algorithms, but they are optimal for only specific probability distributions of the data source.…
A compression algorithm is presented that uses the set of prime numbers. Sequences of numbers are correlated with the prime numbers, and labeled with the integers. The algorithm can be iterated on data sets, generating factors of doubles on…
In this paper we examine a number of term rewriting system for integer number representations, building further upon the datatype defining systems described in [2]. In particular, we look at automated methods for proving confluence and…
In this report, a new fuzzy 2bit-AND parallel-to-OR, or simply, a fuzzy binary AND/OR (FBAR) text data compression model as an algorithm is suggested for bettering spatial locality limits on nodes during database transactions. The current…
While existing speech audio codecs designed for compression exploit limited forms of temporal redundancy and allow for multi-scale representations, they tend to represent all features of audio in the same way. In contrast, generative voice…
In language processing, transformers benefit greatly from text being condensed. This is achieved through a larger vocabulary that captures word fragments instead of plain characters. This is often done with Byte Pair Encoding. In the…
Neural networks using numerous text data have been successfully applied to a variety of tasks. While massive text data is usually compressed using techniques such as grammar compression, almost all of the previous machine learning methods…
The binary string matching problem consists in finding all the occurrences of a pattern in a text where both strings are built on a binary alphabet. This is an interesting problem in computer science, since binary data are omnipresent in…
A new run length encoding algorithm for lossless data compression that exploits positional redundancy by representing data in a two-dimensional model of concentric circles is presented. This visual transform enables detection of runs (each…
Data compression is very important feature in terms of saving the memory space. In this proposal, an indexed dictionary based compression is used for text data, where the word's reference in dictionary is used for compression. This approach…