Related papers: Simplicity Scales
In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…
Many common document formats on the Internet are text-only such as email (MIME) and the Web (HTML, JavaScript, JSON and XML). To include images or executable code in these documents, we first encode them as text using base64. Standard…
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a…
In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art…
The transmission or storage of signals typically involves data compression. The final processing step in compression systems is generally an entropy coding stage, which converts symbols into a bit stream based on their probability…
The technique considers a message as binary string on which a Efficient Cryptographic Protocol using Recursive Bitwise amd pairs of Bits of operation (RBPBO) is performed. A block of n bits is taken as an input stream, where n varies from 4…
We often represent text using Unicode formats (UTF-8 and UTF-16). The UTF-8 format is increasingly popular, especially on the web (XML, HTML, JSON, Rust, Go, Swift, Ruby). The UTF-16 format is most common in Java, .NET, and inside operating…
Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the…
With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires…
This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…
Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes…
An ultra-high throughput low-density parity check (LDPC) decoder with an unrolled full-parallel architecture is proposed, which achieves the highest decoding throughput compared to previously reported LDPC decoders in the literature. The…
Cost-effective embedded systems necessitate utilizing the single-wire communication protocol for inter-chip communication, thanks to its reduced pin count in comparison to the multi-wire I2C or SPI protocols. However, current single-wire…
Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major bottleneck in the data ingestion pipeline, since parsing…
Video compression systems must support increasing bandwidth and data throughput at low cost and power, and can be limited by entropy coding bottlenecks. Efficiency can be greatly improved by parallelizing coding, which can be done at much…
Fully Encrypted Protocols (FEPs) have arisen in practice as a technique to avoid network censorship. Such protocols are designed to produce messages that appear completely random. This design hides communications metadata, such as version…
Due to the large data volume and number of distinct elements, space is often the bottleneck of many stream processing systems. The data structures used by these systems often consist of counters whose optimization yields significant memory…
In the multicore era, the time to computational results is increasingly determined by how quickly operands are accessed by cores, rather than by the speed of computation per operand. From high-performance computing (HPC) to mobile…
A prescription to calculate the minimum number of bits needed for binary strip detector readout is presented. This permits a systematic analysis of the readout efficiency relative to this theoretical minimum number of bits. Different level…
We describe a quantum cryptography protocol with up to twenty four-dimensional ($\mathcal{D} =4$) states generated by a polarization-, phase- and time-encoding transmitter. This protocol can be experimentally realized with existing…