Related papers: Simplicity Scales

200 papers

In many important applications -- such as search engines and relational database systems -- data is stored in the form of arrays of integers. Encoding and, most importantly, decoding of these arrays consumes considerable CPU time.…

Information Retrieval · Computer Science 2021-02-02 Daniel Lemire, Leonid Boytsov

Many common document formats on the Internet are text-only such as email (MIME) and the Web (HTML, JavaScript, JSON and XML). To include images or executable code in these documents, we first encode them as text using base64. Standard…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-08 Wojciech Muła, Daniel Lemire

We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a…

Information Retrieval · Computer Science 2017-01-17 Jeff Plaisance, Nathan Kurz, Daniel Lemire

In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-08-16 Daniel Lemire, Wojciech Muła

The transmission or storage of signals typically involves data compression. The final processing step in compression systems is generally an entropy coding stage, which converts symbols into a bit stream based on their probability…

Information Theory · Computer Science 2026-01-13 Tilo Strutz, Roman Rischke

The technique considers a message as a binary string on which an Efficient Cryptographic Protocol using Recursive Bitwise and pairs of Bits of operation (RBPBO) is performed. A block of n bits is taken as an input stream, where n varies from 4…

Cryptography and Security · Computer Science 2012-12-17 P. K. Jha, J. K. Mandal

We often represent text using Unicode formats (UTF-8 and UTF-16). The UTF-8 format is increasingly popular, especially on the web (XML, HTML, JSON, Rust, Go, Swift, Ruby). The UTF-16 format is most common in Java, .NET, and inside operating…

Programming Languages · Computer Science 2023-05-23 Daniel Lemire

Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the…

Mathematical Software · Computer Science 2026-04-07 Wojciech Muła, Daniel Lemire

With disks and networks providing gigabytes per second, parsing decimal numbers from strings becomes a bottleneck. We consider the problem of parsing decimal numbers to the nearest binary floating-point value. The general problem requires…

Data Structures and Algorithms · Computer Science 2022-11-07 Daniel Lemire

This thesis concerns sequential-access data compression, i.e., by algorithms that read the input one or more times from beginning to end. In one chapter we consider adaptive prefix coding, for which we must read the input character by…

Information Theory · Computer Science 2009-02-03 Travis Gagie

Many NLP models operate over sequences of subword tokens produced by hand-crafted tokenization rules and heuristic subword induction algorithms. A simple universal alternative is to represent every computerized text as a sequence of bytes…

Computation and Language · Computer Science 2021-04-13 Uri Shaham, Omer Levy

An ultra-high throughput low-density parity check (LDPC) decoder with an unrolled full-parallel architecture is proposed, which achieves the highest decoding throughput compared to previously reported LDPC decoders in the literature. The…

Cost-effective embedded systems necessitate utilizing the single-wire communication protocol for inter-chip communication, thanks to its reduced pin count in comparison to the multi-wire I2C or SPI protocols. However, current single-wire…

Hardware Architecture · Computer Science 2025-09-03 Bochen Ye, Gustavo Naspolini, Kimmo Salo, Manil Dev Gomony

Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major bottleneck in the data ingestion pipeline, since parsing…

Databases · Computer Science 2020-04-16 Elias Stehle, Hans-Arno Jacobsen

Video compression systems must support increasing bandwidth and data throughput at low cost and power, and can be limited by entropy coding bottlenecks. Efficiency can be greatly improved by parallelizing coding, which can be done at much…

Image and Video Processing · Electrical Eng. & Systems 2023-12-05 Amir Said, Hoang Le, Farzad Farhadzadeh

Fully Encrypted Protocols (FEPs) have arisen in practice as a technique to avoid network censorship. Such protocols are designed to produce messages that appear completely random. This design hides communications metadata, such as version…

Cryptography and Security · Computer Science 2024-09-09 Ellis Fenske, Aaron Johnson

Due to the large data volume and number of distinct elements, space is often the bottleneck of many stream processing systems. The data structures used by these systems often consist of counters whose optimization yields significant memory…

Networking and Internet Architecture · Computer Science 2025-02-21 Ran Ben Basat, Gil Einziger, Bilal Tyah, Shay Vargaftik

In the multicore era, the time to computational results is increasingly determined by how quickly operands are accessed by cores, rather than by the speed of computation per operand. From high-performance computing (HPC) to mobile…

Other Computer Science · Computer Science 2013-03-21 Albert Wegener

A prescription to calculate the minimum number of bits needed for binary strip detector readout is presented. This permits a systematic analysis of the readout efficiency relative to this theoretical minimum number of bits. Different level…

Instrumentation and Detectors · Physics 2015-06-17 Maurice Garcia-Sciveres, Xinkang Wang

We describe a quantum cryptography protocol with up to twenty four-dimensional ($\mathcal{D} =4$) states generated by a polarization-, phase- and time-encoding transmitter. This protocol can be experimentally realized with existing…

Quantum Physics · Physics 2010-05-05 W. T. Buttler, S. K. Lamoreaux, J. R. Torgerson