Related papers: Decoding billions of integers per second through v…
Sorted lists of integers are commonly used in inverted indexes and database systems. They are often compressed in memory. We can use the SIMD instructions available in common processors to boost the speed of integer compression schemes. Our…
We consider the ubiquitous technique of VByte compression, which represents each integer as a variable length sequence of bytes. The low 7 bits of each byte encode a portion of the integer, and the high bit of each byte is reserved as a…
Compression algorithms are important for data oriented tasks, especially in the era of Big Data. Modern processors equipped with powerful SIMD instruction sets, provide us an opportunity for achieving better compression performance.…
Modern processors have instructions to process 16 bytes or more at once. These instructions are called SIMD, for single instruction, multiple data. Recent advances have leveraged SIMD instructions to accelerate parsing of common Internet…
Compression can sometimes improve performance by making more of the data available to the processors faster. We consider the compression of integer keys in a B+-tree index. For this purpose, systems such as IBM DB2 use variable-byte…
In current computer architectures, data movement (from die to network) is by far the most energy consuming part of an algorithm (10pJ/word on-die to 10,000pJ/word on the network). To increase memory locality at the hardware level and reduce…
The ubiquitous Variable-Byte encoding is one of the fastest compressed representation for integer sequences. However, its compression ratio is usually not competitive with other more sophisticated encoders, especially when the integers to…
Arrays of integers are often compressed in search engines. Though there are many ways to compress integers, we are interested in the popular byte-oriented integer compression techniques (e.g., VByte or Google's Varint-GB). They are…
Particle-In-Cell (PIC) codes are broadly applied to the kinetic simulation of plasmas, from laser-matter interaction to astrophysics. Their heavy simulation cost can be mitigated by using the Single Instruction Multiple Data (SIMD)…
Converting binary integers to variable-length decimal strings is a fundamental operation in computing. Conventional fast approaches rely on recursive division and small lookup tables. We propose a SIMD-based algorithm that leverages integer…
Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization…
Counting the number of ones in a binary stream is a common operation in database, information-retrieval, cryptographic and machine-learning applications. Most processors have dedicated instructions to count the number of ones in a word…
In software, text is often represented using Unicode formats (UTF-8 and UTF-16). We frequently have to convert text from one format to the other, a process called transcoding. Popular transcoding functions are slower than state-of-the-art…
Data compression has been widely applied in many data processing areas. Compression methods use variable-size codes with the shorter codes assigned to symbols or groups of symbols that appear in the data frequently. Fibonacci coding, as a…
Web developers use base64 formats to include images, fonts, sounds and other resources directly inside HTML, JavaScript, JSON and XML files. We estimate that billions of base64 messages are decoded every day. We are motivated to improve the…
Stencil computation is one of the most important kernels in various scientific and engineering applications. A variety of work has focused on vectorization and tiling techniques, aiming at exploiting the in-core data parallelism and data…
For years, SIMD/vector units have enhanced the capabilities of modern CPUs in High-Performance Computing (HPC) and mobile technology. Typical commercially-available SIMD units process up to 8 double-precision elements with one instruction.…
Single Instruction, Multiple Data (SIMD) vectorization is a major driver of performance in current architectures, and is mandatory for achieving good performance with codes that are limited by instruction throughput. We investigate the…
Deep learning implementations on CPUs (Central Processing Units) are gaining more traction. Enhanced AI capabilities on commodity x86 architectures are commercially appealing due to the reuse of existing hardware and virtualization ease. A…
The dominant data interchange formats encode integers using a variable number of bytes or represent floating-point numbers as variable-length UTF-8 strings. The decoder must inspect each byte for a continuation bit or parse each character…