Related papers: Massively-Parallel Lossless Data Decompression

Parallel Data Compression Techniques

With endless amounts of data and very limited bandwidth, fast data compression is one solution for the growing data-sharing problem. Compression helps lower transfer times and save memory, but if the compression takes too long, this no…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-06-21 David Noel, Elizabeth Graham, Liyuan Liu

Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism

Scaling models has led to significant advancements in deep learning, but training these models in decentralized settings remains challenging due to communication bottlenecks. While existing compression techniques are effective in…

Machine Learning · Computer Science 2025-06-03 Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long

ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs

In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. Data compression has come to rescue for…

Databases · Computer Science 2026-02-10 Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu

ParPaRaw: Massively Parallel Parsing of Delimiter-Separated Raw Data

Parsing is essential for a wide range of use cases, such as stream processing, bulk loading, and in-situ querying of raw data. Yet, the compute-intense step often constitutes a major bottleneck in the data ingestion pipeline, since parsing…

Databases · Computer Science 2020-04-16 Elias Stehle, Hans-Arno Jacobsen

Pass-efficient methods for compression of high-dimensional turbulent flow data

The future of high-performance computing, specifically on future Exascale computers, will presumably see memory capacity and bandwidth fail to keep pace with data generated, for instance, from massively parallel partial differential…

Computational Physics · Physics 2020-01-29 Alec M. Dunton, Lluís Jofre, Gianluca Iaccarino, Alireza Doostan

Accelerating Lossless Data Compression with GPUs

Huffman compression is a statistical, lossless data compression algorithm that compresses data by assigning variable-length codes to symbols, with more frequently appearing symbols given shorter codes than less frequent ones. This work is a…

Information Theory · Computer Science 2011-07-11 R. L. Cloud, M. L. Curry, H. L. Ward, A. Skjellum, P. Bangalore

CStream: Parallel Data Stream Compression on Multicore Edge Devices

In the burgeoning realm of Internet of Things (IoT) applications on edge devices, data stream compression has become increasingly pertinent. The integration of added compression overhead and limited hardware resources on these devices calls…

Databases · Computer Science 2024-06-18 Xianzhi Zeng, Shuhao Zhang

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Jeongmin Park, Zaid Qureshi, Vikram Mailthody, Andrew Gacek, Shunfan Shao, Mohammad AlMasri, Isaac Gelado, Jinjun Xiong, Chris Newburn, I-hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-mei Hwu

Accelerating Parallel Write via Deeply Integrating Predictive Lossy Compression with HDF5

Lossy compression is one of the most efficient solutions to reduce storage overhead and improve I/O performance for HPC applications. However, existing parallel I/O libraries cannot fully utilize lossy compression to accelerate parallel…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-06-30 Sian Jin, Dingwen Tao, Houjun Tang, Sheng Di, Suren Byna, Zarija Lukic, Franck Cappello

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Autoregressive decoding of large language models (LLMs) is memory bandwidth bounded, resulting in high latency and significant wastes of the parallel processing power of modern accelerators. Existing methods for accelerating LLM decoding…

Machine Learning · Computer Science 2024-02-06 Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang

dParallel: Learnable Parallel Decoding for dLLMs

Diffusion large language models (dLLMs) have recently drawn considerable attention within the research community as a promising alternative to autoregressive generation, offering parallel token prediction and lower inference latency. Yet,…

Computation and Language · Computer Science 2025-10-01 Zigeng Chen, Gongfan Fang, Xinyin Ma, Ruonan Yu, Xinchao Wang

Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures

Today's high-performance computing (HPC) applications are producing vast volumes of data, which are challenging to store and transfer efficiently during the execution, such that data compression is becoming a critical technique to mitigate…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-02 Jiannan Tian, Cody Rivera, Sheng Di, Jieyang Chen, Xin Liang, Dingwen Tao, Franck Cappello

DeInfer: Efficient Parallel Inferencing for Decomposed Large Language Models

Existing works on large language model (LLM) decomposition mainly focus on improving performance on downstream tasks, but they ignore the poor parallel inference performance when trying to scale up the model size. To mitigate this important…

Computation and Language · Computer Science 2026-04-21 You-Liang Huang, Xinhao Huang, Chengxi Liao, Zeyi Wen

Theoretically and Practically Efficient Parallel Nucleus Decomposition

This paper studies the nucleus decomposition problem, which has been shown to be useful in finding dense substructures in graphs. We present a novel parallel algorithm that is efficient both in theory and in practice. Our algorithm achieves…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-12 Jessica Shi, Laxman Dhulipala, Julian Shun

Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training

Deploying deep learning (DL) models across multiple compute devices to train large and complex models continues to grow in importance because of the demand for faster and more frequent training. Data parallelism (DP) is the most widely used…

Machine Learning · Computer Science 2022-11-08 Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta

Accelerating Diffusion via Hybrid Data-Pipeline Parallelism Based on Conditional Guidance Scheduling

Diffusion models have achieved remarkable progress in high-fidelity image, video, and audio generation, yet inference remains computationally expensive. Nevertheless, current diffusion acceleration methods based on distributed parallelism…

Computer Vision and Pattern Recognition · Computer Science 2026-02-26 Euisoo Jung, Byunghyun Kim, Hyunjin Kim, Seonghye Cho, Jae-Gil Lee

Accelerating a fluvial incision and landscape evolution model with parallelism

Solving inverse problems and achieving statistical rigour in landscape evolution models requires running many model realizations. Parallel computation is necessary to achieve this in a reasonable time. However, no previous algorithm is…

Computational Engineering, Finance, and Science · Computer Science 2019-01-23 Richard Barnes

Efficient Learned Data Compression via Dual-Stream Feature Decoupling

While Learned Data Compression (LDC) has achieved superior compression ratios, balancing precise probability modeling with system efficiency remains challenging. Crucially, uniform single-stream architectures struggle to simultaneously…

Computation and Language · Computer Science 2026-04-09 Huidong Ma, Xinyan Shi, Hui Sun, Xiaofei Yue, Xiaoguang Liu, Gang Wang, Wentong Cai

GPU Acceleration of SQL Analytics on Compressed Data

GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…

Databases · Computer Science 2025-09-05 Zezhou Huang, Krystian Sakowski, Hans Lehnert, Wei Cui, Carlo Curino, Matteo Interlandi, Marius Dumitru, Rathijit Sen

Large-Scale Linear Energy System Optimization: A Systematic Review on Parallelization Strategies via Decomposition

As renewable energy integration, sector coupling, and spatiotemporal detail increase, energy system optimization models grow in size and complexity, often pushing solvers to their performance limits. This systematic review explores…

Optimization and Control · Mathematics 2025-08-11 Lars Hadidi, Leonard Göke, Maximilian Hoffmann, Mario Klostermeier, Shima Sasanpour, Tim Varelmann, Vassilios Yfantis, Jochen Linßen, Detlef Stolten, Jann M. Weinand