Related papers: G-TADOC: Enabling Efficient GPU-Based Text Analyti…

TADOC: Text Analytics Directly on Compression

This article provides a comprehensive description of Text Analytics Directly on Compression (TADOC), which enables direct document analytics on compressed textual data. The article explains the concept of TADOC and the challenges to its…

Data Structures and Algorithms · Computer Science 2020-09-22 Feng Zhang, Jidong Zhai, Xipeng Shen, Dalin Wang, Zheng Chen, Onur Mutlu, Wenguang Chen, Xiaoyong Du

CODAG: Characterizing and Optimizing Decompression Algorithms for GPUs

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Jeongmin Park, Zaid Qureshi, Vikram Mailthody, Andrew Gacek, Shunfan Shao, Mohammad AlMasri, Isaac Gelado, Jinjun Xiong, Chris Newburn, I-hsin Chung, Michael Garland, Nikolay Sakharnykh, Wen-mei Hwu

GATO: GPU-Accelerated and Batched Trajectory Optimization for Scalable Edge Model Predictive Control

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing…

Robotics · Computer Science 2026-05-11 Alexander Du, Emre Adabag, Gabriel Bravo-Palacios, Brian Plancher

GCoD: Graph Convolutional Network Acceleration via Dedicated Algorithm and Accelerator Co-Design

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, it can be notoriously challenging to inference GCNs over large graph datasets, limiting their application to large real-world graphs and…

Hardware Architecture · Computer Science 2025-03-11 Haoran You, Tong Geng, Yongan Zhang, Ang Li, Yingyan Celine Lin

GTX: A Write-Optimized Latch-free Graph Data System with Transactional Support -- Extended Version

This paper introduces GTX, a standalone main-memory write-optimized graph data system that specializes in structural and graph property updates while enabling concurrent reads and graph analytics through ACID transactions. Recent graph…

Databases · Computer Science 2025-02-25 Libin Zhou, Lu Xing, Yeasir Rayhan, Walid G. Aref

GPU Acceleration of SQL Analytics on Compressed Data

GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…

Databases · Computer Science 2025-09-05 Zezhou Huang, Krystian Sakowski, Hans Lehnert, Wei Cui, Carlo Curino, Matteo Interlandi, Marius Dumitru, Rathijit Sen

Compilation Techniques for Graph Algorithms on GPUs

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-11 Ajay Brahmakshatriya, Yunming Zhang, Changwan Hong, Shoaib Kamil, Julian Shun, Saman Amarasinghe

gMatch: Fine-Grained and Hardware-Efficient Subgraph Matching on GPUs

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science 2026-04-14 Weitian Chen, Shixuan Sun, Cheng Chen, Yongmin Hu, Yingqian Hu, Minyi Guo

Vortex: Overcoming Memory Capacity Limitations in GPU-Accelerated Large-Scale Data Analytics

Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…

Databases · Computer Science 2025-02-14 Yichao Yuan, Advait Iyer, Lin Ma, Nishil Talati

Data Path Fusion in GPU for Analytical Query Processing

One major technical challenge for modern analytical database systems is how to leverage GPU to exploit their massive parallelism and high bandwidth. Yet, existing GPU-driven database engines suffer from inefficiencies caused by frequent…

Databases · Computer Science 2026-05-12 Tsuyoshi Ozawa, Kazuo Goda

Technical Report: Accelerating Dynamic Graph Analytics on GPUs

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha, Yuchen Li, Bingsheng He, Kian-Lee Tan

Gunrock: GPU Graph Analytics

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-06 Yangzihao Wang, Yuechao Pan, Andrew Davidson, Yuduo Wu, Carl Yang, Leyuan Wang, Muhammad Osama, Chenshan Yuan, Weitang Liu, Andy T. Riffel, John D. Owens

G-CoS: GNN-Accelerator Co-Search Towards Both Better Accuracy and Efficiency

Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA) method for graph-based learning tasks. However, it still remains prohibitively challenging to inference GNNs over large graph datasets, limiting their application to…

Hardware Architecture · Computer Science 2021-09-21 Yongan Zhang, Haoran You, Yonggan Fu, Tong Geng, Ang Li, Yingyan Lin

Tuple-oriented Compression for Large-scale Mini-batch Stochastic Gradient Descent

Data compression is a popular technique for improving the efficiency of data processing workloads such as SQL queries and more recently, machine learning (ML) with classical batch gradient methods. But the efficacy of such ideas for…

Machine Learning · Computer Science 2019-01-23 Fengan Li, Lingjiao Chen, Yijing Zeng, Arun Kumar, Jeffrey F. Naughton, Jignesh M. Patel, Xi Wu

A High-Throughput GPU Framework for Adaptive Lossless Compression of Floating-Point Data

The torrential influx of floating-point data from domains like IoT and HPC necessitates high-performance lossless compression to mitigate storage costs while preserving absolute data fidelity. Leveraging GPU parallelism for this task…

Databases · Computer Science 2025-11-12 Zheng Li, Weiyan Wang, Ruiyuan Li, Chao Chen, Xianlei Long, Linjiang Zheng, Quanqing Xu, Chuanhui Yang

ZipFlow: a Compiler-based Framework to Unleash Compressed Data Movement for Modern GPUs

In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. Data compression has come to rescue for…

Databases · Computer Science 2026-02-10 Gwangoo Yeo, Zhiyang Shen, Wei Cui, Matteo Interlandi, Rathijit Sen, Bailu Ding, Qi Chen, Minsoo Rhu

cuFasterTucker: A Stochastic Optimization Strategy for Parallel Sparse FastTucker Decomposition on GPU Platform

Currently, the size of scientific data is growing at an unprecedented rate. Data in the form of tensors exhibit high-order, high-dimensional, and highly sparse features. Although tensor-based analysis methods are very effective, the large…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-13 Zixuan Li

CuLDA_CGS: Solving Large-scale LDA Problems on GPUs

Latent Dirichlet Allocation (LDA) is a popular topic model. Given the fact that the input corpus of LDA algorithms consists of millions to billions of tokens, the LDA training process is very time-consuming, which may prevent the usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-14 Xiaolong Xie, Yun Liang, Xiuhong Li, Wei Tan

TAGC: Optimizing Gradient Communication in Distributed Transformer Training

The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. A significant bottleneck in this process is gradient…

Machine Learning · Computer Science 2025-04-09 Igor Polyakov, Alexey Dukhanov, Egor Spirin

FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-14 Lilit Grigoryan, Vladimir Bataev, Nikolay Karpov, Andrei Andrusenko, Vitaly Lavrukhin, Boris Ginsburg