Related papers: G-TADOC: Enabling Efficient GPU-Based Text Analyti…

200 papers

This article provides a comprehensive description of Text Analytics Directly on Compression (TADOC), which enables direct document analytics on compressed textual data. The article explains the concept of TADOC and the challenges to its…

Data Structures and Algorithms · Computer Science 2020-09-22 Feng Zhang , Jidong Zhai , Xipeng Shen , Dalin Wang , Zheng Chen , Onur Mutlu , Wenguang Chen , Xiaoyong Du

Data compression and decompression have become vital components of big-data applications to manage the exponential growth in the amount of data collected and stored. Furthermore, big-data applications have increasingly adopted GPUs due to…

While Model Predictive Control (MPC) delivers strong performance across robotics applications, solving the underlying (batches of) nonlinear trajectory optimization (TO) problems online remains computationally demanding. Existing…

Robotics · Computer Science 2026-05-11 Alexander Du , Emre Adabag , Gabriel Bravo-Palacios , Brian Plancher

Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art graph learning model. However, performing GCN inference over large graph datasets can be notoriously challenging, limiting their application to large real-world graphs and…

Hardware Architecture · Computer Science 2025-03-11 Haoran You , Tong Geng , Yongan Zhang , Ang Li , Yingyan Celine Lin

This paper introduces GTX, a standalone main-memory write-optimized graph data system that specializes in structural and graph property updates while enabling concurrent reads and graph analytics through ACID transactions. Recent graph…

Databases · Computer Science 2025-02-25 Libin Zhou , Lu Xing , Yeasir Rayhan , Walid G. Aref

GPUs are uniquely suited to accelerate (SQL) analytics workloads thanks to their massive compute parallelism and High Bandwidth Memory (HBM) -- when datasets fit in the GPU HBM, performance is unparalleled. Unfortunately, GPU HBMs remain…

The performance of graph programs depends highly on the algorithm, the size and structure of the input graphs, as well as the features of the underlying hardware. No single set of optimizations or one hardware platform works well across all…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-01-11 Ajay Brahmakshatriya , Yunming Zhang , Changwan Hong , Shoaib Kamil , Julian Shun , Saman Amarasinghe

Subgraph matching is a core operation in graph analytics, supporting a broad spectrum of applications from social network analysis to bioinformatics. Recent GPU-based approaches accelerate subgraph matching by leveraging parallelism but…

Databases · Computer Science 2026-04-14 Weitian Chen , Shixuan Sun , Cheng Chen , Yongmin Hu , Yingqian Hu , Minyi Guo

Despite the high computational throughput of GPUs, limited memory capacity and bandwidth-limited CPU-GPU communication via PCIe links remain significant bottlenecks for accelerating large-scale data analytics workloads. This paper…

Databases · Computer Science 2025-02-14 Yichao Yuan , Advait Iyer , Lin Ma , Nishil Talati

One major technical challenge for modern analytical database systems is how to leverage GPUs to exploit their massive parallelism and high bandwidth. Yet, existing GPU-driven database engines suffer from inefficiencies caused by frequent…

Databases · Computer Science 2026-05-12 Tsuyoshi Ozawa , Kazuo Goda

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative…

Data Structures and Algorithms · Computer Science 2018-06-28 Mo Sha , Yuchen Li , Bingsheng He , Kian-Lee Tan

For large-scale graph analytics on the GPU, the irregularity of data access and control flow, and the complexity of programming GPUs, have presented two significant challenges to developing a programmable high-performance graph library.…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-01-06 Yangzihao Wang , Yuechao Pan , Andrew Davidson , Yuduo Wu , Carl Yang , Leyuan Wang , Muhammad Osama , Chenshan Yuan , Weitang Liu , Andy T. Riffel , John D. Owens

Graph Neural Networks (GNNs) have emerged as the state-of-the-art (SOTA) method for graph-based learning tasks. However, performing GNN inference over large graph datasets remains prohibitively challenging, limiting their application to…

Hardware Architecture · Computer Science 2021-09-21 Yongan Zhang , Haoran You , Yonggan Fu , Tong Geng , Ang Li , Yingyan Lin

Data compression is a popular technique for improving the efficiency of data processing workloads such as SQL queries and, more recently, machine learning (ML) with classical batch gradient methods. But the efficacy of such ideas for…

Machine Learning · Computer Science 2019-01-23 Fengan Li , Lingjiao Chen , Yijing Zeng , Arun Kumar , Jeffrey F. Naughton , Jignesh M. Patel , Xi Wu

The torrential influx of floating-point data from domains like IoT and HPC necessitates high-performance lossless compression to mitigate storage costs while preserving absolute data fidelity. Leveraging GPU parallelism for this task…

Databases · Computer Science 2025-11-12 Zheng Li , Weiyan Wang , Ruiyuan Li , Chao Chen , Xianlei Long , Linjiang Zheng , Quanqing Xu , Chuanhui Yang

In GPU-accelerated data analytics, the overhead of data transfer from CPU to GPU becomes a performance bottleneck when the data scales beyond GPU memory capacity due to the limited PCIe bandwidth. Data compression has come to the rescue for…

Databases · Computer Science 2026-02-10 Gwangoo Yeo , Zhiyang Shen , Wei Cui , Matteo Interlandi , Rathijit Sen , Bailu Ding , Qi Chen , Minsoo Rhu

Currently, the size of scientific data is growing at an unprecedented rate. Data in the form of tensors exhibit high-order, high-dimensional, and highly sparse features. Although tensor-based analysis methods are very effective, the large…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-10-13 Zixuan Li

Latent Dirichlet Allocation (LDA) is a popular topic model. Given the fact that the input corpus of LDA algorithms consists of millions to billions of tokens, the LDA training process is very time-consuming, which may prevent the usage of…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-03-14 Xiaolong Xie , Yun Liang , Xiuhong Li , Wei Tan

The increasing complexity of large language models (LLMs) necessitates efficient training strategies to mitigate the high computational costs associated with distributed training. A significant bottleneck in this process is gradient…

Machine Learning · Computer Science 2025-04-09 Igor Polyakov , Alexey Dukhanov , Egor Spirin

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source FlexCTC toolkit…

Audio and Speech Processing · Electrical Eng. & Systems 2025-08-14 Lilit Grigoryan , Vladimir Bataev , Nikolay Karpov , Andrei Andrusenko , Vitaly Lavrukhin , Boris Ginsburg