Related papers: CPMA: An Efficient Batch-Parallel Compressed Set W…

PaC-trees: Supporting Parallel and Compressed Purely-Functional Collections

Many modern programming languages are shifting toward a functional style for collection interfaces such as sets, maps, and sequences. Functional interfaces offer many advantages, including being safe for parallelism and providing simple and…

Data Structures and Algorithms · Computer Science 2022-04-14 Laxman Dhulipala, Guy E. Blelloch, Yan Gu, Yihan Sun

Compressed Geometric Arrays for Point Cloud Processing

The ever-increasing demand for 3D modeling in the emerging immersive applications has made point clouds an essential class of data for 3D image and video processing. Tree based structures are commonly used for representing point clouds…

Multimedia · Computer Science 2021-10-25 Hoda Roodaki, Mahdi Nazm Bojnordi

Improved Constructions of Coded Caching Schemes for Combination Networks

In an $(H,r)$ combination network, a single content library is delivered to ${H\choose r}$ users through deployed $H$ relays without cache memories, such that each user with local cache memories is simultaneously served by a different…

Information Theory · Computer Science 2019-10-25 Minquan Cheng, Yiqun Li, Xi Zhong, Ruizhong Wei

Parallel Recursive State Compression for Free

This paper focuses on reducing memory usage in enumerative model checking, while maintaining the multi-core scalability obtained in earlier work. We present a tree-based multi-core compression method, which works by leveraging sharing among…

Data Structures and Algorithms · Computer Science 2011-05-17 Alfons Laarman, Jaco van de Pol, Michael Weber

Low-Latency Graph Streaming Using Compressed Purely-Functional Trees

Due to the dynamic nature of real-world graphs, there has been a growing interest in the graph-streaming setting where a continuous stream of graph updates is mixed with arbitrary graph queries. In principle, purely-functional trees are an…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-04-18 Laxman Dhulipala, Julian Shun, Guy Blelloch

ParMAC: distributed optimisation of nested functions, with application to learning binary autoencoders

Many powerful machine learning models are based on the composition of multiple processing layers, such as deep nets, which gives rise to nonconvex objective functions. A general, recent approach to optimise such "nested" functions is the…

Machine Learning · Computer Science 2016-05-31 Miguel Á. Carreira-Perpiñán, Mehdi Alizadeh

Parallel batch queries on dynamic trees: algorithms and experiments

Dynamic tree data structures maintain a forest while supporting insertion and deletion of edges and a broad set of queries in $O(\log n)$ time per operation. Such data structures are at the core of many modern algorithms. Recent work has…

Data Structures and Algorithms · Computer Science 2025-06-23 Humza Ikram, Andrew Brady, Daniel Anderson, Guy Blelloch

CPSAA: Accelerating Sparse Attention using Crossbar-based Processing-In-Memory Architecture

The attention mechanism requires huge computational efforts to process unnecessary calculations, significantly limiting the system's performance. Researchers propose sparse attention to convert some DDMM operations to SDDMM and SpMM…

Hardware Architecture · Computer Science 2023-10-10 Huize Li, Hai Jin, Long Zheng, Yu Huang, Xiaofei Liao, Dan Chen, Zhuohui Duan, Cong Liu, Jiahong Xu, Chuanyi Gui

DaPPA: A Data-Parallel Programming Framework for Processing-in-Memory Architectures

The growing volume of data in modern applications has led to significant computational costs in conventional processor-centric systems. Processing-in-memory (PIM) architectures alleviate these costs by moving computation closer to memory,…

Hardware Architecture · Computer Science 2025-04-23 Geraldo F. Oliveira, Alain Kohli, David Novo, Ataberk Olgun, A. Giray Yaglikci, Saugata Ghose, Juan Gómez-Luna, Onur Mutlu

MEGA-PCC: A Mamba-based Efficient Approach for Joint Geometry and Attribute Point Cloud Compression

Joint compression of point cloud geometry and attributes is essential for efficient 3D data representation. Existing methods often rely on post-hoc recoloring procedures and manually tuned bitrate allocation between geometry and attribute…

Image and Video Processing · Electrical Eng. & Systems 2025-12-30 Kai-Hsiang Hsieh, Monyneath Yim, Wen-Hsiao Peng, Jui-Chiu Chiang

PPC-MT: Parallel Point Cloud Completion with Mamba-Transformer Hybrid Architecture

Existing point cloud completion methods struggle to balance high-quality reconstruction with computational efficiency. To address this, we propose PPC-MT, a novel parallel framework for point cloud completion leveraging a hybrid…

Computer Vision and Pattern Recognition · Computer Science 2026-03-03 Jie Li, Shengwei Tian, Long Yu, Xin Ning

Sherman: A Write-Optimized Distributed B+Tree Index on Disaggregated Memory

Memory disaggregation architecture physically separates CPU and memory into independent components, which are connected via high-speed RDMA networks, greatly improving resource utilization of databases. However, such an architecture poses…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-12-21 Qing Wang, Youyou Lu, Jiwu Shu

FAST: FPGA-based Subgraph Matching on Massive Graphs

Subgraph matching is a basic operation widely used in many applications. However, due to its NP-hardness and the explosive growth of graph data, it is challenging to compute subgraph matching, especially in large graphs. In this paper, we…

Databases · Computer Science 2021-02-25 Xin Jin, Zhengyi Yang, Xuemin Lin, Shiyu Yang, Lu Qin, You Peng

Energy Efficient and Throughput Optimal CSMA Scheme

Carrier Sense Multiple Access (CSMA) is widely used as a Medium Access Control (MAC) in wireless networks due to its simplicity and distributed nature. This motivated researchers to find CSMA schemes that achieve throughput optimality. In…

Information Theory · Computer Science 2019-08-13 Ali Maatouk, Mohamad Assaad, Anthony Ephremides

Compressed Convolutional Attention: Efficient Attention in a Compressed Latent Space

Multi-headed Attention's (MHA) quadratic compute and linearly growing KV-cache make long-context transformers expensive to train and serve. Prior works such as Grouped Query Attention (GQA) and Multi-Latent Attention (MLA) shrink the cache,…

Computation and Language · Computer Science 2026-03-18 Tomas Figliolia, Nicholas Alonso, Rishi Iyer, Quentin Anthony, Beren Millidge

Coded Caching Schemes with Low Rate and Subpacketizations

Coded caching scheme, which is an effective technique to increase the transmission efficiency during peak traffic times, has recently become quite popular among the coding community. Generally rate can be measured to the transmission in the…

Information Theory · Computer Science 2017-10-03 Minquan Cheng, Qifa Yan, Xiaohu Tang, Jing Jiang

Fast, Accurate and Memory-Efficient Partial Permutation Synchronization

Previous partial permutation synchronization (PPS) algorithms, which are commonly used for multi-object matching, often involve computation-intensive and memory-demanding matrix operations. These operations become intractable for large…

Computer Vision and Pattern Recognition · Computer Science 2022-04-01 Shaohan Li, Yunpeng Shi, Gilad Lerman

PDA Construction via Union of Cartesian Product Cache Configurations for Coded Caching

Caching is an efficient technique to reduce peak traffic by storing popular content in local caches. Placement delivery array (PDA) proposed by Yan et al. is a combinatorial structure to design coded caching schemes with uncoded placement…

Information Theory · Computer Science 2025-01-22 Jinyu Wang, Minquan Cheng, Kai Wan, Giuseppe Caire

An Efficient Parallel Data Clustering Algorithm Using Isoperimetric Number of Trees

We propose a parallel graph-based data clustering algorithm using CUDA GPU, based on exact clustering of the minimum spanning tree in terms of a minimum isoperimetric criteria. We also provide a comparative performance analysis of our…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-17 Ramin Javadi, Saleh Ashkboos

Systolic Tensor Array: An Efficient Structured-Sparse GEMM Accelerator for Mobile CNN Inference

Convolutional neural network (CNN) inference on mobile devices demands efficient hardware acceleration of low-precision (INT8) general matrix multiplication (GEMM). The systolic array (SA) is a pipelined 2D array of processing elements…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-19 Zhi-Gang Liu, Paul N. Whatmough, Matthew Mattina