Related papers: A1: A Distributed In-Memory Graph Database

OnePiece: A Large-Scale Distributed Inference System with RDMA for Complex AI-Generated Content (AIGC) Workflows

The rapid growth of AI-generated content (AIGC) has enabled high-quality creative production across diverse domains, yet existing systems face critical inefficiencies in throughput, resource utilization, and scalability under concurrent…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-29 June Chen , Neal Xu , Gragas Huang , Bok Zhou , Stephen Liu

A Survey of Parallel A*

A* is a best-first search algorithm for finding optimal-cost paths in graphs. A* benefits significantly from parallelism because in many applications, A* is limited by memory usage, so distributed memory implementations of A* that use all…

Artificial Intelligence · Computer Science 2017-08-18 Alex Fukunaga , Adi Botea , Yuu Jinnai , Akihiro Kishimoto

Active Access: A Mechanism for High-Performance Distributed Data-Centric Computations

Remote memory access (RMA) is an emerging high-performance programming model that uses RDMA hardware directly. Yet, accessing remote memories cannot invoke activities at the target which complicates implementation and limits performance of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-20 Maciej Besta , Torsten Hoefler

Parallel Breadth-First Search on Distributed Memory Systems

Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-10-17 Aydin Buluc , Kamesh Madduri

AeonG: An Efficient Built-in Temporal Support in Graph Databases

Real-world graphs are often dynamic and evolve over time. Storing and querying graph evolution is crucial in graph databases. However, existing works either suffer from high storage overhead or lack efficient temporal query support,…

Databases · Computer Science 2024-04-02 Jiamin Hou , Zhanhao Zhao , Zhouyu Wang , Wei Lu , Guodong Jin , Dong Wen , Xiaoyong Du

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System

Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-11 Marzieh Barkhordar , Alireza Tabatabaeian , Mohammad Sadrosadati , Christina Giannoula , Juan Gomez Luna , Izzat El Hajj , Onur Mutlu , Alaa R. Alameldeen

Co-Designing Graph-based Approximate Nearest Neighbor Search at Billion Scale for Processing-in-Memory

Approximate Nearest Neighbor Search (ANNS) is a core primitive in modern AI systems, and graph-based methods currently offer the best accuracy-efficiency trade-off at scale. The workload is fundamentally memory-bound: graph traversal…

Hardware Architecture · Computer Science 2026-05-26 Sitian Chen , Yusen Li , Yao Chen , Minwen Deng , Jintao Meng , Amelie Chi Zhou

Deep Recommender Models Inference: Automatic Asymmetric Data Flow Optimization

Deep Recommender Models (DLRMs) inference is a fundamental AI workload accounting for more than 79% of the total AI workload in Meta's data centers. DLRMs' performance bottleneck is found in the embedding layers, which perform many random…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-03 Giuseppe Ruggeri , Renzo Andri , Daniele Jahier Pagliari , Lukas Cavigelli

Distributed-Memory Breadth-First Search on Massive Graphs

This chapter studies the problem of traversing large graphs using the breadth-first search order on distributed-memory supercomputers. We consider both the traditional level-synchronous top-down algorithm as well as the recently discovered…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-15 Aydin Buluc , Scott Beamer , Kamesh Madduri , Krste Asanovic , David Patterson

Evaluation of a Simple, Scalable, Parallel Best-First Search Strategy

Large-scale, parallel clusters composed of commodity processors are increasingly available, enabling the use of vast processing capabilities and distributed RAM to solve hard search problems. We investigate Hash-Distributed A* (HDA*), a…

Artificial Intelligence · Computer Science 2015-03-20 Akihiro Kishimoto , Alex Fukunaga , Adi Botea

Mnemis: Dual-Route Retrieval on Hierarchical Graphs for Long-Term LLM Memory

AI Memory, specifically how models organize and retrieve historical messages, becomes increasingly valuable to Large Language Models (LLMs), yet existing methods (RAG and Graph-RAG) primarily retrieve memory through similarity-based…

Computation and Language · Computer Science 2026-04-13 Zihao Tang , Xin Yu , Ziyu Xiao , Zengxuan Wen , Zelin Li , Jiaxi Zhou , Hualei Wang , Haohua Wang , Haizhen Huang , Weiwei Deng , Feng Sun , Qi Zhang

Mining The Data From Distributed Database Using An Improved Mining Algorithm

Association rule mining is an active data mining research area and most ARM algorithms cater to a centralized environment. Centralized data mining to discover useful patterns in distributed databases isn't always feasible because merging…

Databases · Computer Science 2010-04-13 J. Arokia Renjit , K. L. Shunmuganathan

Single Machine Graph Analytics on Massive Datasets Using Intel Optane DC Persistent Memory

Intel Optane DC Persistent Memory (Optane PMM) is a new kind of byte-addressable memory with higher density and lower cost than DRAM. This enables the design of affordable systems that support up to 6TB of randomly accessible memory. In…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-25 Gurbinder Gill , Roshan Dathathri , Loc Hoang , Ramesh Peri , Keshav Pingali

DA-RAG: Dynamic Attributed Community Search for Retrieval-Augmented Generation

Owing to their unprecedented comprehension capabilities, large language models (LLMs) have become indispensable components of modern web search engines. From a technical perspective, this integration represents retrieval-augmented…

Information Retrieval · Computer Science 2026-02-10 Xingyuan Zeng , Zuohan Wu , Yue Wang , Chen Zhang , Quanming Yao , Libin Zheng , Jian Yin

Distributed-memory $\mathcal{H}$-matrix Algebra I: Data Distribution and Matrix-vector Multiplication

We introduce a data distribution scheme for $\mathcal{H}$-matrices and a distributed-memory algorithm for $\mathcal{H}$-matrix-vector multiplication. Our data distribution scheme avoids an expensive $\Omega(P^2)$ scheduling procedure used…

Numerical Analysis · Mathematics 2020-09-23 Yingzhou Li , Jack Poulson , Lexing Ying

A+ Indexes: Tunable and Space-Efficient Adjacency Lists in Graph Database Management Systems

Graph database management systems (GDBMSs) are highly optimized to perform fast traversals, i.e., joins of vertices with their neighbours, by indexing the neighbourhoods of vertices in adjacency lists. However, existing GDBMSs have…

Databases · Computer Science 2021-03-05 Amine Mhedhbi , Pranjal Gupta , Shahid Khaliq , Semih Salihoglu

A Layered Aggregate Engine for Analytics Workloads

This paper introduces LMFAO (Layered Multiple Functional Aggregate Optimization), an in-memory optimization and execution engine for batches of aggregates over the input database. The primary motivation for this work stems from the…

Databases · Computer Science 2019-06-21 Maximilian Schleich , Dan Olteanu , Mahmoud Abo Khamis , Hung Q. Ngo , XuanLong Nguyen

XDMA: A Distributed, Extensible DMA Architecture for Layout-Flexible Data Movements in Heterogeneous Multi-Accelerator SoCs

As modern AI workloads increasingly rely on heterogeneous accelerators, ensuring high-bandwidth and layout-flexible data movements between accelerator memories has become a pressing challenge. Direct Memory Access (DMA) engines promise high…

Hardware Architecture · Computer Science 2025-08-13 Fanchen Kong , Yunhao Deng , Xiaoling Yi , Ryan Antonio , Marian Verhelst

A Dynamic Retrieval-Augmented Generation System with Selective Memory and Remembrance

We introduce Adaptive RAG Memory (ARM), a retrieval-augmented generation (RAG) framework that replaces a static vector index with a dynamic memory substrate governed by selective remembrance and decay. Frequently retrieved…

Information Retrieval · Computer Science 2026-01-07 Okan Bursa

d-HNSW: A High-performance Vector Search Engine on Disaggregated Memory

Efficient vector search is essential for powering large-scale AI applications, such as LLMs. Existing solutions are designed for monolithic architectures where compute and memory are tightly coupled. Recently, disaggregated architecture…

Databases · Computer Science 2026-03-17 Fei Fang , Yi Liu , Chen Qian