Related papers: Understanding and Optimizing Persistent Memory All…

Fast, Multicore-Scalable, Low-Fragmentation Memory Allocation through Large Virtual Memory and Global Data Structures

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…

Programming Languages · Computer Science 2015-08-26 Martin Aigner, Christoph M. Kirsch, Michael Lippautz, Ana Sokolova

PIM-malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures

The ability to dynamically allocate memory is fundamental in modern programming languages. However, this feature is not adequately supported in current general-purpose PIM devices. To identify key design principles that PIM must consider,…

Hardware Architecture · Computer Science 2026-01-28 Dongjae Lee, Bongjoon Hyun, Youngjin Kwon, Minsoo Rhu

Improving the Performance and Endurance of Persistent Memory with Loose-Ordering Consistency

Persistent memory provides high-performance data persistence at main memory. Memory writes need to be performed in strict order to satisfy storage consistency requirements and enable correct recovery from system crashes. Unfortunately,…

Hardware Architecture · Computer Science 2017-05-11 Youyou Lu, Jiwu Shu, Long Sun, Onur Mutlu

Memory Planning for Deep Neural Networks

We study memory allocation patterns in DNNs during inference, in the context of large-scale systems. We observe that such memory allocation patterns, in the context of multi-threading, are subject to high latencies, due to mutex…

Machine Learning · Computer Science 2022-03-02 Maksim Levental

MALLOC: Benchmarking the Memory-aware Long Sequence Compression for Large Sequential Recommendation

The scaling law, which indicates that model performance improves with increasing dataset and model capacity, has fueled a growing trend in expanding recommendation models in both industry and academia. However, the advent of large-scale…

Information Retrieval · Computer Science 2026-01-30 Qihang Yu, Kairui Fu, Zhaocheng Du, Yuxuan Si, Kaiyuan Li, Weihao Zhao, Zhicheng Zhang, Jieming Zhu, Quanyu Dai, Zhenhua Dong, Shengyu Zhang, Kun Kuang, Fei Wu

Metall: A Persistent Memory Allocator For Data-Centric Analytics

Data analytics applications transform raw input data into analytics-specific data structures before performing analytics. Unfortunately, such data ingestion step is often more expensive than analytics. In addition, various types of NVRAM…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-03-02 Keita Iwabuchi, Karim Youssef, Kaushik Velusamy, Maya Gokhale, Roger Pearce

Montage: A General System for Buffered Durably Linearizable Data Structures

The recent emergence of fast, dense, nonvolatile main memory suggests that certain long-lived data might remain in its natural pointer-rich format across program runs and hardware reboots. Operations on such data must be instrumented with…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-30 Haosen Wen, Wentao Cai, Mingzhe Du, Louis Jenkins, Benjamin Valpey, Michael L. Scott

SpeedMalloc: Improving Multi-threaded Applications via a Lightweight Core for Memory Allocation

Memory allocation, though constituting only a small portion of the executed code, can have a "butterfly effect" on overall program performance, leading to significant and far-reaching impacts. Despite accounting for just approximately 5% of…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-29 Ruihao Li, Qinzhe Wu, Krishna Kavi, Gayatri Mehta, Jonathan C. Beard, Neeraja J. Yadwadkar, Lizy K. John

STAlloc: Enhancing Memory Efficiency in Large-Scale Model Training with Spatio-Temporal Planning

The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and…

Machine Learning · Computer Science 2025-11-26 Zixiao Huang, Junhao Hu, Hao Lin, Chunyang Zhu, Yueran Tang, Quanlu Zhang, Zhen Guo, Zhenhua Li, Shengen Yan, Zhenhua Zhu, Guohao Dai, Yu Wang

SJMalloc: the security-conscious, fast, thread-safe and memory-efficient heap allocator

Heap-based exploits that leverage memory management errors continue to pose a significant threat to application security. The root cause of these vulnerabilities is the memory management errors within the applications; however, various…

Operating Systems · Computer Science 2024-10-24 Stephan Bauroth

GreenMalloc: Allocator Optimisation for Industrial Workloads

We present GreenMalloc, a multi-objective search-based framework for automatically configuring memory allocators. Our approach uses NSGA-II and rand_malloc as a lightweight proxy benchmarking tool. We efficiently explore allocator…

Software Engineering · Computer Science 2026-05-05 Aidan Dakhama, W. B. Langdon, Hector D. Menendez, Karine Even-Mendoza

Old is Gold: Optimizing Single-threaded Applications with Exgen-Malloc

Memory allocators hide beneath nearly every application stack, yet their performance footprint extends far beyond their code size. Even small inefficiencies in the allocators ripple through caches and the rest of the memory hierarchy,…

Programming Languages · Computer Science 2025-10-14 Ruihao Li, Lizy K. John, Neeraja J. Yadwadkar

Releasing Memory with Optimistic Access: A Hybrid Approach to Memory Reclamation and Allocation in Lock-Free Programs

Lock-free data structures are an important tool for the development of concurrent programs as they provide scalability, low latency and avoid deadlocks, livelocks and priority inversion. However, they require some sort of additional support…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-02-14 Pedro Moreno, Ricardo Rocha

Observations on Porting In-memory KV stores to Persistent Memory

Systems that require high-throughput and fault tolerance, such as key-value stores and databases, are looking to persistent memory to combine the performance of in-memory systems with the data-consistent fault-tolerance of nonvolatile…

Databases · Computer Science 2020-02-07 Brian Choi, Parv Saxena, Ryan Huang, Randal Burns

Don't cry over spilled records: Memory elasticity of data-parallel applications and its application to cluster scheduling

Understanding the performance of data-parallel workloads when resource-constrained has significant practical importance but unfortunately has received only limited attention. This paper identifies, quantifies and demonstrates memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-02-15 Calin Iorgulescu, Florin Dinu, Aunn Raza, Wajih Ul Hassan, Willy Zwaenepoel

Memory Reallocation with Polylogarithmic Overhead

The Memory Reallocation problem asks to dynamically maintain an assignment of given objects of various sizes to non-overlapping contiguous chunks of memory, while supporting updates (insertions/deletions) in an online fashion. The total…

Data Structures and Algorithms · Computer Science 2026-02-18 Ce Jin

StarMalloc: A Formally Verified, Concurrent, Performant, and Security-Oriented Memory Allocator

In this work, we present StarMalloc, a verified, security-oriented, concurrent memory allocator that can be used as a drop-in replacement in real-world projects. Using the Steel separation logic framework, we show how to specify and verify…

Programming Languages · Computer Science 2024-03-15 Antonin Reitz, Aymeric Fromherz, Jonathan Protzenko

Persistent Memory Transactions

This paper presents a comprehensive analysis of performance trade-offs between implementation choices for transaction runtime systems on persistent memory. We compare three implementations of transaction runtimes: undo logging, redo…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-04 Virendra Marathe, Achin Mishra, Amee Trivedi, Yihe Huang, Faisal Zaghloul, Sanidhya Kashyap, Margo Seltzer, Tim Harris, Steve Byan, Bill Bridge, Dave Dice

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems

Parallel programmers face the often irreconcilable goals of programmability and performance. HPC systems use distributed memory for scalability, thereby sacrificing the programmability advantages of shared memory programming models.…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-01-21 Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

Reconsidering "Reconsidering Custom Memory Allocation"

Programmers using native languages such as C, C++, or Rust can implement custom memory allocation strategies to improve execution time. In their paper titled "Reconsidering Custom Memory Allocation" almost 25 years ago, Berger et al. showed…

Programming Languages · Computer Science 2026-05-19 Nicolas van Kempen, Emery D. Berger