Related papers: StarMalloc: A Formally Verified, Concurrent, Perfo…
We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…
We present GreenMalloc, a multi objective search-based framework for automatically configuring memory allocators. Our approach uses NSGA II and rand_malloc as a lightweight proxy benchmarking tool. We efficiently explore allocator…
The proliferation of fast, dense, byte-addressable nonvolatile memory suggests that data might be kept in pointer-rich "in-memory" format across program runs and even process and system crashes. For full generality, such data requires…
Memory allocation, though constituting only a small portion of the executed code, can have a "butterfly effect" on overall program performance, leading to significant and far-reaching impacts. Despite accounting for just approximately 5% of…
The ability to dynamically allocate memory is fundamental in modern programming languages. However, this feature is not adequately supported in current general-purpose PIM devices. To identify key design principles that PIM must consider,…
Data analytics applications transform raw input data into analytics-specific data structures before performing analytics. Unfortunately, such data ingestion step is often more expensive than analytics. In addition, various types of NVRAM…
Use-after-free (UAF) is a critical and prevalent problem in memory unsafe languages. While many solutions have been proposed, balancing security, run-time cost, and memory overhead (an impossible trinity) is hard. In this paper, we show one…
Heap-based exploits that leverage memory management errors continue to pose a significant threat to application security. The root cause of these vulnerabilities are the memory management errors within the applications, however various…
This paper proposes a novel solution: the elimination of paged virtual memory and partial outsourcing of memory page allocation and manipulation from the operating system kernel into the individual process' user space - a user mode page…
Common implementations of core memory allocation components, like the Linux buddy system, handle concurrent allocation/release requests by synchronizing threads via spin-locks. This approach is clearly not prone to scale with large thread…
Attacks on heap memory, encompassing memory overflow, double and invalid free, use-after-free (UAF), and various heap spraying techniques are ever-increasing. Existing entropy-based secure memory allocators provide statistical defenses…
Verification of concurrent data structures is one of the most challenging tasks in software verification. The topic has received considerable attention over the course of the last decade. Nevertheless, human-driven techniques remain…
This paper introduces OPTIMUM-DERAM, a highly consistent, scalable, secure, and decentralized shared memory solution. Traditional distributed shared memory implementations offer multi-object support by multi-threading a single object memory…
Cooperation between verification methods is crucial to tackle the challenging problem of software verification. The paper focuses on the verification of C programs using pointers and it formalizes a cooperation between static analyzers…
Memory allocators hide beneath nearly every application stack, yet their performance footprint extends far beyond their code size. Even small inefficiencies in the allocators ripple through caches and the rest of the memory hierarchy,…
The rapid scaling of large language models (LLMs) has significantly increased GPU memory pressure, which is further aggravated by training optimization techniques such as virtual pipeline and recomputation that disrupt tensor lifespans and…
Speculative decoding accelerates autoregressive generation by separating token proposal from verification, but most existing approaches are designed for single-node execution and do not scale well to multi-accelerator clusters used for…
Memory management in lock-free data structures remains a major challenge in concurrent programming. Design techniques including read-copy-update (RCU) and hazard pointers provide workable solutions, and are widely used to great effect.…
We propose a memory-model-aware static program analysis method for accurately analyzing the behavior of concurrent software running on processors with weak consistency models such as x86-TSO, SPARC-PSO, and SPARC-RMO. At the center of our…
Proving correctness of distributed or concurrent algorithms is a mind-challenging and complex process. Slight errors in the reasoning are difficult to find, calling for computer-checked proof systems. In order to build computer-checked…