English
Related papers

Related papers: DynaSOAr: A Parallel Memory Allocator for Object-o…

200 papers

Object-oriented programming (OOP) has long been regarded as too inefficient for SIMD high-performance computing, despite the fact that many important HPC applications have an inherent object structure. We discovered a broad subset of OOP…

Programming Languages · Computer Science 2019-05-30 Matthias Springer

Object-oriented programming is often regarded as too inefficient for high-performance computing (HPC), despite the fact that many important HPC problems have an inherent object structure. Our goal is to bring efficient, object-oriented…

Programming Languages · Computer Science 2019-08-19 Matthias Springer

For the last thirty years, a large variety of memory allocators have been proposed. Since performance, memory usage and energy consumption of each memory allocator differs, software engineers often face difficult choices in selecting the…

Operating Systems · Computer Science 2024-06-25 José L. Risco-Martín , J. Manuel Colmenar , David Atienza , J. Ignacio Hidalgo

Applications making excessive use of single-object based data structures (such as linked lists, trees, etc...) can see a drop in efficiency over a period of time due to the randomization of nodes in memory. This slow down is due to the…

Data Structures and Algorithms · Computer Science 2021-10-22 Dhruv Matani , Gaurav Menghani

For the last thirty years, several Dynamic Memory Managers (DMMs) have been proposed. Such DMMs include first fit, best fit, segregated fit and buddy systems. Since the performance, memory usage and energy consumption of each DMM differs,…

Neural and Evolutionary Computing · Computer Science 2024-07-16 José L. Risco-Martín , David Atienza , J. Manuel Colmenar , Oscar Garnica

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…

Programming Languages · Computer Science 2015-08-26 Martin Aigner , Christoph M. Kirsch , Michael Lippautz , Ana Sokolova

The research interest in specialized hardware accelerators for deep neural networks (DNN) spikes recently owing to their superior performance and efficiency. However, today's DNN accelerators primarily focus on accelerating specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-11 Cong Guo , Yangjie Zhou , Jingwen Leng , Yuhao Zhu , Zidong Du , Quan Chen , Chao Li , Bin Yao , Minyi Guo

Memory safety errors continue to pose a significant threat to current computing systems, and graphics processing units (GPUs) are no exception. A prominent class of memory safety algorithms is allocation-based solutions. The key idea is to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-27 Mohamed Tarek Ibn Ziad , Sana Damani , Mark Stephenson , Stephen W. Keckler , Aamer Jaleel

Real-time trajectory optimization for nonlinear constrained autonomous systems is critical and typically performed by CPU-based sequential solvers. Specifically, reliance on global sparse linear algebra or the serial nature of dynamic…

Robotics · Computer Science 2026-03-13 Yilin Zou , Zhong Zhang , Maxime Robic , Fanghua Jiang

The exponential growth of data-intensive machine learning workloads has exposed significant limitations in conventional GPU-accelerated systems, especially when processing datasets exceeding GPU DRAM capacity. We propose MQMS, an augmented…

Hardware Architecture · Computer Science 2024-12-10 Ayush Gundawar , Euijun Chung , Hyesoon Kim

The ability to dynamically allocate memory is fundamental in modern programming languages. However, this feature is not adequately supported in current general-purpose PIM devices. To identify key design principles that PIM must consider,…

Hardware Architecture · Computer Science 2026-01-28 Dongjae Lee , Bongjoon Hyun , Youngjin Kwon , Minsoo Rhu

As quantum computing advances towards practical applications, quantum operating systems become inevitable, where multi-programming -- the core functionality of operating systems -- enables concurrent execution of multiple quantum programs…

Quantum Physics · Physics 2025-07-08 Wenjie Sun , Xiaoyu Li , Zhigang Wang , Geng Chen , Lianhui Yu , Guowu Yang

Domain-specific accelerators deliver exceptional performance on their target workloads through fabrication-time orchestrated datapaths. However, such specialized architectures often exhibit performance fragility when exposed to new kernels…

Hardware Architecture · Computer Science 2026-02-20 Zhenyu Bai , Pranav Dangi , Rohan Juneja , Zhaoying Li , Zhanglu Yan , Huiying Lan , Tulika Mitra

With high computation power and memory bandwidth, graphics processing units (GPUs) lend themselves to accelerate data-intensive analytics, especially when such applications fit the single instruction multiple data (SIMD) model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-12-12 Hang Liu , H. Howie Huang

Modern GPUs increasingly rely on specialized and asynchronous hardware units to deliver high performance. Yet these units are often underutilized because today's GPU software stacks still organize programming and execution around a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-05-06 Zijian He , Adrian Sampson , Yiying Zhang , Zhiyuan Guo

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-18 Xinyao Yi

Current AI training infrastructure is dominated by single instruction multiple data (SIMD) and systolic array architectures, such as Graphics Processing Units (GPUs) and Tensor Processing Units (TPUs), that excel at accelerating parallel…

Neural and Evolutionary Computing · Computer Science 2023-11-09 Jan Finkbeiner , Thomas Gmeinder , Mark Pupilli , Alexander Titterton , Emre Neftci

Distributed Asynchronous Object Store (DAOS) is a novel software-defined object store leveraging Non-Volatile Memory (NVM) devices, designed for high performance. It provides a number of interfaces for applications to undertake I/O, ranging…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-30 Nicolau Manubens , Johann Lombardi , Simon D. Smart , Emanuele Danovaro , Tiago Quintino , Dean Hildebrand , Adrian Jackson

A theoretical memory with limited processing power and internal connectivity at each element is proposed. This memory carries out parallel processing within itself to solve generic array problems. The applicability of this in-memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2010-09-28 Chengpu Wang

A superoptimizing compiler--one that performs a meaningful search of the program space as part of the optimization process--can find optimization opportunities that are missed by even the best existing optimizing compilers. We created…

Programming Languages · Computer Science 2024-09-04 Zhengyang Liu , Stefan Mada , John Regehr
‹ Prev 1 2 3 10 Next ›