English
Related papers

Related papers: SynCron: Efficient Synchronization Support for Nea…

200 papers

Recent studies have demonstrated that near-data processing (NDP) is an effective technique for improving performance and energy efficiency of data-intensive workloads. However, leveraging NDP in realistic systems with multiple memory…

Hardware Architecture · Computer Science 2018-12-05 Hyojong Kim , Ramyad Hadidi , Lifeng Nai , Hyesoon Kim , Nuwan Jayasena , Yasuko Eckert , Onur Kayiran , Gabriel H. Loh

Irregular applications comprise an increasingly important workload domain for many fields, including bioinformatics, chemistry, physics, social sciences and machine learning. Therefore, achieving high performance and energy efficiency in…

Hardware Architecture · Computer Science 2022-11-16 Christina Giannoula

Persistent Memory (PM) technologies enable program recovery to a consistent state in a case of failure. To ensure this crash-consistent behavior, programs need to enforce persist ordering by employing mechanisms, such as logging and…

Computational Engineering, Finance, and Science · Computer Science 2023-04-03 Yasas Seneviratne , Korakit Seemakhupt , Sihang Liu , Samira Khan

Spiking Neural Networks (SNNs) are extensively utilized in brain-inspired computing and neuroscience research. To enhance the speed and energy efficiency of SNNs, several many-core accelerators have been developed. However, maintaining the…

Neural and Evolutionary Computing · Computer Science 2024-07-31 Zhuo Chen , De Ma , Xiaofei Jin , Qinghui Xing , Ouwen Jin , Xin Du , Shuibing He , Gang Pan

The use of disaggregated or far memory systems such as CXL memory pools has renewed interest in Near-Data Processing (NDP): situating cores close to memory to reduce bandwidth requirements to and from the CPU. Hardware designs for such…

Operating Systems · Computer Science 2026-04-21 Zikai Liu , Niels Pressel , Jasmin Schult , Roman Meier , Pengcheng Xu , Timothy Roscoe

Near-data accelerators (NDAs) that are integrated with main memory have the potential for significant power and performance benefits. Fully realizing these benefits requires the large available memory capacity to be shared between the host…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho , Yongkee Kwon , Sangkug Lym , Mattan Erez

Emerging workloads, such as graph processing and machine learning are approximate because of the scale of data involved and the stochastic nature of the underlying algorithms. These algorithms are often distributed over multiple machines…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-12-28 Asim Kadav , Erik Kruus

In Near Memory Processing (NMP), processing elements(PEs) are placed near the 3D memory, reducing unnecessary data transfers between the CPU and the memory. However, as the CPUs and the PEs of the NMP use a shared memory space, maintaining…

Hardware Architecture · Computer Science 2023-12-13 Amit Kumar Kabat , Shubhang Pandey , TG Venkatesh

Simultaneous multithreading processors improve throughput over single-threaded processors thanks to sharing internal core resources among instructions from distinct threads. However, resource sharing introduces inter-thread interference…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-10-20 Marta Navarro , Josué Feliu , Salvador Petit , María E. Gómez , Julio Sahuquillo

Emerging Compute Express Link (CXL) enables cost-efficient memory expansion beyond the local DRAM of processors. While its CXL$.$mem protocol provides minimal latency overhead through an optimized protocol stack, frequent CXL memory…

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour , Marc Riera , Antonio González

The steeply growing performance demands for highly power- and energy-constrained processing systems such as end-nodes of the internet-of-things (IoT) have led to parallel near-threshold computing (NTC), joining the energy-efficiency…

Hardware Architecture · Computer Science 2020-04-15 Florian Glaser , Giuseppe Tagliavini , Davide Rossi , Germain Haugou , Qiuting Huang , Luca Benini

The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-31 Nikolaos D. Kallimanis

Near-Data Processing (NDP) has been a promising architectural paradigm to address the memory wall problem for data-intensive applications. Practical implementation of NDP architectures calls for system support for better programmability,…

Hardware Architecture · Computer Science 2025-02-21 Qingcai Jiang , Buxin Tu , Hong An

In modern data centers, energy usage represents one of the major factors affecting operational costs. Power capping is a technique that limits the power consumption of individual systems, which allows reducing the overall power demand at…

Performance · Computer Science 2017-09-05 Stefano Conoci , Pierangelo Di Sanzo , Bruno Ciciani , Francesco Quaglia

With the rapid development of DNN applications, multi-tenant execution, where multiple DNNs are co-located on a single SoC, is becoming a prevailing trend. Although many methods are proposed in prior works to improve multi-tenant…

Hardware Architecture · Computer Science 2025-05-15 Tianhao Cai , Liang Wang , Limin Xiao , Meng Han , Zeyu Wang , Lin Sun , Xiaojian Liao

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall…

Hardware Architecture · Computer Science 2023-10-30 Bahareh Khabbazan , Marc Riera , Antonio González

In this article, we investigate the impact of architectural parameters of array-based DNN accelerators on accelerator's energy consumption and performance in a wide variety of network topologies. For this purpose, we have developed a tool…

Hardware Architecture · Computer Science 2022-06-28 Mohammad Ali Maleki , Mehdi Kamal , Ali Afzali-Kusha

Communication has become a first-order bottleneck in large-cale GPU workloads, and existing distributed compilers address it mainly by overlapping whole compute and communication kernels at the stream level. This coarse granularity incurs…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-06 Xinwei Qiang , Yue Guan , Zhengding Hu , Keren Zhou , Yufei Ding , Adnan Aziz

Nowadays the number of available processing cores within computing nodes which are used in recent clustered environments, are growing up with a rapid rate. Despite this trend, the number of available network interfaces in such computing…

Distributed, Parallel, and Cluster Computing · Computer Science 2012-07-13 Mohsen Soryani , Morteza Analoui , Ghobad Zarrinchian
‹ Prev 1 2 3 10 Next ›