Related papers: Execution-Cache-Memory Performance Model: Introduc…

200 papers

Stencil algorithms on regular lattices appear in many fields of computational science, and much effort has been put into optimized implementations. Such activities are usually not guided by performance models that provide estimates of…

Performance · Computer Science 2016-01-28 Holger Stengel, Jan Treibig, Georg Hager, Gerhard Wellein

This paper presents an in-depth analysis of Intel's Haswell microarchitecture for streaming loop kernels. Among the new features examined is the dual-ring Uncore design, Cluster-on-Die mode, Uncore Frequency Scaling, core improvements as…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-11-16 Johannes Hofmann, Dietmar Fey, Jan Eitzinger, Georg Hager, Gerhard Wellein

Computing-in-Memory (CiM) architectures aim to reduce costly data transfers by performing arithmetic and logic operations in memory and hence relieve the pressure due to the memory wall. However, determining whether a given workload can…

Hardware Architecture · Computer Science 2020-01-16 Di Gao, Dayane Reis, Xiaobo Sharon Hu, Cheng Zhuo

We investigate an approach that uses low-level analysis and the execution-cache-memory (ECM) performance model in combination with tuning of hardware parameters to lower energy requirements of memory-bound applications. The ECM model is…

Performance · Computer Science 2016-09-13 Johannes Hofmann, Dietmar Fey

This paper presents refinements to the execution-cache-memory performance model and a previously published power model for multicore processors. The combination of both enables a very accurate prediction of performance and energy…

Performance · Computer Science 2018-07-09 Johannes Hofmann, Georg Hager, Dietmar Fey

Modern multicore chips show complex behavior with respect to performance and power. Starting with the Intel Sandy Bridge processor, it has become possible to directly measure the power dissipation of a CPU chip and correlate this data with…

Performance · Computer Science 2014-03-20 Georg Hager, Jan Treibig, Johannes Habich, Gerhard Wellein

State-of-art NPUs are typically architected as a self-contained sub-system with multiple heterogeneous hardware computing modules, and a dataflow-driven programming model. There lacks well-established methodology and tools in the industry…

We describe verification techniques for embedded memory systems using efficient memory modeling (EMM), without explicitly modeling each memory bit. We extend our previously proposed approach of EMM in Bounded Model Checking (BMC) for a…

Logic in Computer Science · Computer Science 2011-11-09 Malay K. Ganai, Aarti Gupta, Pranav Ashar

Hardware performance monitoring (HPM) is a crucial ingredient of performance analysis tools. While there are interfaces like LIKWID, PAPI or the kernel interface perf_event which provide HPM access with some additional features, many…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-12 Thomas Röhl, Jan Eitzinger, Georg Hager, Gerhard Wellein

Complex applications running on multicore processors show a rich performance phenomenology. The growing number of cores per ccNUMA domain complicates performance analysis of memory-bound code since system noise, load imbalance, or…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-03 Ayesha Afzal, Georg Hager, Gerhard Wellein

Hardware platforms in high performance computing are constantly getting more complex to handle even when considering multicore CPUs alone. Numerous features and configuration options in the hardware and the software environment that are…

Performance · Computer Science 2020-06-25 Christie L. Alappat, Johannes Hofmann, Georg Hager, Holger Fehske, Alan R. Bishop, Gerhard Wellein

CPU simulators are vital for computer architecture research, primarily for estimating performance under different programs. This poses challenges for fast and accurate simulation of modern CPUs, especially in multi-core systems. Modern CPU…

Performance · Computer Science 2025-10-14 Buqing Xu, Jianfeng Zhu, Yichi Zhang, Qinyi Cai, Guanhua Li, Shaojun Wei, Leibo Liu

We describe a universal modeling approach for predicting single- and multicore runtime of steady-state loops on server processors. To this end we strictly differentiate between application and machine models: An application model comprises…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-30 Johannes Hofmann, Christie L. Alappat, Georg Hager, Dietmar Fey, Gerhard Wellein

Large pretrained self-attention neural networks, or transformers, have been very successful in various tasks recently. The performance of a model on a given task depends on its ability to memorize and generalize the training data. Large…

Machine Learning · Computer Science 2024-08-01 Aki Härmä, Marcin Pietrasik, Anna Wilbik

Big science initiatives are trying to reconstruct and model the brain by attempting to simulate brain tissue at larger scales and with increasingly more biological detail than previously thought possible. The exponential growth of parallel…

Performance · Computer Science 2020-06-25 Francesco Cremonesi, Georg Hager, Gerhard Wellein, Felix Schürmann

CXLMemSim is a fast, lightweight simulation framework that enables performance characterization of memory systems based on Compute Express Link (CXL) .mem technology. CXL.mem allows disaggregation and pooling of memory to mitigate memory…

Performance · Computer Science 2025-06-18 Yiwei Yang, Brian Zhao, Yusheng Zheng, Pooneh Safayenikoo, Tanvir Ahmed Khan, Andi Quinn

The A64FX CPU powers the current number one supercomputer on the Top500 list. Although it is a traditional cache-based multicore processor, its peak performance and memory bandwidth rival accelerator devices. Generating efficient code for…

Performance · Computer Science 2021-08-05 Christie L. Alappat, Jan Laukemann, Thomas Gruber, Georg Hager, Gerhard Wellein, Nils Meyer, Tilo Wettig

Memory performance is often the main bottleneck in modern computing systems. In recent years, researchers have attempted to scale the memory wall by leveraging new technology such as CXL, HBM, and in- and near-memory processing. Developers…

Performance · Computer Science 2024-11-20 Ashwin Poduval, Hayden Coffey, Michael Swift

Trusted Execution Environments (TEEs), such as Intel Software Guard eXtensions (SGX), are considered as a promising approach to resolve security challenges in clouds. TEEs protect the confidentiality and integrity of application code and…

Cryptography and Security · Computer Science 2020-12-14 Robert Krahn, Donald Dragoti, Franz Gregor, Do Le Quoc, Valerio Schiavoni, Pascal Felber, Clenimar Souza, Andrey Brito, Christof Fetzer

Homomorphic encryption (HE) allows direct computations on encrypted data. Despite numerous research efforts, the practicality of HE schemes remains to be demonstrated. In this regard, the enormous size of ciphertexts involved in HE…

Cryptography and Security · Computer Science 2020-10-27 Dayane Reis, Jonathan Takeshita, Taeho Jung, Michael Niemier, Xiaobo Sharon Hu