Related papers: Dynamic Simultaneous Multithreaded Architecture

Prophet: A Speculative Multi-threading Execution Model with Architectural Support Based on CMP

Speculative multi-threading (SpMT) has been proposed as a promising method to exploit the hardware potential of Chip Multiprocessors (CMP). It is a thread-level speculation (TLS) model that depends mainly on software and hardware co-design. This…

Hardware Architecture · Computer Science 2015-12-29 Dong Zhaoyu, Gao Bing, Zhao Yinliang, Song Shaolong, Du Yanning

Non-Blocking Simultaneous Multithreading: Embracing the Resiliency of Deep Neural Networks

Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hardware susceptibility to sparse activations and weights. Even in finer granularities, many of the non-zero values hold a portion of…

Machine Learning · Computer Science 2020-09-21 Gil Shomron, Uri Weiser

SDT: Cutting Datacenter Tax Through Simultaneous Data-Delivery Threads

Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. To keep up with the ever-increasing network rates, many CPU cycles are spent on the networking…

Hardware Architecture · Computer Science 2025-03-11 Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian

DDM: A Demand-based Dynamic Mitigation for SMT Transient Channels

Unlike traditional software vulnerabilities, the microarchitectural side channel has three characteristics: extensive influence, potent threat, and tough defense. The main reason for the microarchitectural side channel is resource…

Cryptography and Security · Computer Science 2019-10-29 Yue Zhang, Ziyuan Zhu, Dan Meng

Towards Concurrent Stateful Stream Processing on Multicore Processors (Technical Report)

Recent data stream processing systems (DSPSs) can achieve excellent performance when processing large volumes of data under tight latency constraints. However, they sacrifice support for concurrent state access that eases the burden of…

Databases · Computer Science 2023-06-21 Shuhao Zhang, Yingjun Wu, Feng Zhang, Bingsheng He

Branch prediction related Optimizations for Multithreaded Processors

Major chip manufacturers have all introduced multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Depending on the workload,…

Performance · Computer Science 2019-09-20 Murthy Durbhakula

DARM: Control-Flow Melding for SIMT Thread Divergence Reduction -- Extended Version

GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model, where a group of threads (a wavefront or warp) executes instructions in lockstep. When threads in a group encounter a branching instruction, not all threads in the group…

Programming Languages · Computer Science 2022-01-17 Charitha Saumya, Kirshanthan Sundararajah, Milind Kulkarni

Adapting Persistent Data Structures for Concurrency and Speculation

This work unifies insights from the systems and functional programming communities, in order to enable compositional reasoning about software which is nonetheless efficiently realizable in hardware. It exploits a correspondence between…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-18 Thomas Dickerson

From Single-thread to Multithreaded: An Efficient Static Analysis Algorithm

A great variety of static analyses that compute safety properties of single-thread programs have now been developed. This paper presents a systematic method to extend a class of such static analyses, so that they handle programs with…

Programming Languages · Computer Science 2009-11-02 Jean-Loup Carre, Charles Hymans

Efficient Parallelization of Short-Range Molecular Dynamics Simulations on Many-Core Systems

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

Synthesis of Parallel Synchronous Software

In typical embedded applications, the precise execution time of the program does not matter, and it is sufficient to meet a real-time deadline. However, modern applications in information security have become much more time-sensitive, due…

Cryptography and Security · Computer Science 2020-05-07 Pantea Kiaei, Patrick Schaumont

Parallel Data Distribution Management on Shared-Memory Multiprocessors

The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification -- a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-28 Moreno Marzolla, Gabriele D'Angelo

Thread and Data Mapping in Software Transactional Memory: An Overview

In current microarchitectures, due to the complex memory hierarchies and different latencies on memory accesses, thread and data mapping are important issues to improve application performance. Software transactional memory (STM) is an…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-24 Douglas Pereira Pasqualin, Matthias Diener, André Rauber Du Bois, Maurício Lima Pilla

Disaggregated Multi-Tower: Topology-aware Modeling Technique for Efficient Large-Scale Recommendation

We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated…

Machine Learning · Computer Science 2024-05-03 Liang Luo, Buyun Zhang, Michael Tsang, Yinbin Ma, Ching-Hsiang Chu, Yuxin Chen, Shen Li, Yuchen Hao, Yanli Zhao, Guna Lakshminarayanan, Ellie Dingqiao Wen, Jongsoo Park, Dheevatsa Mudigere, Maxim Naumov

Concurrent multi-domain simulations in linear structural dynamics using multiple grid multiple time-scale (MGMT) method

Work presented in this paper describes a general algorithm and its finite element implementation for performing concurrent multiple sub-domain simulations in linear structural dynamics. Using this approach one can solve problems in which…

Numerical Analysis · Mathematics 2013-12-25 Tejas Ruparel, Azim Eskandarian, James Lee

Dynamic Stale Synchronous Parallel Distributed Training for Deep Learning

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache

We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two…

Hardware Architecture · Computer Science 2023-05-30 Madhav P. Desai

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

We introduce BriskStream, an in-memory data stream processing system (DSPS) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS,…

Databases · Computer Science 2019-04-10 Shuhao Zhang, Jiong He, Amelie Chi Zhou, Bingsheng He

BDDT-SCC: A Task-parallel Runtime for Non Cache-Coherent Multicores

This paper presents BDDT-SCC, a task-parallel runtime system for non cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer. The BDDT-SCC runtime includes a dynamic dependence analysis and automatic…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-15 Alexandros Labrineas, Polyvios Pratikakis, Dimitrios S. Nikolopoulos, Angelos Bilas

Competitive Parallelism: Getting Your Priorities Right

Multi-threaded programs have traditionally fallen into one of two domains: cooperative and competitive. These two domains have traditionally remained mostly disjoint, with cooperative threading used for increasing throughput in…

Programming Languages · Computer Science 2018-07-11 Stefan K. Muller, Umut A. Acar, Robert Harper