Related papers: Dynamic Simultaneous Multithreaded Architecture

200 papers

Speculative multi-threading (SpMT) has been proposed as a promising method to exploit the hardware potential of Chip Multiprocessors (CMP). It is a thread-level speculation (TLS) model that depends mainly on software and hardware co-design. This…

Hardware Architecture · Computer Science 2015-12-29 Dong Zhaoyu, Gao Bing, Zhao Yinliang, Song Shaolong, Du Yanning

Deep neural networks (DNNs) are known for their inability to utilize underlying hardware resources due to hardware susceptibility to sparse activations and weights. Even in finer granularities, many of the non-zero values hold a portion of…

Machine Learning · Computer Science 2020-09-21 Gil Shomron, Uri Weiser

Networking is considered a datacenter tax, and hyperscalers push hard to provide high-performance networking with minimal resource expenditure. To keep up with the ever-increasing network rates, many CPU cycles are spent on the networking…

Hardware Architecture · Computer Science 2025-03-11 Amin Mamandipoor, Huy Dinh Tran, Mohammad Alian

Different from the traditional software vulnerability, the microarchitecture side channel has three characteristics: extensive influence, potent threat, and tough defense. The main reason for the micro-architecture side channel is resource…

Cryptography and Security · Computer Science 2019-10-29 Yue Zhang, Ziyuan Zhu, Dan Meng

Recent data stream processing systems (DSPSs) can achieve excellent performance when processing large volumes of data under tight latency constraints. However, they sacrifice support for concurrent state access that eases the burden of…

Databases · Computer Science 2023-06-21 Shuhao Zhang, Yingjun Wu, Feng Zhang, Bingsheng He

Major chip manufacturers have all introduced Multithreaded processors. These processors are used for running a variety of workloads. Efficient resource utilization is an important design aspect in such processors. Depending on the workload,…

Performance · Computer Science 2019-09-20 Murthy Durbhakula

GPGPUs use the Single-Instruction-Multiple-Thread (SIMT) execution model, where a group of threads (a wavefront or warp) executes instructions in lockstep. When threads in a group encounter a branching instruction, not all threads in the group…

Programming Languages · Computer Science 2022-01-17 Charitha Saumya, Kirshanthan Sundararajah, Milind Kulkarni

This work unifies insights from the systems and functional programming communities, in order to enable compositional reasoning about software which is nonetheless efficiently realizable in hardware. It exploits a correspondence between…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-03-18 Thomas Dickerson

A great variety of static analyses that compute safety properties of single-thread programs have now been developed. This paper presents a systematic method to extend a class of such static analyses, so that they handle programs with…

Programming Languages · Computer Science 2009-11-02 Jean-Loup Carre, Charles Hymans

This article introduces a highly parallel algorithm for molecular dynamics simulations with short-range forces on single node multi- and many-core systems. The algorithm is designed to achieve high parallel speedups for strongly…

Computational Physics · Physics 2013-11-20 R. Meyer

In typical embedded applications, the precise execution time of the program does not matter, and it is sufficient to meet a real-time deadline. However, modern applications in information security have become much more time-sensitive, due…

Cryptography and Security · Computer Science 2020-05-07 Pantea Kiaei, Patrick Schaumont

The problem of identifying intersections between two sets of d-dimensional axis-parallel rectangles appears frequently in the context of agent-based simulation studies. For this reason, the High Level Architecture (HLA) specification -- a…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-28 Moreno Marzolla, Gabriele D'Angelo

In current microarchitectures, due to the complex memory hierarchies and different latencies on memory accesses, thread and data mapping are important issues to improve application performance. Software transactional memory (STM) is an…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-24 Douglas Pereira Pasqualin, Matthias Diener, André Rauber Du Bois, Maurício Lima Pilla

We study a mismatch between the deep learning recommendation models' flat architecture, common distributed training paradigm and hierarchical data center topology. To address the associated inefficiencies, we propose Disaggregated…

Work presented in this paper describes a general algorithm and its finite element implementation for performing concurrent multiple sub-domain simulations in linear structural dynamics. Using this approach one can solve problems in which…

Numerical Analysis · Mathematics 2013-12-25 Tejas Ruparel, Azim Eskandarian, James Lee

Deep learning is a popular machine learning technique and has been applied to many real-world problems. However, training a deep neural network is very time-consuming, especially on big data. It has become difficult for a single machine to…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-09-04 Xing Zhao, Aijun An, Junfeng Liu, Bao Xin Chen

We investigate the utility of augmenting a microprocessor with a single execution pipeline by adding a second copy of the execution pipeline in parallel with the existing one. The resulting dual-hardware-threaded microprocessor has two…

Hardware Architecture · Computer Science 2023-05-30 Madhav P. Desai

We introduce BriskStream, an in-memory data stream processing system (DSPS) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS,…

Databases · Computer Science 2019-04-10 Shuhao Zhang, Jiong He, Amelie Chi Zhou, Bingsheng He

This paper presents BDDT-SCC, a task-parallel runtime system for non-cache-coherent multicore processors, implemented for the Intel Single-Chip Cloud Computer. The BDDT-SCC runtime includes a dynamic dependence analysis and automatic…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-15 Alexandros Labrineas, Polyvios Pratikakis, Dimitrios S. Nikolopoulos, Angelos Bilas

Multi-threaded programs have traditionally fallen into one of two domains: cooperative and competitive. These two domains have traditionally remained mostly disjoint, with cooperative threading used for increasing throughput in…

Programming Languages · Computer Science 2018-07-11 Stefan K. Muller, Umut A. Acar, Robert Harper