Related papers: Branch Target Buffer Reverse Engineering on Arm
High-performance branch target buffers (BTBs) and the L1I cache are key to high-performance front-end. Modern branch predictors are highly accurate, but with an increase in code footprint in modern-day server workloads, BTB and L1I misses…
Many contemporary applications feature multi-megabyte instruction footprints that overwhelm the capacity of branch target buffers (BTB) and instruction caches (L1-I), causing frequent front-end stalls that inevitably hurt performance. BTB…
Modern processors have suffered a deluge of threats exploiting branch instruction collisions inside the branch prediction unit (BPU), from eavesdropping on secret-related branch operations to triggering malicious speculative executions.…
Performance-enhancing mechanisms such as branch prediction, out-of-order execution, and return stack buffer (RSB) have been widely employed in today's modern processing units. Although successful in increasing the CPU performance,…
Modern out-of-order CPUs heavily rely on speculative execution for performance optimization, with branch prediction serving as a cornerstone to minimize stalls and maximize efficiency. Whenever shared branch prediction resources lack proper…
Industries are recently considering the adoption of cloud computing for hosting safety critical applications. However, the use of multicore processors usually adopted in the cloud introduces temporal anomalies due to contention for shared…
Branch predictor (BP) is a critical component of modern processors, and its accurate modeling is essential for compilers and applications. However, processor vendors have disclosed limited details about their BP implementations. Recent…
General matrix multiplication (GEMM) is the computational backbone of modern AI workloads, and its efficiency is critically dependent on effective tiling strategies. Conventional approaches employ symmetric tile buffering, where the…
In the context of hardware trust and assurance, reverse engineering has been often considered as an illegal action. Generally speaking, reverse engineering aims to retrieve information from a product, i.e., integrated circuits (ICs) and…
We study exact sparse linear regression with an $\ell_0-\ell_2$ penalty and develop a branch-and-bound (BnB) algorithm explicitly designed for GPU execution. Starting from a perspective reformulation, we derive an interval relaxation that…
Prior work has observed that fetch-directed prefetching (FDIP) is highly effective at covering instruction cache misses. The key to FDIP's effectiveness is having a sufficiently large BTB to accommodate the application's branch working set.…
Previous schemes for designing secure branch prediction unit (SBPU) based on physical isolation can only offer limited security and significantly affect BPU's prediction capability, leading to prominent performance degradation. Moreover,…
We describe some features of the A100 memory architecture. In particular, we give a technique to reverse-engineer some hardware layout information. Using this information, we show how to avoid TLB issues to obtain full-speed random HBM…
Exploring a substantial amount of unlabeled data, semi-supervised learning (SSL) boosts the recognition performance when only a limited number of labels are provided. However, traditional methods assume that the data distribution is…
Spectre v1 information disclosure attacks, which exploit CPU conditional branch misprediction, remain unsolved in deployed software. Certain Spectre v1 gadgets can be exploited only by out-of-place mistraining, in which the attacker…
Backdoor attacks (BA) are an emerging threat to deep neural network classifiers. A classifier being attacked will predict to the attacker's target class when a test sample from a source class is embedded with the backdoor pattern (BP).…
Motivated by the recent introduction and large-scale deployment of BBR congestion control algorithms, multiple studies have investigated the performance and fairness implications of this shift from loss-based to delay-based congestion…
Existing anti-malware software and reverse engineering toolkits struggle with stealthy sub-OS rootkits due to limitations of run-time kernel-level monitoring. A malicious kernel-level driver can bypass OS-level anti-virus mechanisms easily.…
Speculative execution is an optimization technique that has been part of CPUs for over a decade. It predicts the outcome and target of branch instructions to avoid stalling the execution pipeline. However, until recently, the security…
The demand for precise information on DRAM microarchitectures and error characteristics has surged, driven by the need to explore processing in memory, enhance reliability, and mitigate security vulnerability. Nonetheless, DRAM…