Related papers: Exposing Shadow Branches

200 papers

Prior work has observed that fetch-directed prefetching (FDIP) is highly effective at covering instruction cache misses. The key to FDIP's effectiveness is having a sufficiently large BTB to accommodate the application's branch working set.…

Hardware Architecture · Computer Science 2020-06-25 Truls Asheim, Rakesh Kumar, Boris Grot

High-performance branch target buffers (BTBs) and the L1I cache are key to high-performance front-end. Modern branch predictors are highly accurate, but with an increase in code footprint in modern-day server workloads, BTB and L1I misses…

Hardware Architecture · Computer Science 2021-07-06 Vishal Gupta, Biswabandan Panda

Many contemporary applications feature multi-megabyte instruction footprints that overwhelm the capacity of branch target buffers (BTB) and instruction caches (L1-I), causing frequent front-end stalls that inevitably hurt performance. BTB…

Hardware Architecture · Computer Science 2023-01-11 Truls Asheim, Boris Grot, Rakesh Kumar

Modern processors have suffered a deluge of threats exploiting branch instruction collisions inside the branch prediction unit (BPU), from eavesdropping on secret-related branch operations to triggering malicious speculative executions.…

Cryptography and Security · Computer Science 2022-04-22 Tao Zhang, Timothy Lesch, Kenneth Koltermann, Dmitry Evtyushkin

Load-Dependent Branches (LDB) often do not exhibit regular patterns in their local or global history and thus are inherently hard to predict correctly by conventional branch predictors. We propose a software-to-hardware branch…

Hardware Architecture · Computer Science 2023-06-13 Maziar Goudarzi, Reza Azimi, Julian Humecki, Faizaan Rehman, Richard Zhang, Chirag Sethi, Tanishq Bomman, Yuqi Yang

Modern out-of-order CPUs heavily rely on speculative execution for performance optimization, with branch prediction serving as a cornerstone to minimize stalls and maximize efficiency. Whenever shared branch prediction resources lack proper…

Cryptography and Security · Computer Science 2025-06-10 Yuhui Zhu, Alessandro Biondi

Efficiency in instruction fetching is critical to performance, and this requires the primary structures--L1 instruction caches (L1i), branch target buffers (BTB) and instruction TLBs (iTLB)--to have the requisite information when needed.…

Hardware Architecture · Computer Science 2026-04-02 Shyam Murthy, Gurindar S. Sohi

Modern processors rely heavily on speculation to keep the pipeline filled and consequently execute and commit instructions as close to maximum capacity as possible. To improve instruction-level parallelism, the processor core needs to fetch…

Hardware Architecture · Computer Science 2021-10-19 Ilias Vougioukas, Andreas Sandberg, Nikos Nikoleris

Modern storage systems intensively utilize data prefetching algorithms while processing sequences of the read requests. Performance of the prefetching algorithm (for instance increase of the cache hit ratio of the cache system - CHR)…

Databases · Computer Science 2024-06-14 Vadim Voevodkin, Andrey Sokolov

Recently, speculative decoding (SD) has emerged as a promising technique to accelerate LLM inference by employing a small draft model to propose draft tokens in advance, and validating them in parallel with the large target model. However,…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-15 Yuhao Shen, Junyi Shen, Quan Kong, Tianyu Liu, Yao Lu, Cong Wang

Transient execution attacks that exploit speculation have raised significant concerns in computer systems. Typically, branch predictors are leveraged to trigger mis-speculation in transient execution attacks. In this work, we demonstrate a…

Cryptography and Security · Computer Science 2021-11-02 Md Hafizul Islam Chowdhuryy, Fan Yao

L1 instruction (L1-I) cache misses are a source of performance bottleneck. Sequential prefetchers are simple solutions to mitigate this problem; however, prior work has shown that these prefetchers leave considerable potentials uncovered.…

Hardware Architecture · Computer Science 2021-02-04 Ali Ansari, Fatemeh Golshan, Pejman Lotfi-Kamran, Hamid Sarbazi-Azad

Feature caching has recently emerged as a promising method for diffusion model acceleration. It effectively alleviates the inefficiency problem caused by high computational requirements by caching similar features in the inference process…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Jiayi Pan, Jiaming Xu, Yongkang Zhou, Guohao Dai

Branch predictor (BP) is an essential component in modern processors since high BP accuracy can improve performance and reduce energy by decreasing the number of instructions executed on wrong-path. However, reducing latency and storage…

Hardware Architecture · Computer Science 2018-04-03 Sparsh Mittal

Branch prediction is key to the performance of out-of-order processors. While the CBP-2016 winner TAGE-SC-L combines geometric-history tables, a statistical corrector, and a loop predictor, over half of its remaining mispredictions stem…

Hardware Architecture · Computer Science 2025-06-10 Emet Behrendt, Shing Wai Pun, Prashant J. Nair

The growing memory footprints of cloud and big data applications mean that data center CPUs can spend significant time waiting for memory. An attractive approach to improving performance in such centralized compute settings is to employ…

Hardware Architecture · Computer Science 2020-09-02 Karthik Sankaranarayanan, Chit-Kwan Lin, Gautham Chinya

Cache prefetcher greatly eliminates compulsory cache misses, by fetching data from slower memory to faster cache before it is actually required by processors. Sophisticated prefetchers predict next use cache line by repeating program's…

Hardware Architecture · Computer Science 2017-12-05 Haoyuan Wang, Zhiwei Luo

FDTD codes, such as Sophie developed at CEA/DAM, no longer take advantage of the processor's increased computing power, especially recently with the raising multicore technology. This is rooted in the fact that low order numerical schemes…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-01-22 Olivier Cessenat

Large-scale networked services rely on deep software stacks and microservice orchestration, which increase instruction footprints and create frontend stalls that inflate tail latency and energy. We revisit instruction prefetching for these…

Machine Learning · Computer Science 2025-11-26 Zerui Bao, Di Zhu, Liu Jiang, Shiqi Sheng, Ziwei Wang, Haoyun Zhang

Modern processor designs use a variety of microarchitectural methods to achieve high performance. Unfortunately, new side-channels have often been uncovered that exploit these enhanced designs. One area that has received little attention…

Cryptography and Security · Computer Science 2021-09-02 Yun Chen, Lingfeng Pei, Trevor E. Carlson