Related papers: Correct Wrong Path
Under Windows operating system, existing I/O benchmarking tools does not allow a developer to efficiently define a file access strategy according to the applications' constraints. This is essentially due to the fact that the existing tools…
The direction of conditional branches is predicted correctly in modern processors with great accuracy. We find several instructions in the dynamic instruction stream that contribute only towards computing the condition of these…
In order to overcome the branch execution penalties of hard-to-predict instruction branches, two new instruction fetch micro-architectural methods are proposed in this paper. In addition, to compare performance of the two proposed methods,…
Modern processors use branch prediction and speculative execution to maximize performance. For example, if the destination of a branch depends on a memory value that is in the process of being read, CPUs will try guess the destination and…
In this paper we consider the problem of developing a computational model for emulating an RF channel. The motivation for this is that an accurate and scalable emulator has the potential to minimize the need for field testing, which is…
Lack of numerical precision in control software -- in particular, related to trajectory computation -- can lead to incorrect results with costly or even catastrophic consequences. Various tools have been proposed to analyze the precision of…
Microarchitecture simulators are indispensable tools for microarchitecture designers to validate, estimate, and optimize new hardware that meets specific design requirements. While the quest for a fast, accurate and detailed…
Accurate simulation techniques are indispensable to efficiently propose new memory or architectural organizations. As implementing new hardware concepts in real systems is often not feasible, cycle-accurate simulators employed together with…
Modern and future processors need to remain functionally correct in the presence of permanent faults to sustain scaling benefits and limit field returns. This paper presents a combined analytical and microarchitectural simulation-based…
Simulating quantum circuits on classical computers is a notoriously hard, yet increasingly important task for the development and testing of quantum algorithms. In order to alleviate this inherent complexity, efficient data structures and…
Accurate performance estimation of future many-node machines is challenging because it requires detailed simulation models of both node and network. However, simulating the full system in detail is unfeasible in terms of compute and memory…
The signature transform is a 'universal nonlinearity' on the space of continuous vector-valued paths, and has received attention for use in machine learning on time series. However, real-world temporal data is typically observed at discrete…
Understanding the behavior of software in execution is a key step in identifying and fixing performance issues. This is especially important in high performance computing contexts where even minor performance tweaks can translate into large…
Operationalizing AI has become a major endeavor in both research and industry. Automated, operationalized pipelines that manage the AI application lifecycle will form a significant part of tomorrow's infrastructure workloads. To optimize…
Finite-difference methods are widely used for zeroth-order optimization in settings where gradient information is unavailable or expensive to compute. These procedures mimic first-order strategies by approximating gradients through function…
In this paper, we report our ongoing investigations of the inherent non-determinism in contemporary execution environments that can potentially lead to divergence in state of a multi-channel hardware/software system. Our approach involved…
A recent advancement in quantum computing shows a quantum advantage of certified randomness on the racetrack processor. This work investigates the execution efficiency of this architecture for general-purpose programs. We first explore the…
Techniques to evaluate a program's cache performance fall into two camps: 1. Traditional trace-based cache simulators precisely account for sophisticated real-world cache models and support arbitrary workloads, but their runtime is…
Applications' performance is influenced by the mapping of processes to computing nodes, the frequency and volume of exchanges among processing elements, the network capacity, and the routing protocol. A poor mapping of application processes…
In order to design and implement tracers, one must decide what exactly to trace and how to produce this trace. On the one hand, trace designs are too often guided by implementation concerns and are not as useful as they should be. On the…