Related papers: A Loop-Based Methodology for Reducing Computationa…
To improve the utility of learning applications and render machine learning solutions feasible for complex applications, a substantial amount of heavy computations is needed. Thus, it is essential to delegate the computations among several…
Real-time scheduling algorithms proposed in the literature are often based on worst-case estimates of task parameters. The performance of an open-loop scheme can be degraded significantly if there are uncertainties in task parameters, such…
Modern software packages have become increasingly complex with millions of lines of code and references to many external libraries. Redundant operations are a common performance limiter in these code bases. Missed compiler optimization…
Optimizing schedules in real-world settings often requires considering workload constraints, specially for human resources, to ensure regulatory compliance, impose rest periods, or level the workload over the working horizon. This paper…
Multi-threaded programs have many applications which are widely used such as operating systems. Analyzing multi-threaded programs differs from sequential ones; the main feature is that many threads execute at the same time. The effect of…
Business process simulation is a versatile technique for analyzing business processes from a quantitative perspective. A well-known limitation of process simulation is that the accuracy of the simulation results is limited by the…
Reducing energy consumption is one of the key challenges in computing technology. One factor that contributes to high energy consumption is that all parts of the program are considered equally significant for the accuracy of the end-result.…
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…
The convergence of IoT, Edge, Cloud, and HPC technologies creates a compute continuum that merges cloud scalability and flexibility with HPC's computational power and specialized optimizations. However, integrating cloud and HPC resources…
Elasticity is highly desirable for stream processing systems to guarantee low latency against workload dynamics, such as surges in data arrival rate and fluctuations in data distribution. Existing systems achieve elasticity following a…
We present a technique to infer lower bounds on the worst-case runtime complexity of integer programs, where in contrast to earlier work, our approach is not restricted to tail-recursion. Our technique constructs symbolic representations of…
As multimodal and AI-driven services exchange hundreds of megabytes per request, existing IPC runtimes spend a growing share of CPU cycles on memory copies. Although both hardware and software mechanisms are exploring memory offloading,…
Looped computation shows promise in improving the reasoning-oriented performance of LLMs by scaling test-time compute. However, existing approaches typically require either training recurrent models from scratch or applying disruptive…
Parametric linear programming is a central operation for polyhedral computations, as well as in certain control applications.Here we propose a task-based scheme for parallelizing it, with quasi-linear speedup over large problems.This type…
Distributed computing systems implement redundancy to reduce the job completion time and variability. Despite a large body of work about computing redundancy, the analytical performance evaluation of redundancy techniques in queuing systems…
Scientific applications often contain large, computationally-intensive, and irregular parallel loops or tasks that exhibit stochastic characteristics. Applications may suffer from load imbalance during their execution on high-performance…
Compiler writers typically focus primarily on the performance of the generated program binaries when selecting the passes and the order in which they are applied in the standard optimization levels, such as GCC -O3. In some domains, such as…
Many shared-memory parallel irregular applications, such as sparse linear algebra and graph algorithms, depend on efficient loop scheduling (LS) in a fork-join manner despite that the work per loop iteration can greatly vary depending on…
Faced with saturation of Moore's law and increasing dimension of data, system designers have increasingly resorted to parallel and distributed computing. However, distributed computing is often bottle necked by a small fraction of slow…
Workflows provide an expressive programming model for fine-grained control of large-scale applications in distributed computing environments. Accurate estimates of complex workflow execution metrics on large-scale machines have several key…