Related papers: A Method for Efficient Heterogeneous Parallel Comp…
Heterogeneous many-cores are now an integral part of modern computing systems ranging from embedding systems to supercomputers. While heterogeneous many-core design offers the potential for energy-efficient high-performance, such potential…
In recent times, the emergence of Large Language Models (LLMs) has resulted in increasingly larger model size, posing challenges for inference on low-resource devices. Prior approaches have explored offloading to facilitate low-memory…
Recent advancements in large language models (LLMs) necessitate extensive computational resources, prompting the use of diverse hardware accelerators from multiple vendors. However, traditional distributed training frameworks struggle to…
Modern computing systems increasingly rely on composing heterogeneous devices to improve performance and efficiency. Programming these systems is often unproductive: algorithm implementations must be coupled to system-specific logic,…
Heterogeneous multi core processors can offer diverse computing capabilities. The efficiency of Market Basket Analysis Algorithm can be improved with heterogeneous multi core processors. Market basket analysis algorithm utilises apriori…
The emergence of machine learning, image and audio processing on edge devices has motivated research towards power efficient custom hardware accelerators. Though FPGAs are an ideal target for energy efficient custom accelerators, the…
This paper addresses the problem of parallelizing computations to study non-linear dynamics in large networks of non-locally coupled oscillators using heterogeneous computing resources. The proposed approach can be applied to a variety of…
A new generation of manycore processors is on the rise that offers dozens and more cores on a chip and, in a sense, fuses host processor and accelerator. In this paper we target the efficient training of generalized linear models on these…
To increase performance and efficiency, systems use FPGAs as reconfigurable accelerators. A key challenge in designing these systems is partitioning computation between processors and an FPGA. An appropriate division of labor may be…
Homomorphic encryption (HE) enables arithmetic operations to be performed directly on encrypted data. It is essential for privacy-preserving applications such as machine learning, medical diagnosis, and financial data analysis. In popular…
Heterogeneity is omnipresent in today's commodity computational systems, which comprise at least one multi-core Central Processing Unit (CPU) and one Graphics Processing Unit (GPU). Nonetheless, all this computing power is not being…
In recent years, heterogeneous computing has emerged as the vital way to increase computers? performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate…
Hyperdimensional Computing (HDC) is a bio-inspired computing framework that has gained increasing attention, especially as a more efficient approach to machine learning (ML). This work introduces the \name{} compiler, the first open-source…
In large-scale distributed computing clusters, such as Amazon EC2, there are several types of "system noise" that can result in major degradation of performance: bottlenecks due to limited communication bandwidth, latency due to straggler…
To take full advantage of a specific hardware target, performance engineers need to gain control on compilers in order to leverage their domain knowledge about the program and hardware. Yet, modern compilers are poorly controlled, usually…
As quantum computers continue to improve and support larger, more complex computations, smart control hardware and compilers are needed to efficiently leverage the capabilities of these systems. This paper introduces a novel approach to…
This thesis (extended abstract) presents the software development efforts toward efficient exploitation of heterogeneity through intricate mapping of computational kernels, collaborative execution of multiple processing elements and…
AI kernel compilation for edge devices depends on the compiler's ability to exploit parallelism and hide memory latency in the presence of hierarchical memory and explicit data movement. This paper reports a benchmark methodology and…
Modern research in code generators for dense linear algebra computations has shown the ability to produce optimized code with a performance which compares and often exceeds the one of state-of-the-art implementations by domain experts.…
This work presents MLIR, a novel approach to building reusable and extensible compiler infrastructure. MLIR aims to address software fragmentation, improve compilation for heterogeneous hardware, significantly reduce the cost of building…