Related papers: A Register Allocation Algorithm in the Presence of…
In a compiler, an essential component is the register allocator. Two main algorithms have dominated implementations, graph coloring and linear scan, differing in how live values are modeled. Graph coloring uses an edge in an `interference…
Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a…
Register allocation has long been formulated as a graph coloring problem, coloring the conflict graph with physical registers. Such a formulation does not fully capture the goal of the allocation, which is to minimize the traffic between…
The use and location of memory in integrated circuits plays a key factor in their performance. Memory requires large physical area, access times limit overall system performance and connectivity can result in large fan-out. Modern FPGA…
Arising disruptive memory technologies continuously make their way into the memory hierarchy at various levels. Racetrack memory is one promising candidate for future memory due to the overall low energy consumption, access latency and high…
Bringing high-level machine learning models to efficient and well-suited machine implementations often invokes a bunch of tools, e.g.~code generators, compilers, and optimizers. Along such tool chains, abstractions have to be applied. This…
This paper proposes an application mapping algorithm, BandMap, for coarse-grained reconfigurable array (CGRA), which allocates the bandwidth in PE array according to the transferring demands of data, especially the data with high spatial…
In engineering applications sorting is an important and widely studied problem where execution speed and resources used for computation are of extreme importance, especially if we think about real time data processing. Most of the…
In this paper, the acceleration of algorithms using a design of a field programmable gate array (FPGA) as a prototype of a static dataflow architecture is discussed. The static dataflow architecture using operators interconnected by…
Register allocation is a much studied problem. A particularly important context for optimizing register allocation is within loops, since a significant fraction of the execution time of programs is often inside loop code. A variety of…
Data structures are a cornerstone of most modern programming languages. Whether they are provided via separate libraries, built into the language specification, or as part of the language's standard library -- data structures such as lists,…
Efficient and real time segmentation of color images has a variety of importance in many fields of computer vision such as image compression, medical imaging, mapping and autonomous navigation. Being one of the most computationally…
Coarse-grained reconfigurable architectures aim to achieve both goals of high performance and flexibility. However, existing reconfigurable array architectures require many resources without considering the specific application domain.…
Applications making excessive use of single-object based data structures (such as linked lists, trees, etc...) can see a drop in efficiency over a period of time due to the randomization of nodes in memory. This slow down is due to the…
Domain-specific accelerators are used in various computing systems ranging from edge devices to data centers. Coarse-grained reconfigurable arrays (CGRAs) represent an architectural midpoint between the flexibility of an FPGA and the…
We study the problem of executing an application represented by a precedence task graph on a parallel machine composed of standard computing cores and accelerators. Contrary to most existing approaches, we distinguish the allocation and the…
Registers are the fastest memory components within the GPU's complex memory hierarchy, accessed by names rather than addresses. They are managed entirely by the compiler through a process called register allocation, during which the…
The architecture of a coarse-grained reconfigurable array (CGRA) processing element (PE) has a significant effect on the performance and energy efficiency of an application running on the CGRA. This paper presents an automated approach for…
Genetic Algorithms (GAs) are used to solve search and optimization problems in which an optimal solution can be found using an iterative process with probabilistic and non-deterministic transitions. However, depending on the problem's…
Coarse-Grained Reconfigurable Arrays (CGRAs) are specialized accelerators commonly employed to boost performance in workloads with iterative structures. Existing research typically focuses on compiler or architecture optimizations aimed at…