Related papers: A Tiling Perspective for Register Optimization
Register allocation (mapping variables to processor registers or memory) and instruction scheduling (reordering instructions to increase instruction-level parallelism) are essential tasks for generating efficient assembly code in a…
This paper introduces a combinatorial optimization approach to register allocation and instruction scheduling, two central compiler problems. Combinatorial optimization has the potential to solve these problems optimally and to exploit…
Register allocation is one of the most important problems for modern compilers. With a practically unlimited number of user variables and a small number of CPU registers, assigning variables to registers without conflicts is a complex task.…
Mining and exploring databases should provide users with knowledge and new insights. Tiles of data strive to unveil true underlying structure and distinguish valuable information from various kinds of noise. We propose a novel Boolean…
The aggressive application of scalar replacement to array references substantially reduces the number of memory operations at the expense of a possibly very large number of registers. In this paper we describe a register allocation…
Given a basic block of instructions, finding a schedule that requires the minimum number of registers for evaluation is a well-known problem. The problem is NP-complete when the dependences among instructions form a directed-acyclic graph…
In a compiler, an essential component is the register allocator. Two main algorithms have dominated implementations, graph coloring and linear scan, differing in how live values are modeled. Graph coloring uses an edge in an `interference…
Partitioning large matrices is an important problem in distributed linear algebra computing (used in ML among others). Briefly, our goal is to perform a sequence of matrix algebra operations in a distributed manner (whenever possible) on…
In modern digital circuit back-end design, designers heavily rely on electronic-design-automoation (EDA) tool to close timing. However, the heuristic algorithms used in the place and route tool usually does not result in optimal solution.…
Tiering is an essential technique for building large-scale information retrieval systems. While the selection of documents for high priority tiers critically impacts the efficiency of tiering, past work focuses on optimizing it with respect…
During compilation of a program, register allocation is the task of mapping program variables to machine registers. During register allocation, the compiler may introduce shuffle code, consisting of copy and swap operations, that transfers…
Placement Optimization is an important problem in systems and chip design, which consists of mapping the nodes of a graph onto a limited set of resources to optimize for an objective, subject to constraints. In this paper, we start by…
Optimization pipelines targeting polyhedral programs try to maximize the compute throughput. Traditional approaches favor reuse and temporal locality; while the communicated volume can be low, failure to optimize spatial locality may cause…
Reducing communication - either between levels of a memory hierarchy or between processors over a network - is a key component of performance optimization (in both time and energy) for many problems, including dense linear algebra, particle…
Bringing high-level machine learning models to efficient and well-suited machine implementations often invokes a bunch of tools, e.g.~code generators, compilers, and optimizers. Along such tool chains, abstractions have to be applied. This…
In the Tutor Allocation Problem, the objective is to assign a set of tutors to a set of workshops in order to maximize tutors' preferences. The problem is solved every year by many universities, each having its own specific set of…
Finding optimal configurations in a geometric space is a key challenge in many technological disciplines. Current approaches either rely heavily on human domain expertise and are difficult to scale. In this paper we show it is possible to…
This thesis addresses the complexities of compiler optimizations, such as register allocation and Lifetime-optimal Speculative Partial Redundancy Elimination (LOSPRE), which are often handled using tree decomposition algorithms. However,…
In the context of mapping high-level algorithms to hardware, we consider the basic problem of generating an efficient hardware implementation of a single threaded program, in particular, that of an inner loop. We describe a control-flow…
Computations, where the number of results is much smaller than the input data and are produced through some sort of accumulation, are called Reductions. Reductions appear in many scientific applications. Usually, reductions admit an…