Related papers: Guided Optimization for Image Processing Pipelines
We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or…
Guided upsampling is an effective approach for accelerating high-resolution image processing. In this paper, we propose a simple yet effective guided upsampling method. Each pixel in the high-resolution image is represented as a linear…
The HaliVer tool integrates deductive verification into the popular scheduling language Halide, used for image processing pipelines and array computations. HaliVer uses Vercors, a separation logic-based verifier, to verify the correctness…
Efficient optimisation algorithms have become important tools for finding high-quality solutions to hard, real-world problems such as production scheduling, timetabling, or vehicle routing. These algorithms are typically "black boxes" that…
Specialized image processing accelerators are necessary to deliver the performance and energy efficiency required by important applications in computer vision, computational photography, and augmented reality. But creating,…
Profile guided optimization is an effective technique for improving the optimization ability of compilers based on dynamic behavior, but collecting profile data is expensive, cumbersome, and requires regular updating to remain fresh. We…
Profile Guided Optimization (PGO) uses runtime profiling to direct compiler optimization decisions, effectively combining static analysis with actual execution behavior to enhance performance. Runtime profiles, collected through…
Code runtime optimization-the task of rewriting a given code to a faster one-remains challenging, as it requires reasoning about performance trade-offs involving algorithmic and structural choices. Recent approaches employ code-LLMs with…
Text-to-image generation has evolved beyond single monolithic models to complex multi-component pipelines. These combine fine-tuned generators, adapters, upscaling blocks and even editing steps, leading to significant improvements in image…
Nonlinear programming targets nonlinear optimization with constraints, which is a generic yet complex methodology involving humans for problem modeling and algorithms for problem solving. We address the particularly hard challenge of…
As machine learning techniques become ubiquitous, the efficiency of neural network implementations is becoming correspondingly paramount. Frameworks, such as Halide and TVM, separate out the algorithmic representation of the network from…
Biologically inspired, from the early HMAX model to Spatial Pyramid Matching, pooling has played an important role in visual recognition pipelines. Spatial pooling, by grouping of local codes, equips these methods with a certain degree of…
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for many applications. The predominantly used imperative languages - like C or OpenCL - force the programmer to intertwine the code describing…
Software Pipelining is a classic and important loop-optimization for VLIW processors. It improves instruction-level parallelism by overlapping multiple iterations of a loop and executing them in parallel. Typically, it is implemented using…
We present Piko, a framework for designing, optimizing, and retargeting implementations of graphics pipelines on multiple architectures. Piko programmers express a graphics pipeline by organizing the computation within each stage into…
The polyhedral model provides a powerful mathematical abstraction to enable effective optimization of loop nests with respect to a given optimization goal, e.g., exploiting parallelism. Unexploited reduction properties are a frequent reason…
High-level synthesis (HLS) allows hardware designers to create hardware designs with high-level programming languages like C/C++/OpenCL, which greatly improves hardware design productivity. However, existing HLS flows require programmers'…
After completing the design and training phases, deploying a deep learning model onto specific hardware is essential before practical implementation. Targeted optimizations are necessary to enhance the model's performance by reducing…
Code often suffers from performance bugs. These bugs necessitate the research and practice of code optimization. Traditional rule-based methods rely on manually designing and maintaining rules for specific performance bugs (e.g., redundant…
Learned image compression allows achieving state-of-the-art accuracy and compression ratios, but their relatively slow runtime performance limits their usage. While previous attempts on optimizing learned image codecs focused more on the…