Related papers: Improved Basic Block Reordering
Modern software executes a large amount of code. Previous techniques of code layout optimization were developed one or two decades ago and have become inadequate to cope with the scale and complexity of new types of applications such as…
Matrix reordering permutes the rows and columns of a matrix to reveal meaningful visual patterns, such as blocks that represent clusters. A comprehensive collection of matrices, along with a scoring method for measuring the quality of…
Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance…
Finetuning can be used to tackle domain-specific tasks by transferring knowledge. Previous studies on finetuning focused on adapting only the weights of a task-specific classifier or re-optimizing all layers of the pre-trained model using…
Performance optimization for large-scale applications has recently become more important as computation continues to move towards data centers. Data-center applications are generally very large and complex, which makes code layout an…
Profile-guided optimizations rely on profile data for directing compilers to generate optimized code. To achieve the maximum performance boost, profile data needs to be collected on the same version of the binary that is being optimized. In…
The static instrumentation of machine code, also known as binary rewriting, is a power technique, but suffers from high runtime overhead compared to compiler-level instrumentation. Recent research has shown that tools can achieve…
Sequence models such as transformers require inputs to be represented as one-dimensional sequences. In vision, this typically involves flattening images using a fixed row-major (raster-scan) order. While full self-attention is…
This paper proposes ReBNet, an end-to-end framework for training reconfigurable binary neural networks on software and developing efficient accelerators for execution on FPGA. Binary neural networks offer an intriguing opportunity for…
Traditionally, query optimizers have been designed for computer systems that share a common architecture, consisting of a CPU, main memory and disk subsystem. The efficiency of query optimizers and their successful employment relied on the…
Network binarization emerges as one of the most promising compression approaches offering extraordinary computation and memory savings by minimizing the bit-width. However, recent research has shown that applying existing binarization…
In this overview article we will consider the deliberate restarting of algorithms, a meta technique, in order to improve the algorithm's performance, e.g., convergence rates or approximation guarantees. One of the major advantages is that…
Sparse matrix ordering is a vital optimization technique often employed for solving large-scale sparse matrices. Its goal is to minimize the matrix bandwidth by reorganizing its rows and columns, thus enhancing efficiency. Conventional…
The order statistics based list decoding techniques for linear binary block codes of small to medium block length are investigated. The construction of the list of the test error patterns is considered. The original order statistics…
This paper presents the key algorithmic techniques behind CatBoost, a new gradient boosting toolkit. Their combination leads to CatBoost outperforming other publicly available boosting implementations in terms of quality on a variety of…
Balanced graph partitioning is a critical step for many large-scale distributed computations with relational data. As graph datasets have grown in size and density, a range of highly-scalable balanced partitioning algorithms have appeared…
Binary optimization, a representative subclass of discrete optimization, plays an important role in mathematical optimization and has various applications in computer vision and machine learning. Usually, binary optimization problems are…
In the search for highly efficient decoders for short LDPC codes approaching maximum likelihood performance, a relayed decoding strategy, specifically activating the ordered statistics decoding process upon failure of a neural min-sum…
In this paper, we surveyed the existing literature studying different approaches and algorithms for the four critical components in the general branch and bound (B&B) algorithm, namely, branching variable selection, node selection, node…
Memory and computation efficient deep learning architec- tures are crucial to continued proliferation of machine learning capabili- ties to new platforms and systems. Binarization of operations in convo- lutional neural networks has shown…