Related papers: Computing in matrix groups without memory
Memoryless computation is a new technique to compute any function of a set of registers by updating one register at a time while using no memory. Its aim is to emulate how computations are performed in modern cores, since they typically…
In this paper, we are interested in memoryless computation, a modern paradigm to compute functions which generalises the famous XOR swap algorithm to exchange the contents of two variables without using a buffer. This uses a combinatorial…
Registers are the fastest memory components within the GPU's complex memory hierarchy, accessed by names rather than addresses. They are managed entirely by the compiler through a process called register allocation, during which the…
Multiplying matrices is among the most fundamental and compute-intensive operations in machine learning. Consequently, there has been significant work on efficiently approximating matrix multiplies. We introduce a learning-based algorithm…
The biggest cost of computing with large matrices in any modern computer is related to memory latency and bandwidth. The average latency of modern RAM reads is 150 times greater than a clock step of the processor. Throughput is a little…
We introduce the new concept of computation coding. Similar to how rate-distortion theory is concerned with the lossy compression of data, computation coding deals with the lossy computation of functions. Particularizing to linear…
In this technical report, a new formulation for embedding a neural network into an optimization model is described. This formulation does not require binary variables to properly compute the output of the neural network for specific types…
Optimal usage of the memory system is a key element of fast GPU algorithms. Unfortunately many common algorithms fail in this regard despite exhibiting great regularity in memory access patterns. In this paper we propose efficient kernels…
Research efforts of the past fifty years have led to a development of linear integer programming as a mature discipline of mathematical optimization. Such a level of maturity has not been reached when one considers nonlinear systems subject…
We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…
Permutation resemblance measures the distance of a function from being a permutation. Here we show how to determine the permutation resemblance through linear integer programming techniques. We also present an algorithm for constructing…
We describe a generalization of the concept of a pc presentation that applies to groups with a nontrivial solvable radical. Such a representation can be much more efficient in terms of memory use and even of arithmetic, than permuattion and…
This paper describes a new approach, based on linear programming, for computing nonnegative matrix factorizations (NMFs). The key idea is a data-driven model for the factorization where the most salient features in the data are used to…
Many empirical studies have demonstrated the performance benefits of conditional computation in neural networks, including reduced inference time and power consumption. We study the fundamental limits of neural conditional computation from…
Integer linear programming (ILP) encompasses a very important class of optimization problems that are of great interest to both academia and industry. Several algorithms are available that attempt to explore the solution space of this class…
In order to reduce the computational complexity of large language models, great efforts have been made to to improve the efficiency of transformer models such as linear attention and flash-attention. However, the model size and…
The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power…
In the literature of algorithms, the specific computation model is often not explicit as it is assumed that the model of computation is the RAM (Random Access Machine) model. However, the RAM model itself is ill-founded in the literature,…
The computation of matrix functions is a well-studied problem. Of special importance are the exponential and the logarithm of a matrix, where the latter also raises existence and uniqueness questions. This is particularly relevant in the…
Memory refinements are designed below to detect those sequences of actions that have been repeated a given number n. Subsequently such sequences are permitted to run without CPU involvement. This mimics human learning. Actions are rehearsed…