Related papers: Recursive double-size fixed precision arithmetic
In this paper, we propose an architecture/methodology for making FPGAs suitable for integer as well as variable precision floating point multiplication. The proposed work will of great importance in applications which requires variable…
We have a FPGA design, we make it fast, efficient, and tested for a few important examples. Now we must infer a general solution to deploy in the data center. Here, we describe the FPGA DPUV3INT8 design and our compiler effort. The…
Test-time scaling methods improve the capabilities of large language models (LLMs) by increasing the amount of compute used during inference to make a prediction. Inference-time compute can be scaled in parallel by choosing among multiple…
We develop a framework for resource efficient compilation of higher-level programs into lower-level reversible circuits. Our main focus is on optimizing the memory footprint of the resulting reversible networks. This is motivated by the…
We describe a new C++ library for multiprecision arithmetic for numbers in the order of 100--500 bits, i.e., representable with just a few limbs. The library is written in "optimizing-compiler-friendly" C++, with an emphasis on the use of…
Complex machine learning (ML) inference algorithms like recurrent neural networks (RNNs) use standard functions from math libraries like exponentiation, sigmoid, tanh, and reciprocal of square root. Although prior work on secure 2-party…
The article deals with a kind of recursive function templates in C++, where the recursion is realized corresponding template parameters to achieve better computational performance. Some specialization of these template functions ends the…
To improve the efficiency of Gaussian integral evaluation on modern accelerated architectures FLOP-efficient Obara-Saika-based recursive evaluation schemes are optimized for the memory footprint. For the 3-center 2-particle integrals that…
The integral image, an intermediate image representation, has found extensive use in multi-scale local feature detection algorithms, such as Speeded-Up Robust Features (SURF), allowing fast computation of rectangular features at constant…
In recent years, quantum Ising machines have drawn a lot of attention, but due to physical implementation constraints, it has been difficult to achieve dense coupling, such as full coupling with sufficient spins to handle practical…
Computing-in-memory (CIM) has attracted significant attentions in recent years due to its massive parallelism and low power consumption. However, current CIM designs suffer from large area overhead of small CIM macros and bad programmablity…
The multi-resolution approximation (MRA) of Gaussian processes was recently proposed to conduct likelihood-based inference for massive spatial data sets. An advantage of the methodology is that it can be parallelized. We implemented the MRA…
We propose a novel floating-point encoding scheme that builds on prior work involving fixed-point encodings. We encode floating-point numbers using Two's Complement fixed-point mantissas and Two's Complement integral exponents. We used our…
A novel fast recursive coding technique is proposed. It operates with only integer values not longer 8 bits and is multiplication free. Recursion the algorithm is based on indirectly provides rather effective coding of symbols for very…
Recent work showed neural-network-based approaches to reconstructing images from compressively sensed measurements offer significant improvements in accuracy and signal compression. Such methods can dramatically boost the capability of…
Converting binary integers to variable-length decimal strings is a fundamental operation in computing. Conventional fast approaches rely on recursive division and small lookup tables. We propose a SIMD-based algorithm that leverages integer…
Context: Many systems require receiving data from multiple information sources, which act as distributed network devices that asynchronously send the latest data at their own pace to generalize various kinds of devices and connections,…
This work studies the recursive robust principal components' analysis(PCA) problem. Here, "robust" refers to robustness to both independent and correlated sparse outliers. If the outlier is the signal-of-interest, this problem can be…
Iterative solvers are frequently used in scientific applications and engineering computations. However, the memory-bound Sparse Matrix-Vector (SpMV) kernel computation hinders the efficiency of iterative algorithms. As modern hardware…
Inspired by recent findings on the fractal geometry of language, we introduce Recursive INference Scaling (RINS) as a complementary, plug-in recipe for scaling inference time in language and multimodal systems. RINS is a particular form of…