Related papers: Enhancing ASIC Technology Mapping via Parallel Sup…
Parallel processing of information plays a critical role in accelerating computation. This includes quantum computers, where parallel processing of quantum information will play a critical role in practical quantum advantage. Here, we…
Combinational equivalence checking (CEC) remains a challenge EDA task in the formal verification of datapath circuits due to their complex arithmetic structures and the limited capability or scalability of SAT, BDD, and exact-simulation…
To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in pipeline. Earlier pipeline designs allow multiple…
The Advanced Encryption Standard (AES) algorithm is a symmetric block cipher which operates on a sequence of blocks each consists of 128, 192 or 256 bits. Moreover, the cipher key for the AES algorithm is a sequence of 128, 192 or 256 bits.…
In the application of IC design for microprocessors, there are often demands for optimizing the implementation of datapath circuits, on which various arithmetic operations are performed. Combinational equivalence checking (CEC) plays an…
We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in…
Algorithm parallelization to leverage multi-core platforms for improving the efficiency of Electronic Design Automation~(EDA) tools plays a significant role in enhancing the scalability of Integrated Circuit (IC) designs. Logic optimization…
Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…
Together with the improvements in state-of-the-art accuracies of various tasks, deep learning models are getting significantly larger. However, it is extremely difficult to implement these large models because limited GPU memory makes it…
As quantum computers continue to improve and support larger, more complex computations, smart control hardware and compilers are needed to efficiently leverage the capabilities of these systems. This paper introduces a novel approach to…
The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…
Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…
The Dadda algorithm is a parallel structured multiplier, which is quite faster as compared to array multipliers, i.e., Booth, Braun, Baugh-Wooley, etc. However, it consumes more power and needs a larger number of gates for hardware…
Extending Bayesian optimization to batch evaluation can enable the designer to make the most use of parallel computing technology. However, most of current batch approaches do not scale well with the batch size. That is, their performances…
Anytime search algorithms are useful for planning problems where a solution is desired under a limited time budget. Anytime algorithms first aim to provide a feasible solution quickly and then attempt to improve it until the time budget…
Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…
Quality-Diversity (QD) optimization algorithms are a well-known approach to generate large collections of diverse and high-quality solutions. However, derived from evolutionary computation, QD algorithms are population-based methods which…
Parallel operations in conventional computing have proven to be an essential tool for efficient and practical computation, and the story is not different for quantum computing. Indeed, there exists a large body of works that study…
This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving…
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…