Related papers: Enhancing ASIC Technology Mapping via Parallel Sup…

Arbitrary parallel entangling gates with independent calibration on a trapped ion quantum computer

Parallel processing of information plays a critical role in accelerating computation. This includes quantum computers, where parallel processing of quantum information will play a critical role in practical quantum advantage. Here, we…

Quantum Physics · Physics · 2026-04-30 · Matthew Diaz, Masoud Mohammadi-Arzanagh, Yingyue Zhu, Mohammad Hafezi, Norbert M. Linke, Alaina M. Green, Arthur Y. Nam

FastLEC: Parallel Datapath Equivalence Checking with Hybrid Engines

Combinational equivalence checking (CEC) remains a challenging EDA task in the formal verification of datapath circuits due to their complex arithmetic structures and the limited capability or scalability of SAT, BDD, and exact-simulation…

Logic in Computer Science · Computer Science · 2025-12-09 · Xindi Zhang, Furong Ye, Zhihan Chen, Shaowei Cai

Efficient Pipeline Planning for Expedited Distributed DNN Training

To train modern large DNN models, pipeline parallelism has recently emerged, which distributes the model across GPUs and enables different devices to process different microbatches in a pipeline. Earlier pipeline designs allow multiple…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-08-23 · Ziyue Luo, Xiaodong Yi, Guoping Long, Shiqing Fan, Chuan Wu, Jun Yang, Wei Lin

A Design of a Fast Parallel-Pipelined Implementation of AES: Advanced Encryption Standard

The Advanced Encryption Standard (AES) algorithm is a symmetric block cipher which operates on a sequence of blocks, each consisting of 128, 192 or 256 bits. Moreover, the cipher key for the AES algorithm is a sequence of 128, 192 or 256 bits.…

Cryptography and Security · Computer Science · 2015-01-08 · Ghada F. Elkabbany, Heba K. Aslan, Mohamed N. Rasslan

Datapath Combinational Equivalence Checking With Hybrid Sweeping Engines and Parallelization

In the application of IC design for microprocessors, there are often demands for optimizing the implementation of datapath circuits, on which various arithmetic operations are performed. Combinational equivalence checking (CEC) plays an…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-01-28 · Zhihan Chen, Xindi Zhang, Yuhang Qian, Shaowei Cai

ASAGA: Asynchronous Parallel SAGA

We describe ASAGA, an asynchronous parallel version of the incremental gradient algorithm SAGA that enjoys fast linear convergence rates. Through a novel perspective, we revisit and clarify a subtle but important technical issue present in…

Optimization and Control · Mathematics · 2017-11-09 · Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien

Parallel AIG Refactoring via Conflict Breaking

Algorithm parallelization to leverage multi-core platforms for improving the efficiency of Electronic Design Automation (EDA) tools plays a significant role in enhancing the scalability of Integrated Circuit (IC) designs. Logic optimization…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-04-23 · Ye Cai, Zonglin Yang, Liwei Ni, Junfeng Liu, Biwei Xie, Xingquan Li

Guided parallelized stochastic gradient descent for delay compensation

Stochastic gradient descent (SGD) algorithm and its variations have been effectively used to optimize neural network models. However, with the rapid growth of big data and deep learning, SGD is no longer the most suitable choice due to its…

Machine Learning · Computer Science · 2024-02-13 · Anuraganand Sharma

Tesseract: Parallelize the Tensor Parallelism Efficiently

Together with the improvements in state-of-the-art accuracies of various tasks, deep learning models are getting significantly larger. However, it is extremely difficult to implement these large models because limited GPU memory makes it…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-09-02 · Boxiang Wang, Qifan Xu, Zhengda Bian, Yang You

Parallelizing Program Execution on Distributed Quantum Systems via Compiler/Hardware Co-Design

As quantum computers continue to improve and support larger, more complex computations, smart control hardware and compilers are needed to efficiently leverage the capabilities of these systems. This paper introduces a novel approach to…

Quantum Physics · Physics · 2025-11-19 · Folkert de Ronde, Alexander Knapen, Stephan Wong, Sebastian Feld

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates

The increasing size of deep learning models has made distributed training across multiple devices essential. However, current methods such as distributed data-parallel training suffer from large communication and synchronization overheads…

Machine Learning · Computer Science · 2025-02-10 · Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney

Parallel Combining: Benefits of Explicit Synchronization

Parallel batched data structures are designed to process synchronized batches of operations in a parallel computing model. In this paper, we propose parallel combining, a technique that implements a concurrent data structure from a parallel…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-11-14 · Vitaly Aksenov, Petr Kuznetsov, Anatoly Shalyto

Area, Delay, and Energy-Efficient Full Dadda Multiplier

The Dadda algorithm is a parallel structured multiplier, which is considerably faster than array multipliers, i.e., Booth, Braun, Baugh-Wooley, etc. However, it consumes more power and needs a larger number of gates for hardware…

Systems and Control · Electrical Eng. & Systems · 2023-07-13 · Muteen Munawar, Zain Shabbir, Muhammad Akram

A Simple and Efficient Approach to Batch Bayesian Optimization

Extending Bayesian optimization to batch evaluation can enable the designer to make the most use of parallel computing technology. However, most current batch approaches do not scale well with the batch size. That is, their performances…

Machine Learning · Computer Science · 2025-04-25 · Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

A-ePA*SE: Anytime Edge-Based Parallel A* for Slow Evaluations

Anytime search algorithms are useful for planning problems where a solution is desired under a limited time budget. Anytime algorithms first aim to provide a feasible solution quickly and then attempt to improve it until the time budget…

Artificial Intelligence · Computer Science · 2023-05-09 · Hanlan Yang, Shohin Mukherjee, Maxim Likhachev

Automatic Parallelization of Sequential Programs

Prior work on Automatically Scalable Computation (ASC) suggests that it is possible to parallelize sequential computation by building a model of whole-program execution, using that model to predict future computations, and then…

Distributed, Parallel, and Cluster Computing · Computer Science · 2018-09-21 · Peter Kraft, Amos Waterland, Daniel Y Fu, Anitha Gollamudi, Shai Szulanski, Margo Seltzer

Accelerated Quality-Diversity through Massive Parallelism

Quality-Diversity (QD) optimization algorithms are a well-known approach to generate large collections of diverse and high-quality solutions. However, derived from evolutionary computation, QD algorithms are population-based methods which…

Neural and Evolutionary Computing · Computer Science · 2022-10-11 · Bryan Lim, Maxime Allard, Luca Grillotti, Antoine Cully

Efficient quantum programming using EASE gates on a trapped-ion quantum computer

Parallel operations in conventional computing have proven to be an essential tool for efficient and practical computation, and the story is no different for quantum computing. Indeed, there exists a large body of work that studies…

Quantum Physics · Physics · 2022-02-02 · Nikodem Grzesiak, Andrii Maksymov, Pradeep Niroula, Yunseong Nam

Temporal parallelisation of continuous-time maximum-a-posteriori trajectory estimation

This paper proposes a parallel-in-time method for computing continuous-time maximum-a-posteriori (MAP) trajectory estimates of the states of partially observed stochastic differential equations (SDEs), with the goal of improving…

Distributed, Parallel, and Cluster Computing · Computer Science · 2025-12-16 · Hassan Razavi, Ángel F. García-Fernández, Simo Särkkä

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models

It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…

Distributed, Parallel, and Cluster Computing · Computer Science · 2020-07-03 · Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, Zhen Zheng, Chuan Wu, Guoping Long, Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin