CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization
Abstract
As transistor counts in a single chip exceed tens of billions, the complexity of RTL-level simulation and verification has grown exponentially, often extending simulation campaigns to several months. In industry practice, RTL simulation is divided into two phases: functional debug and system validation. While system validation demands high simulation speed and is typically accelerated using FPGAs, functional debug relies on rapid compilation-rendering multi-core CPUs the primary choice. However, the limited simulation speed of CPUs has become a major bottleneck. To address this challenge, we propose CCSS, a scalable multi-core RTL simulation platform that achieves both fast compilation and high simulation throughput. CCSS accelerates combinational logic computation and sequential logic synchronization through specialized architecture and compilation strategies. It employs a balanced DAG partitioning method and efficient boolean computation cores for combinational logic, and adopts a low-latency network-on-chip (NoC) design to synchronize sequential states across cores efficiently. Experimental results show that CCSS delivers up to 12.9x speedup over state-of-the-art multi-core simulators.
Cite
@article{arxiv.2507.08406,
title = {CCSS: Hardware-Accelerated RTL Simulation with Fast Combinational Logic Computing and Sequential Logic Synchronization},
author = {Weigang Feng and Yijia Zhang and Zekun Wang and Zhengyang Wang and Yi Wang and Peijun Ma and Ningyi Xu},
journal= {arXiv preprint arXiv:2507.08406},
year = {2025}
}
Comments
We plan to add more experiments and refine the figures in the paper. In addition, the overall structure needs significant revision to improve its readability