Related papers: A Prediction Packetizing Scheme for Reducing Chann…
Rapid design space exploration in early design stage is critical to algorithm-architecture co-design for accelerators. In this work, a pre-RTL cycle-accurate accelerator simulator based on SystemC transaction-level modeling (TLM),…
This paper describes the parallel implementation of the TRANSIMS traffic micro-simulation. The parallelization method is domain decomposition, which means that each CPU of the parallel computer is responsible for a different geographical…
Owing to the rapid growth number of vehicles, urban traffic congestion has become more and more severe in the last decades. As an effective approach, Model Predictive Control (MPC) has been applied to urban traffic signal control system.…
Traffic simulation software is becoming increasingly popular as more cities worldwide use it to better manage their crowded traffic networks. An important requirement for such software is the ability to produce accurate results in real…
In this work, we optimize speculative sampling for parallel hardware accelerators to improve sampling speed. We notice that substantial portions of the intermediate matrices necessary for speculative sampling can be computed concurrently.…
This work introduces an integrated approach to optimizing urban traffic by combining predictive modeling of vehicle flow, adaptive traffic signal control, and a modular integration architecture through distributed messaging. Using real-time…
The present paper deals with the problem of improving the efficiency of large scale turbulent flow simulations. The high-fidelity methods for modelling turbulent flows become available for a wider range of applications thanks to the…
Transformers are central to advances in artificial intelligence (AI), excelling in fields ranging from computer vision to natural language processing. Despite their success, their large parameter count and computational demands challenge…
Traffic is essential for many dynamic processes on real networks, such as internet and urban traffic systems. The transport efficiency of the traffic system can be improved by taking full advantage of the resources in the system. In this…
This article presents an automatic approach to quickly derive a good solution for hardware resource partition and task granularity for task-based parallel applications on heterogeneous many-core architectures. Our approach employs a…
Recent advances in random-walk particle-tracking have enabled direct simulation of mixing and reactions on particles by allowing the particles to interact with each other using a multi-point mass transfer scheme. The mass transfer scheme…
Cooperative maneuver planning promises to significantly improve traffic efficiency at unsignalized intersections by leveraging connected automated vehicles. Previous works on this topic have been mostly developed for completely automated…
Parallel multiphysics simulations often suffer from load imbalances originating from the applied coupling of algorithms with spatially and temporally varying workloads. It is thus desirable to minimize these imbalances to reduce the time to…
Quite a few algorithms have been proposed to optimize the transmission performance of Multipath TCP (MPTCP). However, existing MPTCP protocols are still far from satisfactory in lossy and ever-changing networks because of their loss-based…
Transformers have revolutionized AI in natural language processing and computer vision, but their large computation and memory demands pose major challenges for hardware acceleration. In practice, end-to-end throughput is often limited by…
As the complexity of the scan algorithm is dependent on the number of design registers, large SoC scan designs can no longer be verified in RTL simulation unless partitioned into smaller sub-blocks. This paper proposes a methodology to…
Witnessing the advancing scale and complexity of chip design and benefiting from high-performance computation technologies, the simulation of Very Large Scale Integration (VLSI) Circuits imposes an increasing requirement for acceleration…
This paper studies parallelization schemes for stochastic Vector Quantization algorithms in order to obtain time speed-ups using distributed resources. We show that the most intuitive parallelization scheme does not lead to better…
Networks-on-Chip (NoCs) used in commercial many-core processors typically incorporate priority arbitration. Moreover, they experience bursty traffic due to application workloads. However, most state-of-the-art NoC analytical performance…
This paper investigates co-scheduling algorithms for processing a set of parallel applications. Instead of executing each application one by one, using a maximum degree of parallelism for each of them, we aim at scheduling several…