Ruobing Han — Scifaro

CuFuzz: Hardening CUDA Programs through Transformation and Fuzzing

GPUs have gained significant popularity over the past decade, extending beyond their original role in graphics rendering. This evolution has brought GPU security and reliability to the forefront of concerns. Prior research has shown that…

Cryptography and Security · Computer Science · 2026-01-06 · Saurabh Singh, Ruobing Han, Jaewon Lee, Seonjin Na, Yonghae Kim, Taesoo Kim, Hyesoon Kim

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

A world model enables an intelligent agent to imagine, predict, and reason about how the world evolves in response to its actions, and accordingly to plan and strategize. While recent video generation models produce realistic visual…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-18 · PAN Team, Jiannan Xiang, Yi Gu, Zihan Liu, Zeyu Feng, Qiyue Gao, Yiyan Hu, Benhao Huang, Guangyi Liu, Yichi Yang, Kun Zhou, Davit Abrahamyan, Arif Ahmad, Ganesh Bannur, Junrong Chen, Kimi Chen, Mingkai Deng, Ruobing Han, Xinqi Huang, Haoqiang Kang, Zheqi Liu, Enze Ma, Hector Ren, Yashowardhan Shinde, Rohan Shingre, Ramsundar Tanikella, Kaiming Tao, Dequan Yang, Xinle Yu, Cong Zeng, Binglin Zhou, Zhengzhong Liu, Zhiting Hu, Eric P. Xing

CuPBoP: CUDA for Parallelized and Broad-range Processors

CUDA is one of the most popular choices for GPU programming, but it can only be executed on NVIDIA GPUs. Executing CUDA on non-NVIDIA devices not only benefits the hardware community, but also allows data-parallel computation in…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-06-17 · Ruobing Han, Jun Chen, Bhanu Garg, Jeffrey Young, Jaewoong Sim, Hyesoon Kim

COX: CUDA on X86 by Exposing Warp-Level Functions to CPUs

As CUDA programs become the de facto program among data parallel applications such as high-performance computing or machine learning applications, running CUDA on other platforms has been a compelling option. Although several efforts have…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-12-21 · Ruobing Han, Jaewon Lee, Jaewoong Sim, Hyesoon Kim

Supporting CUDA for an extended RISC-V GPU architecture

With the rapid development of scientific computation, more and more researchers and developers are committed to implementing various workloads/operations on different devices. Among all these devices, NVIDIA GPU is the most popular choice…

Programming Languages · Computer Science · 2021-09-03 · Ruobing Han, Blaise Tine, Jaewon Lee, Jaewoong Sim, Hyesoon Kim

Auto-Precision Scaling for Distributed Deep Learning

It has been reported that the communication cost for synchronizing gradients can be a bottleneck, which limits the scalability of distributed deep learning. Using low-precision gradients is a promising technique for reducing the bandwidth…

Distributed, Parallel, and Cluster Computing · Computer Science · 2021-05-18 · Ruobing Han, James Demmel, Yang You

Optimizing Network Performance for Distributed DNN Training on GPU Clusters: ImageNet/AlexNet Training in 1.5 Minutes

It is important to scale out deep neural network (DNN) training for reducing model training time. The high communication overhead is one of the major performance bottlenecks for distributed DNN training across multiple GPUs. Our…

Distributed, Parallel, and Cluster Computing · Computer Science · 2019-10-23 · Peng Sun, Wansen Feng, Ruobing Han, Shengen Yan, Yonggang Wen