Sheng Lin — Scifaro

S2O: Early Stopping for Sparse Attention via Online Permutation

Attention scales quadratically with sequence length, fundamentally limiting long-context inference. Existing block-granularity sparsification can reduce latency, but coarse blocks impose an intrinsic sparsity ceiling, making further…

Machine Learning · Computer Science 2026-05-06 Yu Zhang, Songwei Liu, Chenqian Yan, Sheng Lin, Beichen Ning, Fangmin Chen, Xing Wang

Unleashing Efficient Asynchronous RL Post-Training via Staleness-Constrained Rollout Coordination

Reinforcement learning (RL) post-training has become pivotal for enhancing the capabilities of modern large models. A recent trend is to develop RL systems with a fully disaggregated architecture, which decouples the three RL phases…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-21 Haoyang Li, Sheng Lin, Fangcheng Fu, Yuming Zhou, Xiaodong Ji, Yanfeng Zhao, Lefeng Wang, Jie Jiang, Bin Cui

The Interaction of Moving $\mathbf{Q\bar{Q}}$ and $\mathbf{QQq}$ in the Thermal Plasma

The strength of the interaction between heavy quarks is studied for heavy quarkonium ($\mathrm{Q\bar{Q}}$) and doubly heavy baryons ($\mathrm{QQq}$) at finite temperature and rapidity using the gauge/gravity duality in this paper. We show…

High Energy Physics - Phenomenology · Physics 2026-01-07 Xuan Liu, Sheng Lin, Xun Chen

Hydraulis: Balancing Large Transformer Model Training via Co-designing Parallel Strategies and Data Assignment

To optimize large Transformer model training, both efficient parallel computing and advanced data management are indispensable. However, current methods often assume a stable and uniform training workload, neglecting data-induced…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-16 Haoyang Li, Fangcheng Fu, Sheng Lin, Hao Ge, Xuanyu Wang, Jiawen Niu, Jinbao Xue, Yangyu Tao, Di Wang, Jie Jiang, Bin Cui

LobRA: Multi-tenant Fine-tuning over Heterogeneous Data

With the breakthrough of Transformer-based pre-trained models, the demand for fine-tuning (FT) to adapt the base pre-trained models to downstream applications continues to grow, so it is essential for service providers to reduce the cost of…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-03 Sheng Lin, Fangcheng Fu, Haoyang Li, Hao Ge, Xuanyu Wang, Jiawen Niu, Yaofeng Tu, Bin Cui

Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization

As the scale of models and training data continues to grow, there is an expanding reliance on more GPUs to train large-scale models, which inevitably increases the likelihood of encountering dynamic stragglers that some devices lag behind…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-07 Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Yujie Wang, Hailin Zhang, Xiaonan Nie, Bin Cui

Hetu v2: A General and Scalable Deep Learning System with Hierarchical and Heterogeneous Single Program Multiple Data Annotations

The Single Program Multiple Data (SPMD) paradigm provides a unified abstraction to annotate various parallel dimensions in distributed deep learning (DL) training. With SPMD, users can write training programs from the viewpoint of a single…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-30 Haoyang Li, Fangcheng Fu, Hao Ge, Sheng Lin, Xuanyu Wang, Jiawen Niu, Xupeng Miao, Bin Cui

Holographic Schwinger effect in flavor-dependent systems

The holographic Schwinger effect is investigated in systems with $N_{f}=0$, $N_{f}=2$, and $N_{f}=2+1$ using the Einstein-Maxwell-dilaton (EMD) model, incorporating equation of state and baryon number susceptibility information from lattice…

High Energy Physics - Phenomenology · Physics 2025-01-20 Sheng Lin, Xuan Liu, Xun Chen, Gen-Fa Zhang, Jing Zhou

Towards Zero Memory Footprint Spiking Neural Network Training

Biologically-inspired Spiking Neural Networks (SNNs), processing information using discrete-time events known as spikes rather than continuous values, have garnered significant attention due to their hardware-friendly and energy-efficient…

Neural and Evolutionary Computing · Computer Science 2023-08-21 Bin Lei, Sheng Lin, Pei-Hung Lin, Chunhua Liao, Caiwen Ding

FAIVConf: Face enhancement for AI-based Video Conference with Low Bit-rate

Recently, high-quality video conferencing with fewer transmission bits has become a very hot and challenging problem. We propose FAIVConf, a specially designed video compression framework for video conferencing, based on the effective…

Image and Video Processing · Electrical Eng. & Systems 2022-07-12 Zhengang Li, Sheng Lin, Shan Liu, Songnan Li, Xue Lin, Wei Wang, Wei Jiang

A Secure and Efficient Federated Learning Framework for NLP

In this work, we consider the problem of designing secure and efficient federated learning (FL) frameworks. Existing solutions either involve a trusted aggregator or require heavyweight cryptographic primitives, which degrades performance…

Cryptography and Security · Computer Science 2022-01-31 Jieren Deng, Chenghong Wang, Xianrui Meng, Yijue Wang, Ji Li, Sheng Lin, Shuo Han, Fei Miao, Sanguthevar Rajasekaran, Caiwen Ding

CAP-RAM: A Charge-Domain In-Memory Computing 6T-SRAM for Accurate and Precision-Programmable CNN Inference

A compact, accurate, and bitwidth-programmable in-memory computing (IMC) static random-access memory (SRAM) macro, named CAP-RAM, is presented for energy-efficient convolutional neural network (CNN) inference. It leverages a novel…

Hardware Architecture · Computer Science 2021-07-07 Zhiyu Chen, Zhanghao Yu, Qing Jin, Yan He, Jingyu Wang, Sheng Lin, Dai Li, Yanzhi Wang, Kaiyuan Yang

FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator

Recent works demonstrated the promise of using resistive random access memory (ReRAM) as an emerging technology to perform inherently parallel analog domain in-situ matrix-vector multiplication -- the intensive and key computation in DNNs.…

Hardware Architecture · Computer Science 2021-06-18 Geng Yuan, Payman Behnam, Zhengang Li, Ali Shafiee, Sheng Lin, Xiaolong Ma, Hang Liu, Xuehai Qian, Mahdi Nazm Bojnordi, Yanzhi Wang, Caiwen Ding

Efficient Micro-Structured Weight Unification and Pruning for Neural Network Compression

Compressing Deep Neural Network (DNN) models to alleviate the storage and computation requirements is essential for practical applications, especially for resource limited devices. Although capable of reducing a reasonable amount of model…

Machine Learning · Computer Science 2021-06-17 Sheng Lin, Wei Jiang, Wei Wang, Kaidi Xu, Yanzhi Wang, Shan Liu, Songnan Li

ESMFL: Efficient and Secure Models for Federated Learning

Nowadays, Deep Neural Networks are widely applied to various domains. However, the massive data collection required for deep neural networks reveals potential privacy issues and also consumes large amounts of communication bandwidth. To…

Cryptography and Security · Computer Science 2021-03-05 Sheng Lin, Chenghong Wang, Hongjia Li, Jieren Deng, Yanzhi Wang, Caiwen Ding

An Image Enhancing Pattern-based Sparsity for Real-time Inference on Mobile Devices

Weight pruning has been widely acknowledged as a straightforward and effective method to eliminate redundancy in Deep Neural Networks (DNN), thereby achieving acceleration on various platforms. However, most of the pruning techniques are…

Computer Vision and Pattern Recognition · Computer Science 2020-07-07 Xiaolong Ma, Wei Niu, Tianyun Zhang, Sijia Liu, Sheng Lin, Hongjia Li, Xiang Chen, Jian Tang, Kaisheng Ma, Bin Ren, Yanzhi Wang

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

Recurrent neural network (RNN) based automatic speech recognition has nowadays become prevalent on mobile devices such as smartphones. However, previous RNN compression techniques either suffer from hardware performance overhead due to…

Sound · Computer Science 2020-02-28 Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

PatDNN: Achieving Real-Time DNN Execution on Mobile Devices with Pattern-based Weight Pruning

With the emergence of a spectrum of high-end mobile devices, many applications that formerly required desktop-level computation capability are being transferred to these devices. However, executing the inference of Deep Neural Networks…

Machine Learning · Computer Science 2020-01-23 Wei Niu, Xiaolong Ma, Sheng Lin, Shihao Wang, Xuehai Qian, Xue Lin, Yanzhi Wang, Bin Ren

Non-Structured DNN Weight Pruning -- Is It Beneficial in Any Platform?

Large deep neural network (DNN) models pose the key challenge to energy efficiency due to the significantly higher energy consumption of off-chip DRAM accesses than arithmetic or SRAM operations. It motivates the intensive research on model…

Machine Learning · Computer Science 2020-01-09 Xiaolong Ma, Sheng Lin, Shaokai Ye, Zhezhi He, Linfeng Zhang, Geng Yuan, Sia Huat Tan, Zhengang Li, Deliang Fan, Xuehai Qian, Xue Lin, Kaisheng Ma, Yanzhi Wang

A SOT-MRAM-based Processing-In-Memory Engine for Highly Compressed DNN Implementation

The computing wall and data movement challenges of deep neural networks (DNNs) have exposed the limitations of conventional CMOS-based DNN accelerators. Furthermore, the deep structure and large model size will make DNNs prohibitive to…

Signal Processing · Electrical Eng. & Systems 2019-12-12 Geng Yuan, Xiaolong Ma, Sheng Lin, Zhengang Li, Caiwen Ding