Related papers: Spin: An Efficient Secure Computation Framework wi…

MPC-Pipe: an Efficient Pipeline Scheme for Secure Multi-party Machine Learning Inference

Multi-party computing (MPC) has been gaining popularity as a secure computing model over the past few years. However, prior works have demonstrated that MPC protocols still pay substantial performance penalties compared to plaintext,…

Cryptography and Security · Computer Science 2024-08-28 Yongqin Wang, Rachit Rajat, Murali Annavaram

GPU-accelerated machine learning inference as a service for computing in neutrino experiments

Machine learning algorithms are becoming increasingly prevalent and performant in the reconstruction of events in accelerator-based neutrino experiments. These sophisticated algorithms can be computationally expensive. At the same time, the…

Computational Physics · Physics 2021-03-25 Michael Wang, Tingjun Yang, Maria Acosta Flechas, Philip Harris, Benjamin Hawks, Burt Holzman, Kyle Knoepfel, Jeffrey Krupa, Kevin Pedro, Nhan Tran

sPIN: High-performance streaming Processing in the Network

Optimizing communication performance is imperative for large-scale computing because communication overheads limit the strong scalability of parallel applications. Today's network cards contain rather powerful processors optimized for data…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-10-20 Torsten Hoefler, Salvatore Di Girolamo, Konstantin Taranov, Ryan E. Grant, Ron Brightwell

Perun: Secure Multi-Stakeholder Machine Learning Framework with GPU Support

Confidential multi-stakeholder machine learning (ML) allows multiple parties to perform collaborative data analytics while not revealing their intellectual property, such as ML source code, model, or datasets. State-of-the-art solutions…

Machine Learning · Computer Science 2021-06-04 Wojciech Ozga, Do Le Quoc, Christof Fetzer

Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

Multi-GPU Accelerated Multi-Spin Monte Carlo Simulations of the 2D Ising Model

A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…

Computational Physics · Physics 2010-07-22 Benjamin Block, Peter Virnau, Tobias Preis

SGM-PINN: Sampling Graphical Models for Faster Training of Physics-Informed Neural Networks

SGM-PINN is a graph-based importance sampling framework to improve the training efficacy of Physics-Informed Neural Networks (PINNs) on parameterized problems. By applying a graph decomposition scheme to an undirected Probabilistic…

Machine Learning · Computer Science 2024-07-11 John Anticev, Ali Aghdaei, Wuxinlin Cheng, Zhuo Feng

FINN: A Framework for Fast, Scalable Binarized Neural Network Inference

Research has shown that convolutional neural networks contain significant redundancy, and high classification accuracy can be obtained even when weights and activations are reduced from floating point to binary values. In this paper, we…

Computer Vision and Pattern Recognition · Computer Science 2016-12-22 Yaman Umuroglu, Nicholas J. Fraser, Giulio Gambardella, Michaela Blott, Philip Leong, Magnus Jahre, Kees Vissers

MG-GCN: Scalable Multi-GPU GCN Training Framework

Full batch training of Graph Convolutional Network (GCN) models is not feasible on a single GPU for large graphs containing tens of millions of vertices or more. Recent work has shown that, for the graphs used in the machine learning…

Machine Learning · Computer Science 2021-10-19 Muhammed Fatih Balın, Kaan Sancak, Ümit V. Çatalyürek

DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

The past several years have witnessed the success of transformer-based models, and their scale and application scenarios continue to grow aggressively. The current landscape of transformer models is increasingly diverse: the model size…

Machine Learning · Computer Science 2022-07-04 Reza Yazdani Aminabadi, Samyam Rajbhandari, Minjia Zhang, Ammar Ahmad Awan, Cheng Li, Du Li, Elton Zheng, Jeff Rasley, Shaden Smith, Olatunji Ruwase, Yuxiong He

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing…

Machine Learning · Computer Science 2020-08-25 Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane

Simulating spin models on GPU

Over the last couple of years it has been realized that the vast computational power of graphics processing units (GPUs) could be harvested for purposes other than the video game industry. This power, which at least nominally exceeds that…

Statistical Mechanics · Physics 2011-07-26 Martin Weigel

CBNN: 3-Party Secure Framework for Customized Binary Neural Networks Inference

Binarized Neural Networks (BNN) offer efficient implementations for machine learning tasks and facilitate Privacy-Preserving Machine Learning (PPML) by simplifying operations with binary values. Nevertheless, challenges persist in terms of…

Machine Learning · Computer Science 2024-12-24 Benchang Dong, Zhili Chen, Xin Chen, Shiwen Wei, Jie Fu, Huifa Li

GRIP: A Graph Neural Network Accelerator Architecture

We present GRIP, a graph neural network accelerator architecture designed for low-latency inference. Accelerating GNNs is challenging because they combine two distinct types of computation: arithmetic-intensive vertex-centric operations and…

Hardware Architecture · Computer Science 2020-07-31 Kevin Kiningham, Christopher Re, Philip Levis

Safe Large-Scale Robust Nonlinear MPC in Milliseconds via Reachability-Constrained System Level Synthesis on the GPU

We present GPU-SLS, a GPU-parallelized framework for safe, robust nonlinear model predictive control (MPC) that scales to high-dimensional uncertain robotic systems and long planning horizons. Our method jointly optimizes an…

Robotics · Computer Science 2026-04-10 Jeffrey Fang, Glen Chou

FRAP: A Flexible Resource Accessing Protocol for Multiprocessor Real-Time Systems

Fully-partitioned fixed-priority scheduling (FP-FPS) multiprocessor systems are widely found in real-time applications, where spin-based protocols are often deployed to manage the mutually exclusive access of shared resources.…

Operating Systems · Computer Science 2024-08-28 Shuai Zhao, Hanzhi Xu, Nan Chen, Ruoxian Su, Wanli Chang

Scalable Physics-Informed Neural Networks for Accelerating Electromagnetic Transient Stability Assessment

This paper puts forward a framework to accelerate Electromagnetic Transient (EMT) simulations by replacing individual components with trained Physics-Informed Neural Networks (PINNs). EMT simulations are considered the cornerstone of…

Systems and Control · Electrical Eng. & Systems 2026-02-20 Ignasi Ventura Nadal, Mohammad Kazem Bakhshizadeh, Petros Aristidou, Nicolae Darii, Rahul Nellikkath, Spyros Chatzivasileiadis

SPIN: Accelerating Large Language Model Inference with Heterogeneous Speculative Models

Speculative decoding has been shown as an effective way to accelerate Large Language Model (LLM) inference by using a Small Speculative Model (SSM) to generate candidate tokens in a so-called speculation phase, which are subsequently…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-21 Fahao Chen, Peng Li, Tom H. Luan, Zhou Su, Jing Deng

TINA: Acceleration of Non-NN Signal Processing Algorithms Using NN Accelerators

This paper introduces TINA, a novel framework for implementing non Neural Network (NN) signal processing algorithms on NN accelerators such as GPUs, TPUs or FPGAs. The key to this approach is the concept of mapping mathematical and logic…

Performance · Computer Science 2024-08-30 Christiaan Boerkamp, Steven van der Vlugt, Zaid Al-Ars

Speeding up Deep Learning with Transient Servers

Distributed training frameworks, like TensorFlow, have been proposed as a means to reduce the training time of deep learning models by using a cluster of GPU servers. While such speedups are often desirable---e.g., for rapidly evaluating…

Performance · Computer Science 2019-05-07 Shijian Li, Robert J. Walls, Lijie Xu, Tian Guo