Related papers: SOL: Effortless Device Support for AI Frameworks w…

SOL: Reducing the Maintenance Overhead for Integrating Hardware Support into AI Frameworks

The increased interest in Artificial Intelligence (AI) raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch, many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK,…

Machine Learning · Computer Science 2022-05-24 Nicolas Weber

A Unifying Framework to Enable Artificial Intelligence in High Performance Computing Workflows

Current trends point to a future where large-scale scientific applications are tightly coupled HPC/AI hybrids. Hence, we urgently need to invest in creating a seamless, scalable framework where HPC and AI/ML can efficiently work together…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-06 Jens Domke , Mohamed Wahib , Anshu Dubey , Tal Ben-Nun , Erik W. Draeger

vAccSOL: Efficient and Transparent AI Vision Offloading for Mobile Robots

Mobile robots are increasingly deployed for inspection, patrol, and search-and-rescue operations, relying on computer vision for perception, navigation, and autonomous decision-making. However, executing modern vision workloads onboard is…

Robotics · Computer Science 2026-03-18 Adam Zahir , Michele Gucciardo , Falk Selker , Anastasios Nanos , Kostis Papazafeiropoulos , Carlos J. Bernardos , Nicolas Weber , Roberto Gonzalez

Empowering Data Centers for Next Generation Trusted Computing

Modern data centers have grown beyond CPU nodes to provide domain-specific accelerators such as GPUs and FPGAs to their customers. From a security standpoint, cloud customers want to protect their data. They are willing to pay additional…

Cryptography and Security · Computer Science 2022-11-02 Aritra Dhar , Supraja Sridhara , Shweta Shinde , Srdjan Capkun , Renzo Andri

ExecuTorch -- A Unified PyTorch Solution to Run AI Models On-Device

Local execution of AI on edge devices is important for low latency and offline operation. However, deploying models on diverse hardware remains fragmented, often requiring model conversion or complete reimplementation outside the PyTorch…

Machine Learning · Computer Science 2026-05-12 Mergen Nachin , Digant Desai , Sicheng Stephen Jia , Chen Lai , Mengwei Liu , Jacob Szwejbka , Raziel Alvarez , RJ Ascani , Dave Bort , Manuel Candales , Andrew Caples , Yanan Cao , Zhengxu Chen , Soumith Chintala , Gregory Comer , Tanvir Islam , Songhao Jia , Tarun Karuturi , Jack Khuu , Abhinay Kukkadapu , Tugsbayasgalan Manlaibaatar , Andrew Or , Kimish Patel , Siddartha Pothapragada , Lucy Qiu , Supriya Rao , Orion Reblitz-Richardson , Max Ren , Scott Roy , Anthony Shoumikhin , Scott Wolchok , Guang Yang , Angela Yi , Martin Yuan , Hansong Zhang , Jack Zhang , Jerry Zhang , Shunting Zhang , C. Cagatay Bilgin

VirtualFlow: Decoupling Deep Learning Models from the Underlying Hardware

State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-13 Andrew Or , Haoyu Zhang , Michael J. Freedman

SparkCL: A Unified Programming Framework for Accelerators on Heterogeneous Clusters

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-06 Oren Segal , Philip Colangelo , Nasibeh Nasiri , Zhuo Qian , Martin Margala

Combining high-performance hardware, cloud computing, and deep learning frameworks to accelerate physical simulations: probing the Hopfield network

The synthesis of high-performance computing (particularly graphics processing units), cloud computing services (like Google Colab), and high-level deep learning frameworks (such as PyTorch) has powered the burgeoning field of artificial…

Computational Physics · Physics 2020-03-23 Vaibhav Vavilala

Open-Source Heterogeneous SoCs for AI: The PULP Platform Experience

Since 2013, the PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open-source. Its portfolio now ranges from processor cores to…

Hardware Architecture · Computer Science 2024-12-31 Francesco Conti , Angelo Garofalo , Davide Rossi , Giuseppe Tagliavini , Luca Benini

CONVOLVE: Smart and seamless design of smart edge processors

With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP),…

Hardware Architecture · Computer Science 2023-05-04 M. Gomony , F. Putter , A. Gebregiorgis , G. Paulin , L. Mei , V. Jain , S. Hamdioui , V. Sanchez , T. Grosser , M. Geilen , M. Verhelst , F. Zenke , F. Gurkaynak , B. Bruin , S. Stuijk , S. Davidson , S. De , M. Ghogho , A. Jimborean , S. Eissa , L. Benini , D. Soudris , R. Bishnoi , S. Ainsworth , F. Corradi , O. Karrakchou , T. Güneysu , H. Corporaal

BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism

Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-24 Nicolas Weber , Florian Schmidt , Mathias Niepert , Felipe Huici

Deploying AI Frameworks on Secure HPC Systems with Containers

The increasing interest in the usage of Artificial Intelligence techniques (AI) from the research community and industry to tackle "real world" problems requires High Performance Computing (HPC) resources to efficiently compute and scale…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-14 David Brayford , Sofia Vallecorsa , Atanas Atanasov , Fabio Baruffa , Walter Riviera

HyperParallel: A Supernode-Affinity AI Framework

The emergence of large-scale, sparse, multimodal, and agentic AI models has coincided with a shift in hardware toward supernode architectures that integrate hundreds to thousands of accelerators with ultra-low-latency interconnects and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-05 Xin Zhang , Beilei Sun , Teng Su , Qinghua Zhang , Chong Bao , Lei Chen , Xuefeng Jin

TF2AIF: Facilitating development and deployment of accelerated AI models on the cloud-edge continuum

The B5G/6G evolution relies on connect-compute technologies and highly heterogeneous clusters with HW accelerators, which require specialized coding to be efficiently utilized. The current paper proposes a custom tool for generating…

Machine Learning · Computer Science 2024-07-24 Aimilios Leftheriotis , Achilleas Tzenetopoulos , George Lentaris , Dimitrios Soudris , Georgios Theodoridis

Gaia: Hybrid Hardware Acceleration for Serverless AI in the 3D Compute Continuum

Serverless computing offers elastic scaling and pay-per-use execution, making it well-suited for AI workloads. As these workloads run in heterogeneous environments such as the Edge-Cloud-Space 3D Continuum, they often require intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-19 Maximilian Reisecker , Cynthia Marcelino , Thomas Pusztai , Stefan Nastic

High Performance Data Engineering Everywhere

The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-07-21 Chathura Widanage , Niranda Perera , Vibhatha Abeykoon , Supun Kamburugamuve , Thejaka Amila Kanewala , Hasara Maithree , Pulasthi Wickramasinghe , Ahmet Uyar , Gurhan Gunduz , Geoffrey Fox

Flow State: Humans Enabling AI Systems to Program Themselves

Compound AI systems, orchestrating multiple AI components and external APIs, are increasingly vital but face challenges in managing complexity, handling ambiguity, and enabling effective development workflows. Existing frameworks often…

Artificial Intelligence · Computer Science 2025-04-08 Helena Zhang , Jakobi Haskell , Yosef Frost

Transparent FPGA Acceleration with TensorFlow

Today, artificial neural networks are one of the major innovators pushing the progress of machine learning. This has particularly affected the development of neural network accelerating hardware. However, since most of these architectures…

Hardware Architecture · Computer Science 2021-02-12 Simon Pfenning , Philipp Holzinger , Marc Reichenbach

Closing the Loop: A Software Framework for AI to Support Business Decision Making

Create an idea, prototype it, evaluate if users like it, then learn. It is the circle of business. If AI can operate in all parts of the circle, it will enable rapid iteration and learning speeds for businesses. Experiment platforms that…

Software Engineering · Computer Science 2026-04-28 Jeffrey Wong , Antoine Creux

Synch: A framework for concurrent data-structures and benchmarks

The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-31 Nikolaos D. Kallimanis