English
Related papers

Related papers: SOL: Effortless Device Support for AI Frameworks w…

200 papers

The increased interest in Artificial Intelligence (AI) raised the need for highly optimized and sophisticated AI frameworks. Starting with the Lua-based Torch many frameworks have emerged over time, such as Theano, Caffe, Chainer, CNTK,…

Machine Learning · Computer Science 2022-05-24 Nicolas Weber

Current trends point to a future where large-scale scientific applications are tightly-coupled HPC/AI hybrids. Hence, we urgently need to invest in creating a seamless, scalable framework where HPC and AI/ML can efficiently work together…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-06 Jens Domke , Mohamed Wahib , Anshu Dubey , Tal Ben-Nun , Erik W. Draeger

Mobile robots are increasingly deployed for inspection, patrol, and search-and-rescue operations, relying on computer vision for perception, navigation, and autonomous decision-making. However, executing modern vision workloads onboard is…

Modern data centers have grown beyond CPU nodes to provide domain-specific accelerators such as GPUs and FPGAs to their customers. From a security standpoint, cloud customers want to protect their data. They are willing to pay additional…

Cryptography and Security · Computer Science 2022-11-02 Aritra Dhar , Supraja Sridhara , Shweta Shinde , Srdjan Capkun , Renzo Andri

Local execution of AI on edge devices is important for low latency and offline operation. However, deploying models on diverse hardware remains fragmented, often requiring model conversion or complete reimplementation outside the PyTorch…

State-of-the-art deep learning systems such as TensorFlow and PyTorch tightly couple the model with the underlying hardware. This coupling requires the user to modify application logic in order to run the same job across a different set of…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-05-13 Andrew Or , Haoyu Zhang , Michael J. Freedman

We introduce SparkCL, an open source unified programming framework based on Java, OpenCL and the Apache Spark framework. The motivation behind this work is to bring unconventional compute cores such as FPGAs/GPUs/APUs/DSPs and future core…

Distributed, Parallel, and Cluster Computing · Computer Science 2015-05-06 Oren Segal , Philip Colangelo , Nasibeh Nasiri , Zhuo Qian , Martin Margala

The synthesis of high-performance computing (particularly graphics processing units), cloud computing services (like Google Colab), and high-level deep learning frameworks (such as PyTorch) has powered the burgeoning field of artificial…

Computational Physics · Physics 2020-03-23 Vaibhav Vavilala

Since 2013, the PULP (Parallel Ultra-Low Power) Platform project has been one of the most active and successful initiatives in designing research IPs and releasing them as open-source. Its portfolio now ranges from processor cores to…

Hardware Architecture · Computer Science 2024-12-31 Francesco Conti , Angelo Garofalo , Davide Rossi , Giuseppe Tagliavini , Luca Benini

With the rise of Deep Learning (DL), our world braces for AI in every edge device, creating an urgent need for edge-AI SoCs. This SoC hardware needs to support high throughput, reliable and secure AI processing at Ultra Low Power (ULP),…

Neural network frameworks such as PyTorch and TensorFlow are the workhorses of numerous machine learning applications ranging from object recognition to machine translation. While these frameworks are versatile and straightforward to use,…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-04-24 Nicolas Weber , Florian Schmidt , Mathias Niepert , Felipe Huici

The increasing interest in the usage of Artificial Intelligence techniques (AI) from the research community and industry to tackle "real world" problems, requires High Performance Computing (HPC) resources to efficiently compute and scale…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-14 David Brayford , Sofia Vallecorsa , Atanas Atanasov , Fabio Baruffa , Walter Riviera

The emergence of large-scale, sparse, multimodal, and agentic AI models has coincided with a shift in hardware toward supernode architectures that integrate hundreds to thousands of accelerators with ultra-low-latency interconnects and…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-05 Xin Zhang , Beilei Sun , Teng Su , Qinghua Zhang , Chong Bao , Lei Chen , Xuefeng Jin

The B5G/6G evolution relies on connect-compute technologies and highly heterogeneous clusters with HW accelerators, which require specialized coding to be efficiently utilized. The current paper proposes a custom tool for generating…

Serverless computing offers elastic scaling and pay-per-use execution, making it well-suited for AI workloads. As these workloads run in heterogeneous environments such as the Edge-Cloud-Space 3D Continuum, they often require intensive…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-19 Maximilian Reisecker , Cynthia Marcelino , Thomas Pusztai , Stefan Nastic

The amazing advances being made in the fields of machine and deep learning are a highlight of the Big Data era for both enterprise and research communities. Modern applications require resources beyond a single node's ability to provide.…

Compound AI systems, orchestrating multiple AI components and external APIs, are increasingly vital but face challenges in managing complexity, handling ambiguity, and enabling effective development workflows. Existing frameworks often…

Artificial Intelligence · Computer Science 2025-04-08 Helena Zhang , Jakobi Haskell , Yosef Frost

Today, artificial neural networks are one of the major innovators pushing the progress of machine learning. This has particularly affected the development of neural network accelerating hardware. However, since most of these architectures…

Hardware Architecture · Computer Science 2021-02-12 Simon Pfenning , Philipp Holzinger , Marc Reichenbach

Create an idea, prototype it, evaluate if users like it, then learn. It is the circle of business. If AI can operate in all parts of the circle, it will enable rapid iteration and learning speeds for businesses. Experiment platforms that…

Software Engineering · Computer Science 2026-04-28 Jeffrey Wong , Antoine Creux

The recent advancements in multicore machines highlight the need to simplify concurrent programming in order to leverage their computational power. One way to achieve this is by designing efficient concurrent data structures (e.g. stacks,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-31 Nikolaos D. Kallimanis
‹ Prev 1 2 3 10 Next ›