Related papers: Nancy: An efficient parallel Network Calculus libr…
Network Calculus (NC) is an algebraic theory that represents traffic and service guarantees as curves in a Cartesian plane, in order to compute performance guarantees for flows traversing a network. NC uses transformation operations, e.g.,…
Network Calculus (NC) is a versatile analytical methodology to efficiently compute performance bounds in networked systems. The arrival and service curve abstractions allow to model diverse and heterogeneous distributed systems. The…
Software network functions (NFs) trade-off flexibility and ease of deployment for an increased challenge of performance. The traditional way to increase NF performance is by distributing traffic to multiple CPU cores, but this poses a…
ABCpy is a highly modular scientific library for Approximate Bayesian Computation (ABC) written in Python. The main contribution of this paper is to document a software engineering effort that enables domain scientists to easily apply ABC…
We present a novel class of methods to compute functions of matrices or their action on vectors that are suitable for parallel programming. Solving appropriate simple linear systems of equations in parallel (or computing the inverse of…
Neural networks (NNs) with intensive multiplications (e.g., convolutions and transformers) are capable yet power hungry, impeding their more extensive deployment into resource-constrained devices. As such, multiplication-free networks,…
The inherent diversity of computation types within the deep neural network (DNN) models often requires a variety of specialized units in hardware processors, which limits computational efficiency, increasing both inference latency and power…
A new scalable parallel math library, dMath, is presented in this paper that demonstrates leading scaling when using intranode, or internode, hybrid-parallelism for deep-learning. dMath provides easy-to-use distributed base primitives and a…
Network calculus (NC), particularly its min-plus branch, has been extensively utilized to construct service models and compute delay bounds for time-sensitive networks (TSNs). This paper provides a revisit to the fundamental results. In…
We investigate the energy efficiency of a library designed for parallel computations with sparse matrices. The library leverages high-performance, energy-efficient Graphics Processing Unit (GPU) accelerators to enable large-scale scientific…
Bayesian networks (BNs) are attractive, because they are graphical and interpretable machine learning models. However, exact inference on BNs is time-consuming, especially for complex problems. To improve the efficiency, we propose a fast…
We propose Chunks and Tasks, a parallel programming model built on abstractions for both data and work. The application programmer specifies how data and work can be split into smaller pieces, chunks and tasks, respectively. The Chunks and…
Nowadays, Cellular Neural Networks (CNN) are practically implemented in parallel, analog computers, showing a fast developing trend. Physicist must be aware that such computers are appropriate for solving in an elegant manner practically…
Networks with hop-by-hop flow control occur in several contexts, from data centers to systems architectures (e.g., wormhole-routing networks on chip). A worst-case end-to-end delay in such networks can be computed using Network Calculus…
Deep Neural Networks (DNNs) have emerged as a core tool for machine learning. The computations performed during DNN training and inference are dominated by operations on the weight matrices describing the DNN. As DNNs incorporate more…
As it is getting increasingly difficult to achieve gains in the density and power efficiency of microelectronic computing devices because of lithographic techniques reaching fundamental physical limits, new approaches are required to…
Hardware double precision is often insufficient to solve large scientific problems accurately. Computing in higher precision defined by software causes significant computational overhead. The application of parallel algorithms compensates…
Neural network based approximate computing is a universal architecture promising to gain tremendous energy-efficiency for many error resilient applications. To guarantee the approximation quality, existing works deploy two neural networks…
Binarized neural networks, or BNNs, show great promise in edge-side applications with resource limited hardware, but raise the concerns of reduced accuracy. Motivated by the complex neural networks, in this paper we introduce complex…
The choice of neural network features can have a large impact on both the accuracy and speed of the network. Despite the current industry shift towards large transformer models, specialized binary classifiers remain critical for numerous…