Related papers: Vector-Processing for Mobile Devices: Benchmark an…
The need to train DNN models on end-user devices (e.g., smartphones) is increasing with the need to improve data privacy and reduce communication overheads. Unlike datacenter servers with powerful CPUs and GPUs, modern smartphones consist…
Measurements of absolute runtime are useful as a summary of performance when studying parallel visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can obtain even more insights by…
Recent trends in the HPC field have introduced new CPU architectures with improved vectorization capabilities that require optimization to achieve peak performance and thus pose challenges for performance portability. The deployment of…
Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. This is evidenced by the increasing adoption of vector…
Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and gcc, vectorization is not common in the…
We present a graph processing benchmark suite with the goal of helping to standardize graph processing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and…
We recently have witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep…
As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…
The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…
A Web browser utilizes a device's CPU to parse HTML, build a Document Object Model, a Cascading Style Sheets Object Model, and render trees, and parse, compile, and execute computationally-heavy JavaScript. A powerful CPU is required to…
The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…
Mozilla Research is developing Servo, a parallel web browser engine, to exploit the benefits of parallelism and concurrency in the web rendering pipeline. Parallelization results in improved performance for pinterest.com but not for…
Vector search (VS) is now available in most database engines. However, while vector search is a common feature in AI/ML/LLMs where the dominant computing platforms are GPUs, existing database engines operate on CPUs even when implementing…
Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions,…
Last several years, GPUs are used to accelerate computations in many computer science domains. We focused on GPU accelerated Support Vector Machines (SVM) training with non-linear kernel functions. We had searched for all available GPU…
Accumulation of corporate data in the cloud has attracted more enterprise applications to the cloud creating data gravity. As a consequence, network traffic has become more cloud centric. This increase in cloud centric traffic poses new…
In this work, we present a new benchmarking suite with new real-life inspired skewed workloads to test the performance of concurrent index data structures. We started this project to prepare workloads specifically for self-adjusting data…
Graph-structured data is prevalent in domains such as social networks, financial transactions, brain networks, and protein interactions. As a result, the research community has produced new databases and analytics engines to process such…
This paper is focused on evaluating the effect of some different techniques in machine learning speed-up, including vector caches, parallel execution, and so on. The following content will include some review of the previous approaches and…
The use of neural networks in edge devices is increasing, which introduces new security challenges related to the neural networks' confidentiality. As edge devices often offer physical access, attacks targeting the hardware, such as…