Related papers: Vector-Processing for Mobile Devices: Benchmark an…

Swan: A Neural Engine for Efficient DNN Training on Smartphone SoCs

The need to train DNN models on end-user devices (e.g., smartphones) is increasing with the need to improve data privacy and reduce communication overheads. Unlike datacenter servers with powerful CPUs and GPUs, modern smartphones consist…

Machine Learning · Computer Science 2022-06-13 Sanjay Sri Vallabh Singapuram, Fan Lai, Chuheng Hu, Mosharaf Chowdhury

Performance Analysis of Traditional and Data-Parallel Primitive Implementations of Visualization and Analysis Kernels

Measurements of absolute runtime are useful as a summary of performance when studying parallel visualization and analysis methods on computational platforms of increasing concurrency and complexity. We can obtain even more insights by…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-10-07 E. Wes Bethel, David Camp, Talita Perciano, Colleen Heinemann

Towards High-Performance and Portable Molecular Docking on CPUs through Vectorization

Recent trends in the HPC field have introduced new CPU architectures with improved vectorization capabilities that require optimization to achieve peak performance and thus pose challenges for performance portability. The deployment of…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-17 Gianmarco Accordi, Jens Domke, Theresa Pollinger, Davide Gadioli, Gianluca Palermo

AraOS: Analyzing the Impact of Virtual Memory Management on Vector Unit Performance

Vector processor architectures offer an efficient solution for accelerating data-parallel workloads (e.g., ML, AI), reducing instruction count, and enhancing processing efficiency. This is evidenced by the increasing adoption of vector…

Hardware Architecture · Computer Science 2025-04-15 Matteo Perotti, Vincenzo Maisto, Moritz Imfeld, Nils Wistoff, Alessandro Cilardo, Luca Benini

Vectorization of Verilog Designs and its Effects on Verification and Synthesis

Vectorization is a compiler optimization that replaces multiple operations on scalar values with a single operation on vector values. Although common in traditional compilers such as rustc, clang, and gcc, vectorization is not common in the…

Programming Languages · Computer Science 2026-05-15 Maria Fernanda Oliveira Guimarães, Ulisses Rosa, Ian Trudel, João Victor Amorim Vieira, Augusto Amaral Mafra, Mirlaine Crepalde, Fernando Magno Quintão Pereira

The GAP Benchmark Suite

We present a graph processing benchmark suite with the goal of helping to standardize graph processing evaluations. Fewer differences between graph processing evaluations will make it easier to compare different research efforts and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-18 Scott Beamer, Krste Asanović, David Patterson

On Vectorization of Deep Convolutional Neural Networks for Vision Tasks

We recently have witnessed many ground-breaking results in machine learning and computer vision, generated by using deep convolutional neural networks (CNN). While the success mainly stems from the large volume of training data and the deep…

Computer Vision and Pattern Recognition · Computer Science 2015-01-30 Jimmy SJ. Ren, Li Xu

Addressing memory bandwidth scalability in vector processors for streaming applications

As the size of artificial intelligence and machine learning (AI/ML) models and datasets grows, the memory bandwidth becomes a critical bottleneck. The paper presents a novel extended memory hierarchy that addresses some major memory…

Hardware Architecture · Computer Science 2025-05-20 Jordi Altayo, Paul Delestrac, David Novo, Simey Yang, Debjyoti Bhattacharjee, Francky Catthoor

Memory-constrained Vectorization and Scheduling of Dataflow Graphs for Hybrid CPU-GPU Platforms

The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task and data-level parallelism to meet throughput requirements…

Signal Processing · Electrical Eng. & Systems 2017-12-01 Shuoxin Lin, Jiahao Wu, Shuvra S. Bhattacharyya

Web Performance with Android's Battery-Saver Mode

A Web browser utilizes a device's CPU to parse HTML, build a Document Object Model, a Cascading Style Sheets Object Model, and render trees, and parse, compile, and execute computationally-heavy JavaScript. A powerful CPU is required to…

Performance · Computer Science 2020-03-17 Utkarsh Goel, Stephen Ludin, Moritz Steiner

PDSP-Bench: A Benchmarking System for Parallel and Distributed Stream Processing

The paper introduces PDSP-Bench, a novel benchmarking system designed for a systematic understanding of performance of parallel stream processing in a distributed environment. Such an understanding is essential for determining how Stream…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-16 Pratyush Agnihotri, Boris Koldehofe, Roman Heinrich, Carsten Binnig, Manisha Luthra

Parallel Performance-Energy Predictive Modeling of Browsers: Case Study of Servo

Mozilla Research is developing Servo, a parallel web browser engine, to exploit the benefits of parallelism and concurrency in the web rendering pipeline. Parallelization results in improved performance for pinterest.com but not for…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-11 Rohit Zambre, Lars Bergstrom, Laleh Aghababaie Beni, Aparna Chandramowliswharan

To GPU or Not to GPU: Vector Search in Relational Engines

Vector search (VS) is now available in most database engines. However, while vector search is a common feature in AI/ML/LLMs where the dominant computing platforms are GPUs, existing database engines operate on CPUs even when implementing…

Databases · Computer Science 2026-05-18 Vasilis Mageirakos, Joel André, Marko Kabić, Bowen Wu, Yannis Chronis, Gustavo Alonso

cphVB: A System for Automated Runtime Optimization and Parallelization of Vectorized Applications

Modern processor architectures, in addition to having still more cores, also require still more consideration to memory-layout in order to run at full capacity. The usefulness of most languages is deprecating as their abstractions,…

Programming Languages · Computer Science 2013-03-26 Mads Ruben Burgdorff Kristensen, Simon Andreas Frimann Lund, Troels Blum, Brian Vinter

A Comparison of Support Vector Machines Training GPU-Accelerated Open Source Implementations

In the last several years, GPUs have been used to accelerate computations in many computer science domains. We focused on GPU-accelerated Support Vector Machine (SVM) training with non-linear kernel functions. We searched for all available GPU…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-07-21 Jan Vanek, Josef Michalek, Josef Psutka

Sampling Streaming Data with Parallel Vector Quantization -- PVQ

Accumulation of corporate data in the cloud has attracted more enterprise applications to the cloud creating data gravity. As a consequence, network traffic has become more cloud centric. This increase in cloud centric traffic poses new…

Machine Learning · Computer Science 2022-10-05 Mujahid Sultan

Benchmark Framework with Skewed Workloads

In this work, we present a new benchmarking suite with new real-life inspired skewed workloads to test the performance of concurrent index data structures. We started this project to prepare workloads specifically for self-adjusting data…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-19 Vitaly Aksenov, Dmitry Ivanov, Ravil Galiev

SoK: The Faults in our Graph Benchmarks

Graph-structured data is prevalent in domains such as social networks, financial transactions, brain networks, and protein interactions. As a result, the research community has produced new databases and analytics engines to process such…

Databases · Computer Science 2024-04-02 Puneet Mehrotra, Vaastav Anand, Daniel Margo, Milad Rezaei Hajidehi, Margo Seltzer

Benchmarking Machine Learning: How Fast Can Your Algorithms Go?

This paper is focused on evaluating the effect of some different techniques in machine learning speed-up, including vector caches, parallel execution, and so on. The following content will include some review of the previous approaches and…

Machine Learning · Computer Science 2021-01-12 Zeyu Ning, Hugues Nelson Iradukunda, Qingquan Zhang, Ting Zhu

Influence of Parallelism in Vector-Multiplication Units on Correlation Power Analysis

The use of neural networks in edge devices is increasing, which introduces new security challenges related to the neural networks' confidentiality. As edge devices often offer physical access, attacks targeting the hardware, such as…

Cryptography and Security · Computer Science 2026-02-06 Manuel Brosch, Matthias Probst, Stefan Kögler, Georg Sigl