Related papers: Automatic Performance Debugging of SPMD-style Para…

Automatic Performance Debugging of SPMD Parallel Programs

Automatic performance debugging of parallel applications usually involves two steps: automatic detection of performance bottlenecks and uncovering their root causes for performance optimization. Previous work fails to resolve this…

Distributed, Parallel, and Cluster Computing · Computer Science · 2010-02-24 · Xu Liu, Lin Yuan, Jianfeng Zhan, Bibo Tu, Dan Meng

Different from sequential programs, parallel programs possess their own characteristics which are difficult to analyze in the multi-process or multi-thread environment. This paper presents an innovative method to automatically analyze the…

Distributed, Parallel, and Cluster Computing · Computer Science · 2009-06-09 · Xu Liu, Jianfeng Zhan, Bibo Tu, Ming Zou, Dan Meng

PBScaler: A Bottleneck-aware Autoscaling Framework for Microservice-based Applications

Autoscaling is critical for ensuring optimal performance and resource utilization in cloud applications with dynamic workloads. However, traditional autoscaling technologies are typically no longer applicable in microservice-based…

Software Engineering · Computer Science · 2024-04-02 · Shuaiyu Xie, Jian Wang, Bing Li, Zekun Zhang, Duantengchuan Li, Patrick C. K. H

Making Applications Faster by Asynchronous Execution: Slowing Down Processes or Relaxing MPI Collectives

Comprehending the performance bottlenecks at the core of the intricate hardware-software interactions exhibited by highly parallel programs on HPC clusters is crucial. This paper sheds light on the issue of automatically asynchronous MPI…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-09-06 · Ayesha Afzal, Georg Hager, Stefano Markidis, Gerhard Wellein

Performance Debugging through Microarchitectural Sensitivity and Causality Analysis

Modern Out-of-Order (OoO) CPUs are complex systems with many components interleaved in non-trivial ways. Pinpointing performance bottlenecks and understanding the underlying causes of program performance issues are critical tasks to fully…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-12-19 · Alban Dutilleul, Hugo Pompougnac, Nicolas Derumigny, Gabriel Rodriguez, Valentin Trophime, Christophe Guillon, Fabrice Rastello

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

Parallel computing is a standard approach to achieving high-performance computing (HPC). Three commonly used methods to implement parallel computing include: 1) applying multithreading technology on single-core or multi-core CPUs; 2)…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-09-18 · Xinyao Yi

Automated Neural Architecture Design for Industrial Defect Detection

Industrial surface defect detection (SDD) is critical for ensuring product quality and manufacturing reliability. Due to the diverse shapes and sizes of surface defects, SDD faces two main challenges: intraclass difference and interclass…

Computer Vision and Pattern Recognition · Computer Science · 2025-11-27 · Yuxi Liu, Yunfeng Ma, Yi Tang, Min Liu, Shuai Jiang, Yaonan Wang

Automated Programmatic Performance Analysis of Parallel Programs

Developing efficient parallel applications is critical to advancing scientific development but requires significant performance analysis and optimization. Performance analysis tools help developers manage the increasing complexity and scale…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-25 · Onur Cankur, Aditya Tomar, Daniel Nichols, Connor Scully-Allison, Katherine E. Isaacs, Abhinav Bhatele

Automatic Detection of Performance Anomalies in Task-Parallel Programs

To efficiently exploit the resources of new many-core architectures, integrating dozens or even hundreds of cores per chip, parallel programming models have evolved to expose massive amounts of parallelism, often in the form of fine-grained…

Distributed, Parallel, and Cluster Computing · Computer Science · 2014-05-14 · Andi Drebes, Karine Heydemann, Antoniu Pop, Albert Cohen, Nathalie Drach

Automap: Towards Ergonomic Automated Parallelism for ML Models

The rapid rise in demand for training large neural network architectures has brought into focus the need for partitioning strategies, for example by using data, model, or pipeline parallelism. Implementing these methods is increasingly…

Machine Learning · Computer Science · 2021-12-07 · Michael Schaarschmidt, Dominik Grewe, Dimitrios Vytiniotis, Adam Paszke, Georg Stefan Schmid, Tamara Norman, James Molloy, Jonathan Godwin, Norman Alexander Rink, Vinod Nair, Dan Belov

Challenges and Solutions to Build a Data Pipeline to Identify Anomalies in Enterprise System Performance

We discuss how VMware is solving the following challenges to harness data to operate our ML-based anomaly detection system to detect performance issues in our Software Defined Data Center (SDDC) enterprise deployments: (i) label scarcity…

Machine Learning · Computer Science · 2021-12-17 · Xiaobo Huang, Amitabha Banerjee, Chien-Chia Chen, Chengzhi Huang, Tzu Yi Chuang, Abhishek Srivastava, Razvan Cheveresan

Automatic Identification of Parallelizable Loops Using Transformer-Based Source Code Representations

Automatic parallelization remains a challenging problem in software engineering, particularly in identifying code regions where loops can be safely executed in parallel on modern multi-core architectures. Traditional static analysis…

Software Engineering · Computer Science · 2026-04-01 · Izavan dos S. Correia, Henrique C. T. Santos, Tiago A. E. Ferreira

SafePlanner: Testing Safety of the Automated Driving System Plan Model

In this work, we present SafePlanner, a systematic testing framework for identifying safety-critical flaws in the Plan model of Automated Driving Systems (ADS). SafePlanner targets two core challenges: generating structurally meaningful…

Software Engineering · Computer Science · 2026-01-15 · Dohyun Kim, Sanggu Han, Sangmin Woo, Joonha Jang, Jaehoon Kim, Changhun Song, Yongdae Kim

Proactive bottleneck performance analysis in parallel computing using openMP

The aim of parallel computing is to increase an application's performance by executing it on multiple processors. OpenMP is an API that supports a multi-platform shared-memory programming model, and shared-memory programs are…

Distributed, Parallel, and Cluster Computing · Computer Science · 2013-11-12 · Vibha Rajput, Alok Katiyar

Amazon SageMaker Autopilot: a white box AutoML solution at scale

AutoML systems provide a black-box solution to machine learning problems by selecting the right way of processing features, choosing an algorithm and tuning the hyperparameters of the entire pipeline. Although these systems perform well on…

Machine Learning · Computer Science · 2020-12-17 · Piali Das, Valerio Perrone, Nikita Ivkin, Tanya Bansal, Zohar Karnin, Huibin Shen, Iaroslav Shcherbatyi, Yotam Elor, Wilton Wu, Aida Zolic, Thibaut Lienart, Alex Tang, Amr Ahmed, Jean Baptiste Faddoul, Rodolphe Jenatton, Fela Winkelmolen, Philip Gautier, Leo Dirac, Andre Perunicic, Miroslav Miladinovic, Giovanni Zappella, Cédric Archambeau, Matthias Seeger, Bhaskar Dutt, Laurence Rouesnel

Automatic Discovery of Composite SPMD Partitioning Strategies in PartIR

Large neural network models are commonly trained through a combination of advanced parallelism strategies in a single program, multiple data (SPMD) paradigm. For example, training large transformer models requires combining data, model, and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2022-10-20 · Sami Alabed, Dominik Grewe, Juliana Franco, Bart Chrzaszcz, Tom Natan, Tamara Norman, Norman A. Rink, Dimitrios Vytiniotis, Michael Schaarschmidt

SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding

Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and…

Distributed, Parallel, and Cluster Computing · Computer Science · 2026-05-29 · Talor Abramovich, Maor Ashkenazi, Izzy Putterman, Benjamin Chislett, Tiyasa Mitra, Bita Darvish Rouhani, Ran Zilberstein, Yonatan Geifman

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis

Single-Program-Multiple-Data (SPMD) parallelism has recently been adopted to train large deep neural networks (DNNs). Few studies have explored its applicability on heterogeneous clusters, to fully exploit available resources for large…

Distributed, Parallel, and Cluster Computing · Computer Science · 2024-01-12 · Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin

Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines

Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism,…

Machine Learning · Computer Science · 2022-03-22 · Michael Kuchnik, Ana Klimovic, Jiri Simsa, Virginia Smith, George Amvrosiadis

SpIDER: Spatially Informed Dense Embedding Retrieval for Software Issue Localization

Retrieving code functions, classes or files that are relevant in order to solve a given user query, bug report or feature request from large codebases is a fundamental challenge for Large Language Model (LLM)-based coding agents. Agentic…

Software Engineering · Computer Science · 2026-02-09 · Shravan Chaudhari, Rahul Thomas Jacob, Mononito Goswami, Jiajun Cao, Shihab Rashid, Christian Bock