Related papers: Triangulating Python Performance Issues with Scale…
Existing profilers for scripting languages (a.k.a. "glue" languages) like Python suffer from numerous problems that drastically limit their usefulness. They impose order-of-magnitude overheads, report information at too coarse a…
Python's flexibility and ease of use come at the cost of performance inefficiencies, requiring developers to rely on profilers to optimize execution. SCALENE, a high-performance CPU, GPU, and memory profiler, provides fine-grained insights…
This paper introduces Sparklen, a statistical learning toolkit for Hawkes processes in Python, designed to bring together efficiency and ease of use. The purpose of this package is to provide the Python community with a complete suite of…
Within the last years, Python became more prominent in the scientific community and is now used for simulations, machine learning, and data analysis. All these tasks profit from additional compute power offered by parallelism and…
As supercomputers continue to grow in scale and capabilities, it is becoming increasingly difficult to isolate processor and system level causes of performance degradation. Over the last several years, a significant number of performance…
Profiling techniques are used extensively at different parts of the computing stack to achieve many goals. One major goal is to make a piece of software execute more efficiently on a specific hardware platform, where efficiency spans…
We present SPDL (Scalable and Performant Data Loading), an open-source, framework-agnostic library designed for efficiently loading array data to GPU. Data loading is often a bottleneck in AI applications, and is challenging to optimize…
Coverage analysis is widely used but can suffer from high overhead. This overhead is especially acute in the context of Python, which is already notoriously slow (a recent study observes a roughly 30x slowdown vs. native code). We find that…
Scaling a parallel program to modern supercomputers is challenging due to inter-process communication, Amdahl's law, and resource contention. Performance analysis tools for finding such scaling bottlenecks either base on profiling or…
We introduce PrivPy, a practical privacy-preserving collaborative computation framework, especially optimized for machine learning tasks. PrivPy provides an easy-to-use and highly compatible Python programming front-end which supports…
In this paper, we present PARTIME, a software library written in Python and based on PyTorch, designed specifically to speed up neural networks whenever data is continuously streamed over time, for both learning and inference. Existing…
Despite being the most popular programming language, Python has not yet received enough attention from the community. To the best of our knowledge, there is no general static analysis framework proposed to facilitate the implementation of…
A growing number of visual computing applications depend on the analysis of large video collections. The challenge is that scaling applications to operate on these datasets requires efficient systems for pixel data access and parallel…
Over the lifetime of a computing task, determining the maximum usage of random-access memory (RAM) on both the motherboard and on a graphical processing unit (GPU), as well as the utilization percentage of the central processing unit (CPU)…
The rapid growth in machine learning models, especially in natural language processing and computer vision, has led to challenges when running these models on hardware with limited resources. This paper introduces Superpipeline, a new…
Python has become the de facto language for scientific computing. Programming in Python is highly productive, mainly due to its rich science-oriented software ecosystem built around the NumPy module. As a result, the demand for Python…
It is a challenging task to train large DNN models on sophisticated GPU platforms with diversified interconnect capabilities. Recently, pipelined training has been proposed as an effective approach for improving device utilization. However,…
Input-sensitive profiling is a recent performance analysis technique that makes it possible to estimate the empirical cost function of individual routines of a program, helping developers understand how performance scales to larger inputs…
Python is a popular programming language known for its ease of learning and extensive libraries. However, concerns about performance and energy consumption have led to the development of compilers to enhance Python code efficiency. Despite…
Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique…