Related papers: DJXPerf: Identifying Memory Inefficiencies via Obj…
Many performance inefficiencies such as inappropriate choice of algorithms or data structures, developers' inattention to performance, and missed compiler optimizations show up as wasteful memory operations. Wasteful memory operations are…
Memory bloat is an important source of inefficiency in complex production software, especially in software written in managed languages such as Java. Prior approaches to this problem have focused on identifying objects that outlive their…
The cache plays a key role in determining the performance of applications, no matter for sequential or concurrent programs on homogeneous and heterogeneous architecture. Fixing cache misses requires to understand the origin and the type of…
Low latency services such as credit-card fraud detection and website targeted advertisement rely on Big Data platforms (e.g., Lucene, Graphchi, Cassandra) which run on top of memory managed runtimes, such as the JVM. These platforms,…
A processor's memory hierarchy has a major impact on the performance of running code. However, computing platforms, where the actual hardware characteristics are hidden from both the end user and the tools that mediate execution, such as a…
Recently Java programming environment has become so popular. Java programming language is a language that is designed to be portable enough to be executed in wide range of computers ranging from cell phones to supercomputers. Computer…
Improving software performance is an important yet challenging part of the software development cycle. Today, the majority of performance inefficiencies are identified and patched by performance experts. Recent advancements in deep learning…
Profiling tools (also known as profilers) play an important role in understanding program performance at runtime, such as hotspots, bottlenecks, and inefficiencies. While profilers have been proven to be useful, they give extra burden to…
Software design patterns are standard solutions to common problems in software design and architecture. Knowing that a particular module implements a design pattern is a shortcut to design comprehension. Manually detecting design patterns…
Data structures are a cornerstone of most modern programming languages. Whether they are provided via separate libraries, built into the language specification, or as part of the language's standard library -- data structures such as lists,…
Trying to cope with the constantly growing number of cores per processor, hardware architects are experimenting with modular non-cache-coherent architectures. Such architectures delegate the memory coherency to the software. On the…
Object-oriented Programming has become one of the most dominant design paradigms as the separation of concerns and adaptability of design reduce development and maintenance costs. However, the convenience is not without cost. The added…
Software applications, especially Enterprise Resource Planning (ERP) systems, are crucial to the day-to-day operations of many industries. Therefore, it is essential to maintain these systems effectively using tools that can identify,…
This paper introduces EdgeProfiler, a fast profiling framework designed for evaluating lightweight Large Language Models (LLMs) on edge systems. While LLMs offer remarkable capabilities in natural language understanding and generation,…
Data prefetching aims to improve access times to data storage systems by predicting data records that are likely to be accessed by subsequent requests and retrieving them into a memory cache before they are needed. In the case of Persistent…
Effective performance profiling and analysis are essential for optimizing training and inference of deep learning models, especially given the growing complexity of heterogeneous computing environments. However, existing tools often lack…
MLPerf, an emerging machine learning benchmark suite strives to cover a broad range of applications of machine learning. We present a study on its characteristics and how the MLPerf benchmarks differ from some of the previous deep learning…
Parallel applications are extremely challenging to achieve the optimal performance on the NUMA architecture, which necessitates the assistance of profiling tools. However, existing NUMA-profiling tools share some similar shortcomings, such…
Hyperscalars run services across a large fleet of servers, serving billions of users worldwide. These services, however, behave differently than commonly available benchmark suites, resulting in server architectures that are not optimized…
Energy consumption is a growing concern in several fields, from mobile devices to large data centers. Developers need detailed data on the energy consumption of their software to mitigate consumption issues. Previous approaches have a…