Related papers: A Call-Graph Profiler for GNU Octave
This paper explains the architecture of profilers, particularly gprof, and how to profile a program according to a user-defined execution-time input. Gprof is an open-source profiler available in the package of…
This paper discusses the interoperability problems between two programming languages, specifically GNU Octave and the GTK API written in C, that arise in providing the GTK API on Octave. Octave-GTK is the fusion of two different APIs: one…
Tcl/Tk provides fast and flexible interface design but slow and cumbersome vector processing. Octave provides fast and flexible vector processing but slow and cumbersome interface design. Calling Octave from Tcl gives you the…
The vector notation adopted by GNU Octave plays a significant role as a tool for introspection, aligning itself with the vision of Kenneth E. Iverson. He believed that, just like mathematics, a programming language should be an effective…
For instruments with many occasional users, it is important to have easy-to-use software. To support frequent users, it is important to be flexible. Using a scripting language to design a GUI and exposing it to the user allows us to do…
Profile Guided Optimization (PGO) uses runtime profiling to direct compiler optimization decisions, effectively combining static analysis with actual execution behavior to enhance performance. Runtime profiles, collected through…
This paper introduces a design method for denser graph-frequency graph Fourier frames (DGFFs) to enhance graph signal processing and analysis. The graph Fourier transform (GFT) enables us to analyze graph signals in the graph spectral domain…
Analyzing large-scale graphs provides valuable insights in different application scenarios. While many graph processing systems working on top of distributed infrastructures have been proposed to deal with big graphs, the tasks of profiling…
Understanding the behavior of simulated architectures in gem5 is critical for studying complex, deeply integrated computing systems. However, conventional analysis methods provide only an indirect view of the simulated system internals. In…
Memory profiling captures programs' dynamic memory behavior, assisting programmers in debugging, tuning, and enabling advanced compiler optimizations like speculation-based automatic parallelization. As each use case demands its unique…
We introduce qprof, a new and extensible quantum program profiler able to generate profiling reports of various quantum circuits. We describe the internal structure and working of qprof and provide three practical examples on practical…
Measurements of microscale surface patterns are essential for process and quality control in industries across semiconductors, micro-machining, and biomedicines. However, the development of miniaturized and intelligent profiling systems…
We present Graphite, a GPU-accelerated nonlinear least squares graph optimization framework. It provides a CUDA C++ interface to enable the sharing of code between a real-time application, such as a SLAM system, and its optimization tasks.…
This paper presents GraphAGILE, a domain-specific FPGA-based overlay accelerator for graph neural network (GNN) inference. GraphAGILE consists of (1) a novel unified architecture design with an instruction set, and (2) a…
Profiling techniques are used extensively at different parts of the computing stack to achieve many goals. One major goal is to make a piece of software execute more efficiently on a specific hardware platform, where efficiency spans…
Graph algorithms are increasingly used in applications that exploit large databases. However, conventional processor architectures are inadequate for handling the throughput and memory requirements of graph computation. Lincoln Laboratory's…
High-performance GPU kernel optimization remains a critical yet labor-intensive task in modern machine learning workloads. Although Triton, a domain-specific language for GPU programming, enables developers to write efficient kernels with…
We propose a new graph-theoretic benchmark in this paper. The benchmark is developed to address shortcomings of an existing widely-used graph benchmark. We thoroughly studied a large number of traditional and contemporary graph algorithms…
We present Forge-UGC (FX Optimization and Register-Graph Engine for Universal Graph Compilation), a four-phase compiler for transformer deployment on heterogeneous accelerator hardware, validated on Intel AI Boost NPU. Existing frameworks…
Applied research in graph algorithms and combinatorial structures needs comprehensive and versatile software libraries. However, the design and the implementation of flexible libraries are challenging activities. Among the other problems…