Related papers: Autotuning Apache TVM-based Scientific Application…

200 papers

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new…

We implemented and optimized matrix multiplications between dense and block-sparse matrices on CUDA. We leveraged TVM, a deep learning compiler, to explore the schedule space of the operation and generate efficient CUDA code. With the…

Mathematical Software · Computer Science 2020-07-28 Zijing Gu

Processing-in-DRAM (DRAM-PIM) has emerged as a promising technology for accelerating memory-intensive operations in modern applications, such as Large Language Models (LLMs). Despite its potential, current software stacks for DRAM-PIM face…

Hardware Architecture · Computer Science 2025-06-03 Yongwon Shin , Dookyung Kang , Hyojin Sung

Specialized accelerators for tensor operations, such as blocked-matrix operations and multi-dimensional convolutions, have emerged as powerful architecture choices for high-performance Deep-Learning computing. The rapid development of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-24 Dionysios Diamantopoulos , Burkhard Ringlein , Mitra Purandare , Gagandeep Singh , Christoph Hagleitner

Machine Learning compilers like TVM allow a fast and flexible deployment on embedded CPUs. This enables the use of non-standard operators, which are common in ML compression techniques. However, it is necessary to understand the limitations…

Hardware Architecture · Computer Science 2021-02-02 Bernhard Klein , Christoph Gratl , Manfred Mücke , Holger Fröning

We explore the utilization of the Apache TVM open source framework to automatically generate a family of algorithms that follow the approach taken by popular linear algebra libraries, such as GotoBLAS2, BLIS and OpenBLAS, in order to obtain…

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani , Daniel Mueller-Gritschneder , Ulf Schlichtmann

In recent years, general matrix-matrix multiplication with non-regular-shaped input matrices has been widely used in many applications like deep learning and has drawn more and more attention. However, conventional implementations are not…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-24 Chendi Li , Haipeng Jia , Hang Cao , Jianyu Yao , Boqian Shi , Chunyang Xiang , Jinbo Sun , Pengqi Lu , Yunquan Zhang

Autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application by selecting and evaluating a subset of implementations/configurations on a target platform and/or use models…

Performance · Computer Science 2020-10-19 Xingfu Wu , Michael Kruse , Prasanna Balaprakash , Hal Finkel , Paul Hovland , Valerie Taylor , Mary Hall

Autonomous tuning of particle accelerators is an active and challenging field of research with the goal of enabling novel accelerator technologies and cutting-edge high-impact applications, such as physics discovery, cancer research and…

Computation and Language · Computer Science 2024-05-16 Jan Kaiser , Annika Eichler , Anne Lauscher

Automatic performance tuning (auto-tuning) is essential for optimizing high-performance applications, where vast and irregular search spaces make manual exploration infeasible. While auto-tuners traditionally rely on classical approaches…

Machine Learning · Computer Science 2026-04-01 Floris-Jan Willemsen , Niki van Stein , Ben van Werkhoven

In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter space search and compare four different supervised learning methods within Bayesian optimization and evaluate their…

Machine Learning · Computer Science 2021-04-28 Xingfu Wu , Michael Kruse , Prasanna Balaprakash , Hal Finkel , Paul Hovland , Valerie Taylor , Mary Hall

Bayesian optimization is a powerful method for automating tuning of compilers. The complex landscape of autotuning provides a myriad of rarely considered structural challenges for black-box optimizers, and the lack of standardized…

Machine Learning · Computer Science 2025-04-09 Jacob O. Tørring , Carl Hvarfner , Luigi Nardi , Magnus Själander

The deployment of neural networks on heterogeneous SoCs coupled with custom accelerators is a challenging task because of the lack of end-to-end software tools provided for these systems. Moreover, the already available low level schedules…

Machine Learning · Computer Science 2024-06-11 F. N. Peccia , O. Bringmann

Pipelining between data loading and computation is a critical tensor program optimization for GPUs. In order to unleash the high performance of latest GPUs, we must perform a synergetic optimization of multi-stage pipelining across the…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-05-09 Guyue Huang , Yang Bai , Liu Liu , Yuke Wang , Bei Yu , Yufei Ding , Yuan Xie

Tensor contraction operations in computational chemistry consume significant fractions of computing time on large-scale computing platforms. The widespread use of tensor contractions between large multi-dimensional tensors in describing…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-11 Erdal Mutlu , Ajay Panyala , Nitin Gawande , Abhishek Bagusetty , Jinsung Kim , Karol Kowalski , Nicholas Bauman , Bo Peng , Jiri Brabec , Sriram Krishnamoorthy

Support Vector Machine (SVM) is a state-of-the-art classification method widely used in science and engineering due to its high accuracy, its ability to deal with high dimensional data, and its flexibility in modeling diverse sources of…

Machine Learning · Computer Science 2024-09-30 Xingfu Wu , Tupendra Oli , Justin H. Qian , Valerie Taylor , Mark C. Hersam , Vinod K. Sangwan

Sparse matrices are an integral part of scientific simulations. As hardware evolves new sparse matrix storage formats are proposed aiming to exploit optimizations specific to the new hardware. In the era of heterogeneous computing, users…

Machine Learning · Computer Science 2023-03-10 Christodoulos Stylianou , Michele Weiland

Large language models have high compute, latency, and memory requirements. While specialized accelerators such as GPUs and TPUs typically run these workloads, CPUs are more widely available and consume less energy. Accelerating LLMs with…

RISC-V provides a flexible and scalable platform for applications ranging from embedded devices to high-performance computing clusters. Particularly, its RISC-V Vector Extension (RVV) becomes of interest for the acceleration of AI…

Machine Learning · Computer Science 2025-08-20 Federico Nicolas Peccia , Frederik Haxel , Oliver Bringmann