Related papers: Optimal Kernel Orchestration for Tensor Programs w…

Ansor: Generating High-Performance Tensor Programs for Deep Learning

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

Machine Learning · Computer Science 2023-10-17 Lianmin Zheng, Chengfan Jia, Minmin Sun, Zhao Wu, Cody Hao Yu, Ameer Haj-Ali, Yida Wang, Jun Yang, Danyang Zhuo, Koushik Sen, Joseph E. Gonzalez, Ion Stoica

Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu

Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores

Convolution is one of the fundamental operations of deep neural networks with demanding matrix computation. In a graphics processing unit (GPU), the Tensor Core is specialized matrix-processing hardware equipped with reduced-precision…

Machine Learning · Computer Science 2022-02-25 Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim

Learning to Optimize Tensor Programs

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Explore as a Storm, Exploit as a Raindrop: On the Benefit of Fine-Tuning Kernel Schedulers with Coordinate Descent

Machine-learning models consist of kernels, which are algorithms applying operations on tensors -- data indexed by a linear combination of natural numbers. Examples of kernels include convolutions, transpositions, and vectorial products.…

Machine Learning · Computer Science 2024-07-16 Michael Canesche, Gaurav Verma, Fernando Magno Quintao Pereira

DNNFusion: Accelerating Deep Neural Networks Execution with Advanced Operator Fusion

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to…

Machine Learning · Computer Science 2021-12-02 Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

Task Scheduling Optimization with Direct Constraints from a Tensor Network Perspective

This work presents a novel method for task optimization in industrial plants using quantum-inspired tensor network technology. This method obtains the best possible combination of tasks on a set of machines with directed constraints while…

Quantum Physics · Physics 2026-04-30 Alejandro Mata Ali, Iñigo Perez Delgado, Beatriz García Markaida, Aitor Moreno Fdez. de Leceta

Acceleration of tensor-product operations for high-order finite element methods

This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving…

Mathematical Software · Computer Science 2017-11-15 Kasia Świrydowicz, Noel Chalmers, Ali Karakus, Timothy Warburton

DLFusion: An Auto-Tuning Compiler for Layer Fusion on Deep Neural Network Accelerator

Many hardware vendors have introduced specialized deep neural network (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize the code for the hardware accelerator becomes an important…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-12 Zihan Liu, Jingwen Leng, Quan Chen, Chao Li, Wenli Zheng, Li Li, Minyi Guo

Quantum-based Molecular Dynamics Simulations Using Tensor Cores

Tensor cores, along with tensor processing units, represent a new form of hardware acceleration specifically designed for deep neural network calculations in artificial intelligence applications. Tensor cores provide extraordinary…

Computational Physics · Physics 2021-09-14 Joshua Finkelstein, Justin S. Smith, Susan M. Mniszewski, Kipton Barros, Christian F. A. Negre, Emanuel H. Rubensson, Anders M. N. Niklasson

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration…

Machine Learning · Computer Science 2022-10-31 Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Can Tensor Cores Benefit Memory-Bound Kernels? (No!)

Tensor cores are specialized processing units within GPUs that have demonstrated significant efficiency gains in compute-bound applications such as Deep Learning Training by accelerating dense matrix operations. Given their success,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Lingqi Zhang, Jiajun Huang, Sheng Di, Satoshi Matsuoka, Mohamed Wahib

MCFuser: High-Performance and Rapid Fusion of Memory-Bound Compute-Intensive Operators

Operator fusion, a key technique to improve data locality and alleviate GPU memory bandwidth pressure, often fails to extend to the fusion of multiple compute-intensive operators due to saturated computation throughput. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-30 Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng

Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data

Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A…

Machine Learning · Computer Science 2022-06-23 Tung Doan, Atsuhiro Takasu

Not Half Bad: Exploring Half-Precision in Graph Convolutional Neural Networks

With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often…

Machine Learning · Computer Science 2020-10-27 John Brennan, Stephen Bonner, Amir Atapour-Abarghouei, Philip T Jackson, Boguslaw Obara, Andrew Stephen McGough

Optimizing Kernel Machines using Deep Learning

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring…

Machine Learning · Statistics 2017-11-16 Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Andreas Spanias

A Continuous Optimisation Benchmark Suite from Neural Network Regression

Designing optimisation algorithms that perform well in general requires experimentation on a range of diverse problems. Training neural networks is an optimisation task that has gained prominence with the recent successes of deep learning.…

Neural and Evolutionary Computing · Computer Science 2022-09-07 Katherine M. Malan, Christopher W. Cleghorn

Kernel Neural Operators (KNOs) for Scalable, Memory-efficient, Geometrically-flexible Operator Learning

This paper introduces the Kernel Neural Operator (KNO), a provably convergent operator-learning architecture that utilizes compositions of deep kernel-based integral operators for function-space approximation of operators (maps from…

Machine Learning · Computer Science 2026-05-06 Matthew Lowery, John Turnage, Zachary Morrow, John D. Jakeman, Akil Narayan, Shandian Zhe, Varun Shankar

A deep Convolutional Neural Network for topology optimization with strong generalization ability

This paper proposes a deep Convolutional Neural Network (CNN) with strong generalization ability for structural topology optimization. The architecture of the neural network is made up of encoding and decoding parts, which provide down- and…

Machine Learning · Computer Science 2020-04-01 Yiquan Zhang, Bo Peng, Xiaoyi Zhou, Cheng Xiang, Dalei Wang

Optimizing DNN Compilation for Distributed Training with Joint OP and Tensor Fusion

This paper proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-27 Xiaodong Yi, Shiwei Zhang, Lansong Diao, Chuan Wu, Zhen Zheng, Shiqing Fan, Siyu Wang, Jun Yang, Wei Lin