Related papers: Optimal Kernel Orchestration for Tensor Programs w…

200 papers

High-performance tensor programs are crucial to guarantee efficient execution of deep neural networks. However, obtaining performant tensor programs for different operators on various hardware platforms is notoriously challenging.…

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu

Convolution is one of the fundamental operations of deep neural networks with demanding matrix computation. In a graphics processing unit (GPU), Tensor Core is a specialized matrix processing hardware equipped with reduced-precision…

Machine Learning · Computer Science 2022-02-25 Junkyeong Choi, Hyucksung Kwon, Woongkyu Lee, Jungwook Choi, Jieun Lim

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Machine-learning models consist of kernels, which are algorithms applying operations on tensors -- data indexed by a linear combination of natural numbers. Examples of kernels include convolutions, transpositions, and vectorial products.…

Machine Learning · Computer Science 2024-07-16 Michael Canesche, Gaurav Verma, Fernando Magno Quintao Pereira

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to…

Machine Learning · Computer Science 2021-12-02 Wei Niu, Jiexiong Guan, Yanzhi Wang, Gagan Agrawal, Bin Ren

This work presents a novel method for task optimization in industrial plants using quantum-inspired tensor network technology. This method obtains the best possible combination of tasks on a set of machines with directed constraints while…

This paper is devoted to GPU kernel optimization and performance analysis of three tensor-product operators arising in finite element methods. We provide a mathematical background to these operations and implementation details. Achieving…

Mathematical Software · Computer Science 2017-11-15 Kasia Świrydowicz, Noel Chalmers, Ali Karakus, Timothy Warburton

Many hardware vendors have introduced specialized deep neural network (DNN) accelerators owing to their superior performance and efficiency. As such, how to generate and optimize the code for the hardware accelerator becomes an important…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-12 Zihan Liu, Jingwen Leng, Quan Chen, Chao Li, Wenli Zheng, Li Li, Minyi Guo

Tensor cores, along with tensor processing units, represent a new form of hardware acceleration specifically designed for deep neural network calculations in artificial intelligence applications. Tensor cores provide extraordinary…

Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration…

Machine Learning · Computer Science 2022-10-31 Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

Tensor cores are specialized processing units within GPUs that have demonstrated significant efficiency gains in compute-bound applications such as Deep Learning Training by accelerating dense matrix operations. Given their success,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-03-04 Lingqi Zhang, Jiajun Huang, Sheng Di, Satoshi Matsuoka, Mohamed Wahib

Operator fusion, a key technique to improve data locality and alleviate GPU memory bandwidth pressure, often fails to extend to the fusion of multiple compute-intensive operators due to saturated computation throughput. However, the…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-30 Zheng Zhang, Donglin Yang, Xiaobo Zhou, Dazhao Cheng

Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A…

Machine Learning · Computer Science 2022-06-23 Tung Doan, Atsuhiro Takasu

With the growing significance of graphs as an effective representation of data in numerous applications, efficient graph analysis using modern machine learning is receiving a growing level of attention. Deep learning approaches often…

Building highly non-linear and non-parametric models is central to several state-of-the-art machine learning systems. Kernel methods form an important class of techniques that induce a reproducing kernel Hilbert space (RKHS) for inferring…

Machine Learning · Statistics 2017-11-16 Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Andreas Spanias

Designing optimisation algorithms that perform well in general requires experimentation on a range of diverse problems. Training neural networks is an optimisation task that has gained prominence with the recent successes of deep learning.…

Neural and Evolutionary Computing · Computer Science 2022-09-07 Katherine M. Malan, Christopher W. Cleghorn

This paper introduces the Kernel Neural Operator (KNO), a provably convergent operator-learning architecture that utilizes compositions of deep kernel-based integral operators for function-space approximation of operators (maps from…

Machine Learning · Computer Science 2026-05-06 Matthew Lowery, John Turnage, Zachary Morrow, John D. Jakeman, Akil Narayan, Shandian Zhe, Varun Shankar

This paper proposes a deep Convolutional Neural Network (CNN) with strong generalization ability for structural topology optimization. The architecture of the neural network is made up of encoding and decoding parts, which provide down- and…

Machine Learning · Computer Science 2020-04-01 Yiquan Zhang, Bo Peng, Xiaoyi Zhou, Cheng Xiang, Dalei Wang

This paper proposes DisCo, an automatic deep learning compilation module for data-parallel distributed training. Unlike most deep learning compilers that focus on training or inference on a single device, DisCo optimizes a DNN model for…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-09-27 Xiaodong Yi, Shiwei Zhang, Lansong Diao, Chuan Wu, Zhen Zheng, Shiqing Fan, Siyu Wang, Jun Yang, Wei Lin