Related papers: PolyScientist: Automatic Loop Transformations Comb…

PolyDL: Polyhedral Optimizations for Creation of High Performance DL primitives

Deep Neural Networks (DNNs) have revolutionized many aspects of our lives. The use of DNNs is becoming ubiquitous, including in software for image recognition, speech recognition, speech synthesis, and language translation, to name a few. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-18 Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

Harnessing Deep Learning and HPC Kernels via High-Level Loop and Tensor Abstractions on CPU Architectures

During the past decade, Deep Learning (DL) algorithms, programming systems and hardware have converged with the High Performance Computing (HPC) counterparts. Nevertheless, the programming methodology of DL and HPC systems is stagnant,…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-03-19 Evangelos Georganas, Dhiraj Kalamkar, Kirill Voronin, Abhisek Kundu, Antonio Noack, Hans Pabst, Alexander Breuer, Alexander Heinecke

AI Powered Compiler Techniques for DL Code Optimization

Creating high performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations, including the multi-level cache hierarchy and wide SIMD units of CPU platforms, influence the choice of program…

Programming Languages · Computer Science 2021-04-13 Sanket Tavarageri, Gagandeep Goyal, Sasikanth Avancha, Bharat Kaul, Ramakrishna Upadrasta

Automatic Kernel Generation for Volta Tensor Cores

A commonly occurring computation idiom in neural networks is to perform some pointwise operations on the result of a matrix multiplication. Such a sequence of operations is typically represented as a computation graph in deep learning…

Programming Languages · Computer Science 2020-08-04 Somashekaracharya G. Bhaskaracharya, Julien Demouth, Vinod Grover

cuDNN: Efficient Primitives for Deep Learning

We present a library of efficient implementations of deep learning primitives. Deep learning workloads are computationally intensive, and optimizing their kernels is difficult and time-consuming. As parallel architectures evolve, kernels…

Neural and Evolutionary Computing · Computer Science 2014-12-19 Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan Cohen, John Tran, Bryan Catanzaro, Evan Shelhamer

Deep Clustered Convolutional Kernels

Deep neural networks have recently achieved state of the art performance thanks to new training algorithms for rapid parameter estimation and new regularization methods to reduce overfitting. However, in practice the network architecture…

Machine Learning · Computer Science 2016-03-04 Minyoung Kim, Luca Rigazio

Deep Multiple Kernel Learning

Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this…

Machine Learning · Statistics 2016-11-17 Eric Strobl, Shyam Visweswaran

Library Liberation: Competitive Performance Matmul Through Compiler-composed Nanokernels

The rapidly evolving landscape of AI and machine learning workloads has widened the gap between high-level domain operations and efficient hardware utilization. Achieving near-peak performance still demands deep hardware expertise-experts…

Machine Learning · Computer Science 2025-11-19 Arun Thangamani, Md Asghar Ahmad Shahid, Adam Siemieniuk, Rolf Morel, Renato Golin, Alexander Heinecke

High-Performance Deep Learning via a Single Building Block

Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each…

Machine Learning · Computer Science 2019-06-19 Evangelos Georganas, Kunal Banerjee, Dhiraj Kalamkar, Sasikanth Avancha, Anand Venkat, Michael Anderson, Greg Henry, Hans Pabst, Alexander Heinecke

Learning Explicit Deep Representations from Deep Kernel Networks

Deep kernel learning aims at designing nonlinear combinations of multiple standard elementary kernels by training deep networks. This scheme has proven to be effective, but intractable when handling large-scale datasets, especially when the…

Computer Vision and Pattern Recognition · Computer Science 2018-05-01 Mingyuan Jiu, Hichem Sahbi

TensorIR: An Abstraction for Automatic Tensorized Program Optimization

Deploying deep learning models on various devices has become an important topic. The wave of hardware specialization brings a diverse set of acceleration primitives for multi-dimensional tensor computations. These new acceleration…

Machine Learning · Computer Science 2022-10-31 Siyuan Feng, Bohan Hou, Hongyi Jin, Wuwei Lin, Junru Shao, Ruihang Lai, Zihao Ye, Lianmin Zheng, Cody Hao Yu, Yong Yu, Tianqi Chen

TIRAMISU: A Polyhedral Compiler for Dense and Sparse Deep Learning

In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks, both of which are currently outside of the scope of existing neural network compilers (sparse neural networks here stand for networks that can…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Riyadh Baghdadi, Abdelkader Nadir Debbagh, Kamel Abdous, Fatima Zohra Benhamida, Alex Renda, Jonathan Elliott Frankle, Michael Carbin, Saman Amarasinghe

Towards automated kernel selection in machine learning systems: A SYCL case study

Automated tuning of compute kernels is a popular area of research, mainly focused on finding optimal kernel parameters for a problem with fixed input sizes. This approach is good for deploying machine learning models, where the network…

Machine Learning · Computer Science 2020-03-17 John Lawson

Automating Generation of Low Precision Deep Learning Operators

State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low power and…

Machine Learning · Computer Science 2018-10-29 Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

Cortex: A Compiler for Recursive Deep Learning Models

Optimizing deep learning models is generally performed in two steps: (i) high-level graph optimizations such as kernel fusion and (ii) low level kernel optimizations such as those found in vendor libraries. This approach often leaves…

Machine Learning · Computer Science 2021-03-08 Pratik Fegade, Tianqi Chen, Phillip B. Gibbons, Todd C. Mowry

Progress Report: A Deep Learning Guided Exploration of Affine Unimodular Loop Transformations

In this paper, we present a work in progress about a deep learning based approach for automatic code optimization in polyhedral compilers. The proposed technique explores combinations of affine and non-affine loop transformations to find…

Programming Languages · Computer Science 2022-06-09 Massinissa Merouani, Khaled Afif Boudaoud, Iheb Nassim Aouadj, Nassim Tchoulak, Fatima Benbouzid-Sitayeb, Karima Benatchba, Hugh Leather, Riyadh Baghdadi

McKernel: A Library for Approximate Kernel Expansions in Log-linear Time

McKernel introduces a framework to use kernel approximations in the mini-batch setting with Stochastic Gradient Descent (SGD) as an alternative to Deep Learning. Based on Random Kitchen Sinks [Rahimi and Recht 2007], we provide a C++ library…

Machine Learning · Computer Science 2020-04-20 J. D. Curtó, I. C. Zarza, Feng Yang, Alex Smola, Fernando de la Torre, Chong Wah Ngo, Luc van Gool

A Deep Learning Approach To Multiple Kernel Fusion

Kernel fusion is a popular and effective approach for combining multiple features that characterize different aspects of data. Traditional approaches for Multiple Kernel Learning (MKL) attempt to learn the parameters for combining the…

Machine Learning · Statistics 2016-12-30 Huan Song, Jayaraman J. Thiagarajan, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Andreas Spanias

Supervised Multiple Kernel Learning approaches for multi-omics data integration

Advances in high-throughput technologies have originated an ever-increasing availability of omics datasets. The integration of multiple heterogeneous data sources is currently an issue for biology and bioinformatics. Multiple kernel…

Machine Learning · Statistics 2024-12-04 Mitja Briscik, Gabriele Tazza, Marie-Agnes Dillies, László Vidács, Sébastien Dejean

A Unified View of Localized Kernel Learning

Multiple Kernel Learning, or MKL, extends (kernelized) SVM by attempting to learn not only a classifier/regressor but also the best kernel for the training task, usually from a combination of existing kernel functions. Most MKL methods seek…

Machine Learning · Computer Science 2016-03-07 John Moeller, Sarathkrishna Swaminathan, Suresh Venkatasubramanian