Related papers: Cortex: A Compiler for Recursive Deep Learning Mod…

200 papers

Creating high-performance implementations of deep learning primitives on CPUs is a challenging task. Multiple considerations, including the multi-level cache hierarchy and the wide SIMD units of CPU platforms, influence the choice of program…

Programming Languages · Computer Science 2021-04-13 Sanket Tavarageri, Gagandeep Goyal, Sasikanth Avancha, Bharat Kaul, Ramakrishna Upadrasta

This paper proposes an adaptive neural-compilation framework to address the problem of efficient program learning. Traditional code optimisation strategies used in compilers are based on applying a pre-specified set of transformations that…

Artificial Intelligence · Computer Science 2016-05-27 Rudy Bunel, Alban Desmaison, Pushmeet Kohli, Philip H. S. Torr, M. Pawan Kumar

State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing model sizes and computational complexity. Deploying these models on low power and…

Machine Learning · Computer Science 2018-10-29 Meghan Cowan, Thierry Moreau, Tianqi Chen, Luis Ceze

Recurrence equations lie at the heart of many computational paradigms including dynamic programming, graph analysis, and linear solvers. These equations are often expensive to compute and much work has gone into optimizing them for…

Programming Languages · Computer Science 2023-09-12 Shiv Sundram, Muhammad Usman Tariq, Fredrik Kjolstad

Because loops execute their body many times, compiler developers place much emphasis on their optimization. Nevertheless, in view of highly diverse source code and hardware, compilers still struggle to produce optimal target code. The sheer…

Programming Languages · Computer Science 2021-03-01 Rahim Mammadli, Marija Selakovic, Felix Wolf, Michael Pradel

Modern computationally-intensive applications often operate under time constraints, necessitating acceleration methods and distribution of computational workloads across multiple entities. However, the outcome is either achieved within the…

Information Theory · Computer Science 2024-02-13 Homa Esfahanizadeh, Alejandro Cohen, Shlomo Shamai, Muriel Medard

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu, Boyu Diao, Yu Yang, Wenxin Chen, Xiaohui Peng, Yongjun Xu

At the heart of deep learning training and inferencing are computationally intensive primitives such as convolutions which form the building blocks of deep neural networks. Researchers have taken two distinct approaches to creating high…

Programming Languages · Computer Science 2020-02-07 Sanket Tavarageri, Alexander Heinecke, Sasikanth Avancha, Gagandeep Goyal, Ramakrishna Upadrasta, Bharat Kaul

Selecting the right compiler optimisations has a severe impact on programs' performance. Still, the available optimisations keep increasing, and their effect depends on the specific program, making the task human intractable. Researchers…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-12 Stefano Cereda, Gianluca Palermo, Paolo Cremonesi, Stefano Doni

In-memory computing (IMC) with single instruction multiple data (SIMD) setup enables memory to perform operations on the stored data in parallel to achieve high throughput and energy saving. To instruct a SIMD IMC hardware to compute a…

Emerging Technologies · Computer Science 2024-12-04 Xingyue Qian, Chenyang Lv, Zhezhi He, Weikang Qian

Deep learning (DL) compilers rely on cost models and auto-tuning to optimize tensor programs for target hardware. However, existing approaches depend on large offline datasets, incurring high collection costs and offering suboptimal…

Machine Learning · Computer Science 2026-04-15 Chaoyao Shen, Linfeng Jiang, Yixian Shen, Tao Xu, Guoqing Li, Anuj Pathania, Andy D. Pimentel, Meng Zhang

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye, Deming Chen

We introduce a learning-based framework to optimize tensor programs for deep learning workloads. Efficient implementations of tensor operators, such as matrix multiplication and high dimensional convolution, are key enablers of effective…

Machine Learning · Computer Science 2019-01-10 Tianqi Chen, Lianmin Zheng, Eddie Yan, Ziheng Jiang, Thierry Moreau, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Pre-trained language models have shown stellar performance in various downstream tasks. But, this usually comes at the cost of high latency and computation, hindering their usage in resource-limited settings. In this work, we propose a…

Computation and Language · Computer Science 2022-03-18 Ali Modarressi, Hosein Mohebbi, Mohammad Taher Pilehvar

Accurate hardware performance models are critical to efficient code generation. They can be used by compilers to make heuristic decisions, by superoptimizers as a minimization objective, or by autotuners to find an optimal configuration for…

In this paper, we demonstrate a compiler that can optimize sparse and recurrent neural networks, both of which are currently outside of the scope of existing neural network compilers (sparse neural networks here stand for networks that can…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-11 Riyadh Baghdadi, Abdelkader Nadir Debbagh, Kamel Abdous, Fatima Zohra Benhamida, Alex Renda, Jonathan Elliott Frankle, Michael Carbin, Saman Amarasinghe

Deep learning applications are usually very compute-intensive and require a long run time for training and inference. This has been tackled by researchers from both hardware and software sides, and in this paper, we propose a Roofline-based…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-09-24 Yunsong Wang, Charlene Yang, Steven Farrell, Yan Zhang, Thorsten Kurth, Samuel Williams

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus…

Machine Learning · Computer Science 2023-07-12 Zixuan Ma, Haojie Wang, Jingze Xing, Liyan Zheng, Chen Zhang, Huanqi Cao, Kezhao Huang, Shizhi Tang, Penghan Wang, Jidong Zhai

Deep learning methods have predominantly been applied to large artificial neural networks. Despite their state-of-the-art performance, these large networks typically do not generalize well to datasets with limited sample sizes. In this…

Machine Learning · Statistics 2016-11-17 Eric Strobl, Shyam Visweswaran

We present a prototypical linear algebra compiler that automatically exploits domain-specific knowledge to generate high-performance algorithms. The input to the compiler is a target equation together with knowledge of both the structure of…

Mathematical Software · Computer Science 2012-05-29 Diego Fabregat-Traver, Paolo Bientinesi