Related papers: Vortex: Efficient Sample-Free Dynamic Tensor Progr…

VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination

With the widening gap between compute and memory operation latencies, data movement optimizations have become increasingly important for DNN compilation. Current optimizations such as layout transformations and operator fusion only target a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Muyan Hu , Ahan Gupta , Jiachen Yuan , Vima Gupta , Taeksang Kim , Xin Xu , Janardhan Kulkarni , Ofer Dekel , Vikram Adve , Charith Mendis

VORTEX: Challenging CNNs at Texture Recognition by using Vision Transformers with Orderless and Randomized Token Encodings

Texture recognition has recently been dominated by ImageNet-pre-trained deep Convolutional Neural Networks (CNNs), with specialized modifications and feature engineering required to achieve state-of-the-art (SOTA) performance. However,…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Leonardo Scabini , Kallil M. Zielinski , Emir Konuk , Ricardo T. Fares , Lucas C. Ribas , Kevin Smith , Odemir M. Bruno

DVM: A Bytecode Virtual Machine Approach for Dynamic Tensor Computation

Dynamism is common in AI computation, e.g., the dynamic tensor shapes and the dynamic control flows in models. Due to the long compilation time, existing runtime compilation damages the model efficiency, while the offline compilers either…

Programming Languages · Computer Science 2026-04-03 Jingzhi Fang , Xiong Gao , Renwei Zhang , Zichun Ye , Lei Chen , Jie Zhao , Chengnuo Huang , Hui Xu , Xuefeng Jin

Accelerating shape optimization by deep neural networks with on-the-fly determined architecture

In component shape optimization, the component properties are often evaluated by computationally expensive simulations. Such optimization becomes unfeasible when it is focused on a global search requiring thousands of simulations to be…

Computational Engineering, Finance, and Science · Computer Science 2025-12-08 Lucie Kubíčková , Ondřej Gebouský , Jan Haidl , Martin Isoz

VORTEX: Physics-Driven Data Augmentations Using Consistency Training for Robust Accelerated MRI Reconstruction

Deep neural networks have enabled improved image quality and fast inference times for various inverse problems, including accelerated magnetic resonance imaging (MRI) reconstruction. However, such models require a large number of…

Image and Video Processing · Electrical Eng. & Systems 2022-06-20 Arjun D Desai , Beliz Gunel , Batu M Ozturkler , Harris Beg , Shreyas Vasanawala , Brian A Hargreaves , Christopher Ré , John M Pauly , Akshay S Chaudhari

Dynamically Throttleable Neural Networks (TNN)

Conditional computation for Deep Neural Networks (DNNs) reduces overall computational load and improves model accuracy by running a subset of the network. In this work, we present a runtime throttleable neural network (TNN) that can…

Machine Learning · Computer Science 2020-11-06 Hengyue Liu , Samyak Parajuli , Jesse Hostetler , Sek Chai , Bir Bhanu

oneDNN Graph Compiler: A Hybrid Approach for High-Performance Deep Learning Compilation

With the rapid development of deep learning models and hardware support for dense computing, the deep learning workload characteristics changed significantly from a few hot spots on compute-intensive operations to a broad range of…

Machine Learning · Computer Science 2024-03-12 Jianhui Li , Zhennan Qin , Yijie Mei , Jingze Cui , Yunfei Song , Ciyong Chen , Yifei Zhang , Longsheng Du , Xianhang Cheng , Baihui Jin , Yan Zhang , Jason Ye , Eric Lin , Dan Lavery

PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus…

Machine Learning · Computer Science 2023-07-12 Zixuan Ma , Haojie Wang , Jingze Xing , Liyan Zheng , Chen Zhang , Huanqi Cao , Kezhao Huang , Shizhi Tang , Penghan Wang , Jidong Zhai

Accelerated Volumetric Compression without Hierarchies: A Fourier Feature Based Implicit Neural Representation Approach

Volumetric data compression is critical in fields like medical imaging, scientific simulation, and entertainment. We introduce a structure-free neural compression method combining Fourier feature encoding with selective voxel sampling,…

Computer Vision and Pattern Recognition · Computer Science 2025-08-13 Leona Žůrková , Petr Strakoš , Michal Kravčenko , Tomáš Brzobohatý , Lubomír Říha

An End-to-End HW/SW Co-Design Methodology to Design Efficient Deep Neural Network Systems using Virtual Models

End-to-end performance estimation and measurement of deep neural network (DNN) systems become more important with increasing complexity of DNN systems consisting of hardware and software components. The methodology proposed in this paper…

Machine Learning · Computer Science 2019-11-19 Michael J. Klaiber , Sebastian Vogel , Axel Acosta , Robert Korn , Leonardo Ecco , Kristine Back , Andre Guntoro , Ingo Feldner

Nimble: Efficiently Compiling Dynamic Neural Networks for Model Inference

Modern deep neural networks increasingly make use of features such as dynamic control flow, data structures and dynamic tensor shapes. Existing deep learning systems focus on optimizing and executing static neural networks which assume a…

Programming Languages · Computer Science 2021-03-15 Haichen Shen , Jared Roesch , Zhi Chen , Wei Chen , Yong Wu , Mu Li , Vin Sharma , Zachary Tatlock , Yida Wang

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional…

Machine Learning · Computer Science 2020-01-27 Byung Hoon Ahn , Prannoy Pilligundla , Amir Yazdanbakhsh , Hadi Esmaeilzadeh

An efficient and flexible inference system for serving heterogeneous ensembles of deep neural networks

Ensembles of Deep Neural Networks (DNNs) have achieved qualitative predictions but they are computing and memory intensive. Therefore, the demand is growing to make them answer a heavy workload of requests with available computational…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-08-31 Pierrick Pochelu , Serge G. Petiton , Bruno Conche

A Programmable Approach to Neural Network Compression

Deep neural networks (DNNs) frequently contain far more weights, represented at a higher precision, than are required for the specific task which they are trained to perform. Consequently, they can often be compressed using techniques such…

Machine Learning · Computer Science 2020-12-03 Vinu Joseph , Saurav Muralidharan , Animesh Garg , Michael Garland , Ganesh Gopalakrishnan

Gensor: A Graph-based Construction Tensor Compilation Method for Deep Learning

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu , Boyu Diao , Yu Yang , Wenxin Chen , Xiaohui Peng , Yongjun Xu

BladeDISC++: Memory Optimizations Based On Symbolic Shape

Recent deep learning workloads exhibit dynamic characteristics, leading to the rising adoption of dynamic shape compilers. These compilers can generate efficient kernels for dynamic shape graphs characterized by a fixed graph topology and…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-12-24 Xiulong Yuan , Xu Yan , Wenting Shen , Xiafei Qiu , Ang Wang , Jie Zhang , Yong Li , Wei Lin

DynaNav: Dynamic Feature and Layer Selection for Efficient Visual Navigation

Visual navigation is essential for robotics and embodied AI. However, existing foundation models, particularly those with transformer decoders, suffer from high computational overhead and lack interpretability, limiting their deployment in…

Computer Vision and Pattern Recognition · Computer Science 2025-09-29 Jiahui Wang , Changhao Chen

Reinforcement Learning and Adaptive Sampling for Optimized DNN Compilation

Achieving faster execution with shorter compilation time can enable further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional…

Machine Learning · Computer Science 2019-05-31 Byung Hoon Ahn , Prannoy Pilligundla , Hadi Esmaeilzadeh

MetaML-Pro: Cross-Stage Design Flow Automation for Efficient Deep Learning Acceleration

This paper presents a unified framework for codifying and automating optimization strategies to efficiently deploy deep neural networks (DNNs) on resource-constrained hardware, such as FPGAs, while maintaining high performance, accuracy,…

Hardware Architecture · Computer Science 2026-02-11 Zhiqiang Que , Jose G. F. Coutinho , Ce Guo , Hongxiang Fan , Wayne Luk

PERTINENCE: Input-based Opportunistic Neural Network Dynamic Execution

Deep neural networks (DNNs) have become ubiquitous thanks to their remarkable ability to model complex patterns across various domains such as computer vision, speech recognition, robotics, etc. While large DNN models are often more…

Machine Learning · Computer Science 2025-11-18 Omkar Shende , Gayathri Ananthanarayanan , Marcello Traiola