Related papers: DVM: A Bytecode Virtual Machine Approach for Dynam…

200 papers

With the widening gap between compute and memory operation latencies, data movement optimizations have become increasingly important for DNN compilation. Current optimizations such as layout transformations and operator fusion only target a…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-04-14 Muyan Hu , Ahan Gupta , Jiachen Yuan , Vima Gupta , Taeksang Kim , Xin Xu , Janardhan Kulkarni , Ofer Dekel , Vikram Adve , Charith Mendis

Dynamic-shape deep neural networks (DNNs) are rapidly evolving, attracting attention for their ability to handle variable input sizes in real-time applications. However, existing compilation optimization methods for such networks often rely…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-09-04 Yangjie Zhou , Honglin Zhu , Qian Qiu , Weihao Cui , Zihan Liu , Cong Guo , Siyuan Feng , Jintao Meng , Haidong Lan , Jingwen Leng , Wenxi Zhu , Minwen Deng

As quantum computing advances towards practical applications, quantum operating systems become inevitable, where multi-programming -- the core functionality of operating systems -- enables concurrent execution of multiple quantum programs…

Quantum Physics · Physics 2025-07-08 Wenjie Sun , Xiaoyu Li , Zhigang Wang , Geng Chen , Lianhui Yu , Guowu Yang

Many recent machine learning models show dynamic shape characteristics. However, existing AI compiler optimization systems suffer a lot from problems brought by dynamic shape models, including compilation overhead, memory usage,…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-24 Kai Zhu , Wenyi Zhao , Zhen Zheng , Tianyou Guo , Pengzhan Zhao , Feiwen Zhu , Junjie Bai , Jun Yang , Xiaoyong Liu , Lansong Diao , Wei Lin

We present DyMU, an efficient, training-free framework that dynamically reduces the computational burden of vision-language models (VLMs) while maintaining high task performance. Our approach comprises two key components. First, Dynamic…

Computer Vision and Pattern Recognition · Computer Science 2025-05-13 Zhenhailong Wang , Senthil Purushwalkam , Caiming Xiong , Silvio Savarese , Heng Ji , Ran Xu

Getting the best performance from the ever-increasing number of hardware platforms has been a recurring challenge for data processing systems. In recent years, the advent of data science with its increasingly numerous and complex types of…

High-performance deep learning depends on efficient tensor programs. In recent years, automatic tensor program optimization, also known as tensor compilation, has emerged as the primary approach to generating efficient tensor programs.…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-02-18 Hangda Liu , Boyu Diao , Yu Yang , Wenxin Chen , Xiaohui Peng , Yongjun Xu

We introduce a practical real-time neural video codec (NVC) designed to deliver high compression ratio, low latency and broad versatility. In practice, the coding speed of NVCs depends on 1) computational costs, and 2) non-computational…

Image and Video Processing · Electrical Eng. & Systems 2025-03-19 Zhaoyang Jia , Bin Li , Jiahao Li , Wenxuan Xie , Linfeng Qi , Houqiang Li , Yan Lu

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye , Deming Chen

Support Vector Machine (SVM) algorithm requires a high computational cost (both in memory and time) to solve a complex quadratic programming (QP) optimization problem during the training process. Consequently, SVM necessitates high…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-11-28 Islam Elgarhy

Operator fusion, as a key performance optimization technique in the deployment of AI models, significantly improves execution efficiency and has been widely adopted in modern AI compilers. However, for cascaded reduction operations…

Hardware Architecture · Computer Science 2026-03-12 Xinsheng Tang , Yangcheng Li , Nan Wang , Zhiyi Shu , Xingyu Ling , Junna Xing , Peng Zhou , Qiang Liu

Deep neural networks (DNNs) are of critical use in different domains. To accelerate DNN computation, tensor compilers are proposed to generate efficient code on different domain-specific accelerators. Existing tensor compilers mainly focus…

Machine Learning · Computer Science 2023-07-12 Zixuan Ma , Haojie Wang , Jingze Xing , Liyan Zheng , Chen Zhang , Huanqi Cao , Kezhao Huang , Shizhi Tang , Penghan Wang , Jidong Zhai

We propose a dense tensor accelerator called VectorMesh, a scalable, memory-efficient architecture that can support a wide variety of DNN and computer vision workloads. Its building block is a tile execution unit (TEU), which includes…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-11-29 Yu-Sheng Lin , Wei-Chao Chen , Chia-Lin Yang , Shao-Yi Chien

In this paper, we introduce an interactive simulator for programs in the form of LLVM bitcode. The main features of the simulator include precise control over thread scheduling, automatic checkpoints and reverse stepping, support for…

Software Engineering · Computer Science 2019-07-10 Petr Ročkai , Jiří Barnat

Non-volatile memory (NVM) provides a scalable and power-efficient solution to replace DRAM as main memory. However, because of relatively high latency and low bandwidth of NVM, NVM is often paired with DRAM to build a heterogeneous memory…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-05-03 Kai Wu , Yingchao Huang , Dong Li

Tensor cores, along with tensor processing units, represent a new form of hardware acceleration specifically designed for deep neural network calculations in artificial intelligence applications. Tensor cores provide extraordinary…

Irregular embedding lookups are a critical bottleneck in recommender models, sparse large language models, and graph learning models. In this paper, we first demonstrate that, by offloading these lookups to specialized access units,…

Operator fusion has become a key optimization for deep learning, which combines multiple deep learning operators to improve data reuse and reduce global memory transfers. However, existing tensor compilers struggle to fuse complex reduction…

Programming Languages · Computer Science 2026-04-21 Yifan Zhao , Egan Johnson , Prasanth Chatarasi , Vikram Adve , Sasa Misailovic

Crossbar-based PIM DNN accelerators can provide massively parallel in-situ operations. A specifically designed compiler is important to achieve high performance for a wide variety of DNN workloads. However, some key compilation issues such…

Hardware Architecture · Computer Science 2023-07-06 Xiaotian Sun , Xinyu Wang , Wanqian Li , Lei Wang , Yinhe Han , Xiaoming Chen

There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new…
