
Related papers: An Open-Source Framework for Efficient Numerically…

200 papers

Neural network (NN) accelerators with multi-chip-module (MCM) architectures enable integration of massive computation capability; however, they face challenges of computing resource underutilization and off-chip communication overheads.…

Hardware Architecture · Computer Science 2026-02-17 Zongle Huang, Hongyang Jia, Kaiwei Zou, Yongpan Liu

Interest in deploying Deep Neural Network (DNN) inference on edge devices has resulted in an explosion of the number and types of hardware platforms to use. While the high-level programming interface, such as TensorFlow, can be readily…

Mathematical Software · Computer Science 2023-03-09 Upasana Sridhar, Nicholai Tukanov, Elliott Binder, Tze Meng Low, Scott McMillan, Martin D. Schatz

Mixed-precision quantization is a promising approach for compressing large language models under tight memory budgets. However, existing mixed-precision methods typically suffer from one of two limitations: they either rely on expensive…

Machine Learning · Computer Science 2026-02-03 Xin Nie, Haicheng Zhang, Liang Dong, Beining Feng, Jinhong Weng, Guiling Sun

This work proposes a novel Deep Neural Network (DNN) quantization framework, namely RMSMP, with a Row-wise Mixed-Scheme and Multi-Precision approach. Specifically, this is the first effort to assign mixed quantization schemes and multiple…

Machine Learning · Computer Science 2021-11-02 Sung-En Chang, Yanyu Li, Mengshu Sun, Weiwen Jiang, Sijia Liu, Yanzhi Wang, Xue Lin

The advancement of Large Language Models (LLMs) has significantly boosted performance in natural language processing (NLP) tasks. However, the deployment of high-performance LLMs incurs substantial costs, primarily due to the increased…

Machine Learning · Computer Science 2024-03-22 Saehan Jo, Immanuel Trummer

As machine learning (ML) algorithms get deployed in an ever-increasing number of applications, these algorithms need to achieve better trade-offs between high accuracy, high throughput and low latency. This paper introduces NASH, a novel…

Machine Learning · Computer Science 2024-03-12 Mengfei Ji, Yuchun Chang, Baolin Zhang, Zaid Al-Ars

Sparse Matrix-Matrix Multiplication (SpMM) is a fundamental computation in graph analytics, scientific simulation, and sparse deep learning workloads. However, the extreme irregularity of real-world sparse matrices prevents existing…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-03-11 Aiying Li, Jingwei Sun, Han Li, Wence Ji, Guangzhong Sun

Many scientific computing problems can be reduced to Matrix-Matrix Multiplications (MMM), making the General Matrix Multiply (GEMM) kernels in the Basic Linear Algebra Subroutine (BLAS) of interest to the high-performance computing…

Hardware Architecture · Computer Science 2023-05-31 Louis Ledoux, Marc Casas

As the increasing complexity of Neural Network (NN) models leads to high demands for computation, AMD introduces a heterogeneous programmable system-on-chip (SoC), i.e., Versal ACAP architectures featured with programmable logic (PL), CPUs,…

Hardware Architecture · Computer Science 2023-05-31 Jinming Zhuang, Zhuoping Yang, Peipei Zhou

Both industry and academia have extensively investigated hardware accelerations. In this work, to address the increasing demands in computational capability and memory requirement, we propose structured weight matrices (SWM)-based…

Distributed, Parallel, and Cluster Computing · Computer Science 2018-05-01 Caiwen Ding, Ao Ren, Geng Yuan, Xiaolong Ma, Jiayu Li, Ning Liu, Bo Yuan, Yanzhi Wang

Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-10-15 Guyue Huang, Guohao Dai, Yu Wang, Yufei Ding, Yuan Xie

Despite the success of deep neural networks (DNNs), state-of-the-art models are too large to deploy on low-resource devices or common server configurations in which multiple models are held in memory. Model compression methods address this…

In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To…

Machine Learning · Computer Science 2023-06-30 Haihao Shen, Hengyu Meng, Bo Dong, Zhe Wang, Ofir Zafrir, Yi Ding, Yu Luo, Hanwen Chang, Qun Gao, Ziheng Wang, Guy Boudoukh, Moshe Wasserblat

Quantization is essential for efficient large language model (LLM) inference, yet the dequantization step, converting low-bit weights back to high precision for matrix multiplication, has become a critical bottleneck on modern AI…

Machine Learning · Statistics 2026-05-15 Lingchao Zheng, Yuwei Fan, Jun Li, Chengqiu Hu, Qichen Liao, Junyi Fan, Rui Shi, Fangzheng Miao

Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning applications. To cope with the high computation demands of these applications, heterogeneous architectures featuring both FPGA and dedicated ASIC…

In recent years, machine learning (ML) and neural networks (NNs) have gained widespread use and attention across various domains, particularly in transportation for achieving autonomy, including the emergence of flying taxis for urban air…

Machine Learning · Computer Science 2024-01-17 Fabien Geyer, Johannes Freitag, Tobias Schulz, Sascha Uhrig

Designing architectures for deep neural networks requires expert knowledge and substantial computation time. We propose a technique to accelerate architecture selection by learning an auxiliary HyperNet that generates the weights of a main…

Machine Learning · Computer Science 2017-08-18 Andrew Brock, Theodore Lim, J. M. Ritchie, Nick Weston

General Matrix Multiplication (GEMM) is a critical operation underpinning a wide range of applications in high-performance computing (HPC) and artificial intelligence (AI). The emergence of hardware optimized for low-precision arithmetic…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-21 Qiao Zhang, Rabab Alomairy, Dali Wang, Zhuowei Gu, Qinglei Cao

This work proposes an energy-efficient resource provisioning and allocation framework to meet the dynamic demands of future applications. The frequent variations in a cloud user's resource demand lead to the problem of excess power…

Distributed, Parallel, and Cluster Computing · Computer Science 2022-12-06 Deepika Saxena, Ashutosh Kumar Singh

Precise pointer analysis is a foundational component of many client analyses and optimizations. Scaling flow- and context-sensitive pointer analysis has been a long-standing challenge, suffering from combinatorial growth in both memory…

Programming Languages · Computer Science 2026-04-14 Anamitra Ghorui, Aditi Raste, Uday P. Khedker