Related papers: TeAAL: A Declarative Framework for Modeling Sparse…
In recent years, many accelerators have been proposed to efficiently process sparse tensor algebra applications (e.g., sparse neural networks). However, these proposals are single points in a large and diverse design space. The lack of…
RTL simulation on CPUs remains a persistent bottleneck in hardware design. State-of-the-art simulators embed the circuit directly into the simulation binary, resulting in long compilation times and execution that is fundamentally CPU…
Recently, numerous sparse hardware accelerators for Deep Neural Networks (DNNs), Graph Neural Networks (GNNs), and scientific computing applications have been proposed. A common characteristic among all of these accelerators is that they…
Sparse tensors are prevalent in many data-intensive applications, yet existing differentiable programming frameworks are tailored towards dense tensors. This presents a significant challenge for efficiently computing gradients through…
Activation sparsity can enable practical inference speedups in large language models (LLMs) by reducing the compute and memory-movement required for matrix multiplications during the forward pass. However, existing methods face limitations…
Tensor algebra finds applications in various domains, and these applications, especially when accelerated on spatial hardware accelerators, can deliver high performance and low power. Spatial hardware accelerator exhibits complex design…
The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios,…
Machine learning (ML) models are widely used in many important domains. For efficiently processing these computational- and memory-intensive applications, tensors of these over-parameterized models are compressed by leveraging sparsity,…
Tensors play a vital role in machine learning (ML) and often exhibit properties best explored while maintaining high-order. Efficiently performing ML computations requires taking advantage of sparsity, but generalized hardware support is…
Tensor algebra is widely used in many applications, such as scientific computing, machine learning, and data analytics. The tensors represented real-world data are usually large and sparse. There are tens of storage formats designed for…
Sparsity, which occurs in both scientific applications and Deep Learning (DL) models, has been a key target of optimization within recent ASIC accelerators due to the potential memory and compute savings. These applications use data stored…
Sparse tensor algebra is a challenging class of workloads to accelerate due to low arithmetic intensity and varying sparsity patterns. Prior sparse tensor algebra accelerators have explored tiling sparse data to increase exploitable data…
Ternary quantization has emerged as a powerful technique for reducing both computational and memory footprint of large language models (LLM), enabling efficient real-time inference deployment without significantly compromising model…
Deep learning (DL) models are piquing high interest and scaling at an unprecedented rate. To this end, a handful of tiled accelerators have been proposed to support such large-scale training tasks. However, these accelerators often…
Sparse models, including sparse Mixture-of-Experts (MoE) models, have emerged as an effective approach for scaling Transformer models. However, they often suffer from computational inefficiency since a significant number of parameters are…
Exploiting sparsity in deep neural networks (DNNs) has been a promising area for meeting the growing computation requirements. To minimize the overhead of sparse acceleration, hardware designers have proposed structured sparsity support,…
Sparse tensor computing is a core computational part of numerous applications in areas such as data science, graph processing, and scientific computing. Sparse tensors offer the potential of skipping unnecessary computations caused by zero…
We introduce DISTAL, a compiler for dense tensor algebra that targets modern distributed and heterogeneous systems. DISTAL lets users independently describe how tensors and computation map onto target machines through separate format and…
Dense and sparse tensors allow the representation of most bulk data structures in computational science applications. We show that sparse tensor algebra can also be used to express many of the transformations on these datasets, especially…
Sparse compiler is a promising solution for sparse tensor algebra optimization. In compiler implementation, reduction in sparse-dense hybrid algebra plays a key role in performance. Though GPU provides various reduction semantics that can…