Related papers: Efficient Sparse Processing-in-Memory Architecture…
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…
Several manufacturers have already started to commercialize near-bank Processing-In-Memory (PIM) architectures. Near-bank PIM architectures place simple cores close to DRAM banks and can yield significant performance and energy improvements…
In recent years, Transformer-based language models have become the standard approach for natural language processing tasks. However, stringent throughput and latency requirements in industrial applications are limiting their adoption. To…
Sparse deep learning has reduced computation significantly, but its irregular non-zero data distribution complicates the data flow and hinders data reuse, increasing on-chip SRAM access and thus power consumption of the chip. This paper…
Sparse matrix-vector and matrix-matrix multiplication (SpMV and SpMM) are fundamental in both conventional (graph analytics, scientific computing) and emerging (sparse DNN, GNN) domains. Workload-balancing and parallel-reduction are…
Sparse matrix-dense matrix multiplication (SpMM) is a critical kernel in scientific computing, graph analytics, and machine learning, whose performance is often constrained by memory bandwidth. In this work, we investigate the applicability…
Sparse matrix multiplication is traditionally performed in memory and scales to large matrices using the distributed memory of multiple nodes. In contrast, we scale sparse matrix multiplication beyond memory capacity by implementing sparse…
Structured sparsity enables deploying large language models (LLMs) on resource-constrained systems. Approaches like dense-to-sparse fine-tuning are particularly compelling, achieving remarkable structured sparsity by reducing the model size…
Sparse matrix-matrix multiplication (SpGEMM) is a critical operation in numerous fields, including scientific computing, graph analytics, and deep learning. These applications exploit the sparsity of matrices to reduce storage and…
Deep learning demonstrates effectiveness across a wide range of tasks. However, the dense and over-parameterized nature of these models results in significant resource consumption during deployment. In response to this issue, weight…
The importance of general matrix multiplication (GEMM) is motivating new instruction set extensions for multiplying dense matrices in almost all contemporary ISAs, and these extensions are often implemented using high-performance systolic…
Processing-in-memory (PIM) is a transformative architectural paradigm designed to overcome the Von Neumann bottleneck. Among PIM architectures, digital SRAM-PIM emerges as a promising solution, offering significant advantages by directly…
We introduce a class of Sparse, Physics-based, and partially Interpretable Neural Networks (SPINN) for solving ordinary and partial differential equations (PDEs). By reinterpreting a traditional meshless representation of solutions of PDEs…
Sparse matrix-matrix multiplication (SpGEMM) is a critical kernel widely employed in machine learning and graph algorithms. However, real-world matrices' high sparsity makes SpGEMM memory-intensive. In-situ computing offers the potential to…
Sparse matrices, more specifically SpGEMM kernels, are commonly found in a wide range of applications, spanning graph-based path-finding to machine learning algorithms (e.g., neural networks). A particular challenge in implementing SpGEMM…
Sparse matrix-matrix multiplication (SpGEMM) is a widely used kernel in various graph, scientific computing and machine learning algorithms. It is well known that SpGEMM is a memory-bound operation, and its peak performance is expected to…
To address the challenge of increasing network size, researchers have developed sparse models through network pruning. However, maintaining model accuracy while achieving significant speedups on general computing devices remains an open…
We implement two novel algorithms for sparse-matrix dense-matrix multiplication (SpMM) on the GPU. Our algorithms expect the sparse input in the popular compressed-sparse-row (CSR) format and thus do not require expensive format conversion.…
The widespread integration of embedded systems across various industries has facilitated seamless connectivity among devices and bolstered computational capabilities. Despite their extensive applications, embedded systems encounter…
Long Short-Term Memory (LSTM) has achieved state-of-the-art performances on a wide range of tasks. Its outstanding performance is guaranteed by the long-term memory ability which matches the sequential data perfectly and the gating…