Related papers: In-memory Multi-valued Associative Processor
In-memory computing is an emerging computing paradigm that overcomes the limitations of exiting Von-Neumann computing architectures such as the memory-wall bottleneck. In such paradigm, the computations are performed directly on the data…
Sparse matrix multiplication is an important component of linear algebra computations. Implementing sparse matrix multiplication on an associative processor (AP) enables high level of parallelism, where a row of one matrix is multiplied in…
Ternary weight quantization (e.g., BitNet b1.58) offers a promising path to mitigate the memory bandwidth bottleneck in Large Language Model (LLM) inference. However, conventional compute platforms lack native support for ternary-weight…
The emerging memristive Memory Processing Unit (mMPU) overcomes the memory wall through memristive devices that unite storage and logic for real processing-in-memory (PIM) systems. At the core of the mMPU is stateful logic, which is…
Traditional von Neumann architecture based processors become inefficient in terms of energy and throughput as they involve separate processing and memory units, also known as~\textit{memory wall}. The memory wall problem is further…
The design of systems implementing low precision neural networks with emerging memories such as resistive random access memory (RRAM) is a major lead for reducing the energy consumption of artificial intelligence (AI). Multiple works have…
Processing-in-Memory (PIM) enhances memory with computational capabilities, potentially solving energy and latency issues associated with data transfer between memory and processors. However, managing concurrent computation and data flow…
The construction of Mapper has emerged in the last decade as a powerful and effective topological data analysis tool that approximates and generalizes other topological summaries, such as the Reeb graph, the contour tree, split, and joint…
In-memory computing hardware accelerators allow more than 10x improvements in peak efficiency and performance for matrix-vector multiplications (MVM) compared to conventional digital designs. For this, they have gained great interest for…
Large language model (LLM) inference has been a prevalent demand in daily life and industries. The large tensor sizes and computing complexities in LLMs have brought challenges to memory, computing, and databus. This paper proposes a…
Triangles are the basic substructure of networks and triangle counting (TC) has been a fundamental graph computing problem in numerous fields such as social network analysis. Nevertheless, like other graph computing problems, due to the…
This paper presents a novel approach to enhance the Binary-Addition-Tree algorithm (BAT) by integrating incremental learning techniques. BAT, known for its simplicity in development, implementation, and application, is a powerful implicit…
Processing in memory (PIM) moves computation into memories with the goal of improving throughput and energy-efficiency compared to traditional von Neumann-based architectures. Most existing PIM architectures are either general-purpose but…
Digital processing-in-memory (PIM) architectures are rapidly emerging to overcome the memory-wall bottleneck by integrating logic within memory elements. Such architectures provide vast computational power within the memory itself in the…
The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von Neumann systems. Several…
Triangle counting (TC) is a fundamental problem in graph analysis and has found numerous applications, which motivates many TC acceleration solutions in the traditional computing platforms like GPU and FPGA. However, these approaches suffer…
Neuromorphic architectures, which incorporate parallel and in-memory processing, are crucial for accelerating artificial neural network (ANN) computations. This work presents a novel memristor-based multi-layer neural network (memristive…
Processing-in-memory (PIM) has emerged as the go to solution for addressing the von Neumann bottleneck in edge AI accelerators. However, state-of-the-art (SoTA) digital PIM approaches suffer from low compute density, primarily due to the…
In this paper we propose a new method to enhance a mapping $\mu(\cdot)$ of a parallel application's computational tasks to the processing elements (PEs) of a parallel computer. The idea behind our method \mswap is to enhance such a mapping…
Many parallel algorithms use at least linear auxiliary space in the size of the input to enable computations to be done independently without conflicts. Unfortunately, this extra space can be prohibitive for memory-limited machines,…