English
Related papers

Related papers: Compiling Neural Networks for a Computational Memo…

200 papers

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu , Shixin Zhao , Bing Li , Yintao He , Xuyi Cai , Lei Zhang , Ying Wang

Image processing and machine learning applications benefit tremendously from hardware acceleration, but existing compilers target either FPGAs, which sacrifice power and performance for flexible hardware, or ASICs, which rapidly become…

Computing-in-Memory (CIM) accelerators are a promising solution for accelerating Machine Learning (ML) workloads, as they perform Matrix-Vector Multiplications (MVMs) on crossbar arrays directly in memory. Although the bit widths of the…

Machine Learning · Computer Science 2026-03-20 Rebecca Pelke , Joel Klein , Jose Cubero-Cascante , Nils Bosbach , Jan Moritz Joseph , Rainer Leupers

Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the large storage overheads and the substantial computation cost of CNNs are problematic in hardware accelerators. Computing-in-memory (CIM)…

Hardware Architecture · Computer Science 2021-05-26 Syuan-Hao Sie , Jye-Luen Lee , Yi-Ren Chen , Chih-Cheng Lu , Chih-Cheng Hsieh , Meng-Fan Chang , Kea-Tiong Tang

This paper introduces the Modular Neural Computer (MNC), a memory-augmented neural architecture for exact algorithmic computation on variable-length inputs. The model combines an external associative memory of scalar cells, explicit read…

Machine Learning · Computer Science 2026-03-17 Florin Leon

Matrix multiplication is the dominant computation during Machine Learning (ML) inference. To efficiently perform such multiplication operations, Compute-in-memory (CiM) paradigms have emerged as a highly energy efficient solution. However,…

Hardware Architecture · Computer Science 2025-03-03 Tanvi Sharma , Mustafa Ali , Indranil Chakraborty , Kaushik Roy

Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI…

Emerging Technologies · Computer Science 2025-07-03 Benjamin Chen Ming Choong , Tao Luo , Cheng Liu , Bingsheng He , Wei Zhang , Joey Tianyi Zhou

Computing-in-memory (CIM) architectures demonstrate superior performance over traditional architectures. To unleash the potential of CIM accelerators, many compilation methods have been proposed, focusing on application scheduling…

Hardware Architecture · Computer Science 2025-02-25 Shixin Zhao , Yuming Li , Bing Li , Yintao He , Mengdi Wang , Yinhe Han , Ying Wang

Processing-in-memory (PIM) has shown extraordinary potential in accelerating neural networks. To evaluate the performance of PIM accelerators, we present an ISA-based simulation framework including a dedicated ISA targeting neural networks…

Hardware Architecture · Computer Science 2024-02-29 Xinyu Wang , Xiaotian Sun , Yinhe Han , Xiaoming Chen

The rise of data-intensive applications exposed the limitations of conventional processor-centric von-Neumann architectures that struggle to meet the off-chip memory bandwidth demand. Therefore, recent innovations in computer architecture…

Hardware Architecture · Computer Science 2024-05-28 Asif Ali Khan , Hamid Farzaneh , Karl F. A. Friebel , Clément Fournier , Lorenzo Chelini , Jeronimo Castrillon

The growing adoption of domain-specific architectures in edge computing platforms for deep learning has highlighted the efficiency of hardware accelerators. However, integrating custom accelerators into modern machine learning (ML)…

Machine Learning · Computer Science 2025-07-08 Samira Ahmadifarsani , Daniel Mueller-Gritschneder , Ulf Schlichtmann

The conventional approach of moving data to the CPU for computation has become a significant performance bottleneck for emerging scale-out data-intensive applications due to their limited data reuse. At the same time, the advancement in 3D…

Different from developing neural networks (NNs) for general-purpose processors, the development for NN chips usually faces with some hardware-specific restrictions, such as limited precision of network signals and parameters, constrained…

Neural and Evolutionary Computing · Computer Science 2018-01-19 Yu Ji , YouHui Zhang , WenGuang Chen , Yuan Xie

Crary and Sullivan's Relaxed Memory Calculus (RMC) proposed a new declarative approach for writing low-level shared memory concurrent programs in the presence of modern relaxed-memory multi-processor architectures and optimizing compilers.…

Programming Languages · Computer Science 2019-04-12 Michael J. Sullivan , Karl Crary , Salil Joshi

Machine learning and data analytics applications increasingly suffer from the high latency and energy consumption of conventional von Neumann architectures. Recently, several in-memory and near-memory systems have been proposed to remove…

Hardware Architecture · Computer Science 2023-09-13 Hamid Farzaneh , João Paulo Cardoso de Lima , Mengyuan Li , Asif Ali Khan , Xiaobo Sharon Hu , Jeronimo Castrillon

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang , Yao Chen , Cong Hao , Sitao Huang , Yuhong Li , Deming Chen

Compute-Near-Memory (CNM) systems offer a promising approach to mitigate the von Neumann bottleneck by bringing computational units closer to data. However, optimizing for these architectures remains challenging due to their unique hardware…

Emerging Technologies · Computer Science 2025-08-18 Hamid Farzaneh , Asif Ali Khan , Jeronimo Castrillon

Classical machine learning (CML) occupies nearly half of machine learning pipelines in production applications. Unfortunately, it fails to utilize the state-of-the-practice devices fully and performs poorly. Without a unified framework, the…

Machine Learning · Computer Science 2023-05-01 Xu Wen , Wanling Gao , Anzheng Li , Lei Wang , Zihan Jiang , Jianfeng Zhan

Though CNNs are highly parallel workloads, in the absence of efficient on-chip memory reuse techniques, an accelerator for them quickly becomes memory bound. In this paper, we propose a CNN accelerator design for inference that is able to…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-26 Kingshuk Majumder , Shubham Nema , Uday Bondhugula

Content addressable memory (CAM) stands out as an efficient hardware solution for memory-intensive search operations by supporting parallel computation in memory. However, developing a CAM-based accelerator architecture that achieves…

Hardware Architecture · Computer Science 2024-03-11 Mengyuan Li , Shiyi Liu , Mohammad Mehdi Sharifi , X. Sharon Hu
‹ Prev 1 2 3 10 Next ›