Related papers: Software-Hardware Codesign for Efficient In-Memory…

A neuromorphic hardware architecture using the Neural Engineering Framework for pattern recognition

We present a hardware architecture that uses the Neural Engineering Framework (NEF) to implement large-scale neural networks on Field Programmable Gate Arrays (FPGAs) for performing pattern recognition in real time. NEF is a framework that…

Neural and Evolutionary Computing · Computer Science 2015-07-22 Runchun Wang, Chetan Singh Thakur, Tara Julia Hamilton, Jonathan Tapson, Andre van Schaik

Work-in-Progress: Real-Time Neural Network Inference on a Custom RISC-V Multicore Vector Processor

Neural networks are increasingly used in real-time systems, such as automated driving applications. This requires high-performance hardware with predictable timing behavior. State-of-the-art real-time hardware is limited in memory and…

Hardware Architecture · Computer Science 2024-10-15 Maximilian Kirschner, Konstantin Dudzik, Jürgen Becker

Computational RAM to Accelerate String Matching at Scale

Traditional Von Neumann computing is falling apart in the era of exploding data volumes as the overhead of data transfer becomes forbidding. Instead, it is more energy-efficient to fuse compute capability with memory where the data reside.…

Hardware Architecture · Computer Science 2018-12-24 Zamshed I. Chowdhury, S. Karen Khatamifard, Zhengyang Zhao, Masoud Zabihi, Salonik Resch, Meisam Razaviyayn, Jian-Ping Wang, Sachin Sapatnekar, Ulya R. Karpuzcu

Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems

Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI…

Emerging Technologies · Computer Science 2025-07-03 Benjamin Chen Ming Choong, Tao Luo, Cheng Liu, Bingsheng He, Wei Zhang, Joey Tianyi Zhou

A Microprocessor implemented in 65nm CMOS with Configurable and Bit-scalable Accelerator for Programmable In-memory Computing

This paper presents a programmable in-memory-computing processor, demonstrated in a 65nm CMOS technology. For data-centric workloads, such as deep neural networks, data movement often dominates when implemented with today's computing…

Hardware Architecture · Computer Science 2020-09-17 Hongyang Jia, Yinqi Tang, Hossein Valavi, Jintao Zhang, Naveen Verma

Learning Machines Implemented on Non-Deterministic Hardware

This paper highlights new opportunities for designing large-scale machine learning systems as a consequence of blurring traditional boundaries that have allowed algorithm designers and application-level practitioners to stay -- for the most…

Machine Learning · Computer Science 2014-09-10 Suyog Gupta, Vikas Sindhwani, Kailash Gopalakrishnan

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Kernel approximation is widely used to scale up kernel SVM training and prediction. However, the memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices such as…

Machine Learning · Computer Science 2020-10-07 Zijian Lei, Liang Lan

PaREM: A Novel Approach for Parallel Regular Expression Matching

Regular expression matching is essential for many applications, such as finding patterns in text, exploring substrings in large DNA sequences, or lexical analysis. However, sequential regular expression matching may be time-prohibitive for…

Formal Languages and Automata Theory · Computer Science 2015-06-30 Suejb Memeti, Sabri Pllana

Rethinking Co-design of Neural Architectures and Hardware Accelerators

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

Machine Learning · Computer Science 2021-02-18 Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon

Coding for Computation: Efficient Compression of Neural Networks for Reconfigurable Hardware

As state of the art neural networks (NNs) continue to grow in size, their resource-efficient implementation becomes ever more important. In this paper, we introduce a compression scheme that reduces the number of computations required for…

Machine Learning · Computer Science 2025-04-25 Hans Rosenberger, Rodrigo Fischer, Johanna S. Fröhlich, Ali Bereyhi, Ralf R. Müller

On Neural Architecture Search for Resource-Constrained Hardware Platforms

In the recent past, the success of Neural Architecture Search (NAS) has enabled researchers to broadly explore the design space using learning-based methods. Apart from finding better neural network architectures, the idea of automation has…

Machine Learning · Computer Science 2019-11-04 Qing Lu, Weiwen Jiang, Xiaowei Xu, Yiyu Shi, Jingtong Hu

How Hard is Weak-Memory Testing?

Weak-memory models are standard formal specifications of concurrency across hardware, programming languages, and distributed systems. A fundamental computational problem is consistency testing: is the observed execution of a concurrent…

Programming Languages · Computer Science 2023-11-16 Soham Chakraborty, Shankaranarayanan Krishna, Umang Mathur, Andreas Pavlogiannis

A Framework of Arithmetic-Level Variable Precision Computing for In-Memory Architecture: Case Study in MIMO Signal Processing

Computational complexity poses a significant challenge in wireless communication. Most existing attempts aim to reduce it through algorithm-specific approaches. However, the precision of computing, which directly relates to both computing…

Signal Processing · Electrical Eng. & Systems 2025-09-01 Kaixuan Bao, Wei Xu, Xiaohu You, Derrick Wing Kwan Ng

In-Storage Embedded Accelerator for Sparse Pattern Processing

We present a novel architecture for sparse pattern processing, using flash storage with embedded accelerators. Sparse pattern processing on large data sets is the essence of applications such as document search, natural language processing,…

Hardware Architecture · Computer Science 2017-01-25 Sang-Woo Jun, Huy T. Nguyen, Vijay N. Gadepally, Arvind

Regional Consistency: Programmability and Performance for Non-Cache-Coherent Systems

Parallel programmers face the often irreconcilable goals of programmability and performance. HPC systems use distributed memory for scalability, thereby sacrificing the programmability advantages of shared memory programming models.…

Distributed, Parallel, and Cluster Computing · Computer Science 2013-01-21 Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

A Case for Fine-grain Coherence Specialization in Heterogeneous Systems

Hardware specialization is becoming a key enabler of energy-efficient performance. Future systems will be increasingly heterogeneous, integrating multiple specialized and programmable accelerators, each with different memory demands.…

Hardware Architecture · Computer Science 2021-04-26 Johnathan Alsop, Weon Taek Na, Matthew D. Sinclair, Samuel Grayson, Sarita V. Adve

Architectural Exploration of Application-Specific Resonant SRAM Compute-in-Memory (rCiM)

While general-purpose computing follows Von Neumann's architecture, the data movement between memory and processor elements dictates the processor's performance. The evolving compute-in-memory (CiM) paradigm tackles this issue by…

Hardware Architecture · Computer Science 2024-11-15 Dhandeep Challagundla, Ignatius Bezzam, Riadul Islam

Memory-Guided Unified Hardware Accelerator for Mixed-Precision Scientific Computing

Recent hardware acceleration advances have enabled powerful specialized accelerators for finite element computations, spiking neural network inference, and sparse tensor operations. However, existing approaches face fundamental limitations:…

Hardware Architecture · Computer Science 2026-01-09 Chuanzhen Wang, Leo Zhang, Eric Liu

Comprehensive Design Space Exploration for Tensorized Neural Network Hardware Accelerators

High-order tensor decomposition has been widely adopted to obtain compact deep neural networks for edge deployment. However, existing studies focus primarily on its algorithmic advantages such as accuracy and compression ratio, while…

Hardware Architecture · Computer Science 2025-11-26 Jinsong Zhang, Minghe Li, Jiayi Tian, Jinming Lu, Zheng Zhang

Computing on Functions Using Randomized Vector Representations

Vector space models for symbolic processing that encode symbols by random vectors have been proposed in cognitive science and connectionist communities under the names Vector Symbolic Architecture (VSA), and, synonymously, Hyperdimensional…

Machine Learning · Computer Science 2021-09-09 E. Paxon Frady, Denis Kleyko, Christopher J. Kymn, Bruno A. Olshausen, Friedrich T. Sommer