Related papers: Joint Hardware-Workload Co-Optimization for In-Mem…

Towards Efficient IMC Accelerator Design Through Joint Hardware-Workload Co-optimization

Designing generalized in-memory computing (IMC) hardware that efficiently supports a variety of workloads requires extensive design space exploration, which is infeasible to perform manually. Optimizing hardware individually for each…

Hardware Architecture · Computer Science 2025-02-04 Olga Krestinskaya , Mohammed E. Fouda , Ahmed Eltawil , Khaled N. Salama

Rethinking Co-design of Neural Architectures and Hardware Accelerators

Neural architectures and hardware accelerators have been two driving forces for the progress in deep learning. Previous works typically attempt to optimize hardware given a fixed model architecture or model architecture given fixed…

Machine Learning · Computer Science 2021-02-18 Yanqi Zhou , Xuanyi Dong , Berkin Akin , Mingxing Tan , Daiyi Peng , Tianjian Meng , Amir Yazdanbakhsh , Da Huang , Ravi Narayanaswami , James Laudon

Pack my weights and run! Minimizing overheads for in-memory computing accelerators

In-memory computing hardware accelerators allow more than 10x improvements in peak efficiency and performance for matrix-vector multiplications (MVM) compared to conventional digital designs. For this, they have gained great interest for…

Hardware Architecture · Computer Science 2024-09-19 Pouya Houshmand , Marian Verhelst

Multi-Objective Hardware-Mapping Co-Optimisation for Multi-DNN Workloads on Chiplet-based Accelerators

The need to efficiently execute different Deep Neural Networks (DNNs) on the same computing platform, coupled with the requirement for easy scalability, makes Multi-Chip Module (MCM)-based accelerators a preferred design choice. Such an…

Hardware Architecture · Computer Science 2024-08-26 Abhijit Das , Enrico Russo , Maurizio Palesi

MC$^2$A: Enabling Algorithm-Hardware Co-Design for Efficient Markov Chain Monte Carlo Acceleration

An increasing number of applications are exploiting sampling-based algorithms for planning, optimization, and inference. The Markov Chain Monte Carlo (MCMC) algorithms form the computational backbone of this emerging branch of machine…

Machine Learning · Computer Science 2025-07-18 Shirui Zhao , Jun Yin , Lingyun Yao , Martin Andraud , Wannes Meert , Marian Verhelst

CIM-MLC: A Multi-level Compilation Stack for Computing-In-Memory Accelerators

In recent years, various computing-in-memory (CIM) processors have been presented, showing superior performance over traditional architectures. To unleash the potential of various CIM architectures, such as device precision, crossbar size,…

Hardware Architecture · Computer Science 2024-05-09 Songyun Qu , Shixin Zhao , Bing Li , Yintao He , Xuyi Cai , Lei Zhang , Ying Wang

Learned Hardware/Software Co-Design of Neural Accelerators

The use of deep learning has grown at an exponential rate, giving rise to numerous specialized hardware and software systems for deep learning. Because the design space of deep learning software stacks and hardware accelerators is diverse…

Machine Learning · Computer Science 2020-10-06 Zhan Shi , Chirag Sakhuja , Milad Hashemi , Kevin Swersky , Calvin Lin

A Semi-Decoupled Approach to Fast and Optimal Hardware-Software Co-Design of Neural Accelerators

In view of the performance limitations of fully-decoupled designs for neural architectures and accelerators, hardware-software co-design has been emerging to fully reap the benefits of flexible design spaces and optimize neural network…

Hardware Architecture · Computer Science 2022-03-29 Bingqian Lu , Zheyu Yan , Yiyu Shi , Shaolei Ren

Hardware-Algorithm Co-Optimization of Early-Exit Neural Networks for Multi-Core Edge Accelerators

Deployment of dynamic neural networks on edge accelerators requires careful consideration of hardware constraints beyond conventional complexity metrics such as Multiply-Accumulate operations. In Early-Exiting Neural Networks (EENN), exit…

Computational Complexity · Computer Science 2026-04-01 Alaa Zniber , Arne Symons , Ouassim Karrakchou , Marian Verhelst , Mounir Ghogho

Software-Hardware Co-optimization for Modular E2E AV Paradigm: A Unified Framework of Optimization Approaches, Simulation Environment and Evaluation Metrics

Modular end-to-end (ME2E) autonomous driving paradigms combine modular interpretability with global optimization capability and have demonstrated strong performance. However, existing studies mainly focus on accuracy improvement, while…

Artificial Intelligence · Computer Science 2026-01-13 Chengzhi Ji , Xingfeng Li , Zhaodong Lv , Hao Sun , Pan Liu , Hao Frank Yang , Ziyuan Pu

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

Optimizing the quality of result (QoR) and the quality of service (QoS) of AI-empowered autonomous systems simultaneously is very challenging. First, there are multiple input sources, e.g., multi-modal data from different sensors, requiring…

Artificial Intelligence · Computer Science 2021-04-12 Cong Hao , Deming Chen

Co-Design of Memory-Storage Systems for Workload Awareness with Interpretable Models

Solid-state storage architectures based on NAND or emerging memory devices (SSD), are fundamentally architected and optimized for both reliability and performance. Achieving these simultaneous goals requires co-design of memory components…

Hardware Architecture · Computer Science 2026-03-20 Jay Sarkar , Vamsi Pavan Rayaprolu , Abhijeet Bhalerao

Hardware/Software co-design with ADC-Less In-memory Computing Hardware for Spiking Neural Networks

Spiking Neural Networks (SNNs) are bio-plausible models that hold great potential for realizing energy-efficient implementations of sequential tasks on resource-constrained edge devices. However, commercial edge platforms based on standard…

Neural and Evolutionary Computing · Computer Science 2023-09-26 Marco Paul E. Apolinario , Adarsh Kumar Kosta , Utkarsh Saxena , Kaushik Roy

SoftmAP: Software-Hardware Co-design for Integer-Only Softmax on Associative Processors

Recent research efforts focus on reducing the computational and memory overheads of Large Language Models (LLMs) to make them feasible on resource-constrained devices. Despite advancements in compression techniques, non-linear operators…

Hardware Architecture · Computer Science 2024-11-28 Mariam Rakka , Jinhao Li , Guohao Dai , Ahmed Eltawil , Mohammed E. Fouda , Fadi Kurdahi

Hardware-Algorithm Co-Design for Hyperdimensional Computing Based on Memristive System-on-Chip

Hyperdimensional computing (HDC), utilizing a parallel computing paradigm and efficient learning algorithm, is well-suited for resource-constrained artificial intelligence (AI) applications, such as in edge devices. In-memory computing…

Emerging Technologies · Computer Science 2025-12-25 Yi Huang , Alireza Jaberi Rad , Qiangfei Xia

CIMNAS: A Joint Framework for Compute-In-Memory-Aware Neural Architecture Search

To maximize hardware efficiency and performance accuracy in Compute-In-Memory (CIM)-based neural network accelerators for Artificial Intelligence (AI) applications, co-optimizing both software and hardware design parameters is essential.…

Artificial Intelligence · Computer Science 2025-10-01 Olga Krestinskaya , Mohammed E. Fouda , Ahmed Eltawil , Khaled N. Salama

SCAR: Scheduling Multi-Model AI Workloads on Heterogeneous Multi-Chiplet Module Accelerators

Emerging multi-model workloads with heavy models like recent large language models significantly increased the compute and memory demands on hardware. To address such increasing demands, designing a scalable hardware architecture became a…

Hardware Architecture · Computer Science 2024-09-17 Mohanad Odema , Luke Chen , Hyoukjun Kwon , Mohammad Abdullah Al Faruque

Energy Efficient Software Hardware CoDesign for Machine Learning: From TinyML to Large Language Models

The rapid deployment of machine learning across platforms from milliwatt-class TinyML devices to large language models has made energy efficiency a primary constraint for sustainable AI. Across these scales, performance and energy are…

Hardware Architecture · Computer Science 2026-03-26 Mohammad Saleh Vahdatpour , Yanqing Zhang

An Algorithm-Hardware Co-design Framework to Overcome Imperfections of Mixed-signal DNN Accelerators

In recent years, processing in memory (PIM) based mixedsignal designs have been proposed as energy- and area-efficient solutions with ultra high throughput to accelerate DNN computations. However, PIM designs are sensitive to imperfections…

Hardware Architecture · Computer Science 2022-08-31 Payman Behnam , Uday Kamal , Saibal Mukhopadhyay

MCMComm: Hardware-Software Co-Optimization for End-to-End Communication in Multi-Chip-Modules

Increasing AI computing demands and slowing transistor scaling have led to the advent of Multi-Chip-Module (MCMs) based accelerators. MCMs enable cost-effective scalability, higher yield, and modular reuse by partitioning large chips into…

Hardware Architecture · Computer Science 2025-05-06 Ritik Raj , Shengjie Lin , William Won , Tushar Krishna