Related papers: Accelerating Multi-Scale Deformable Attention Usin…

Towards Efficient Multi-Scale Deformable Attention on NPU

Multi-scale deformable attention (MSDA) is a flexible and powerful feature extraction mechanism for visual tasks, but its random-access grid sampling strategy poses significant optimization challenges, especially on domain-specific…

Performance · Computer Science 2025-05-21 Chenghuan Huang, Zhigeng Xu, Chong Sun, Chen Li, Ziyang Ma

DEFA: Efficient Deformable Attention Acceleration via Pruning-Assisted Grid-Sampling and Multi-Scale Parallel Processing

Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating explicit superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular…

Hardware Architecture · Computer Science 2024-03-19 Yansong Xu, Dongxu Lyu, Zhenyu Li, Zilong Wang, Yuzhou Chen, Gang Wang, Zhican Wang, Haomin Li, Guanghui He

A Fresh Perspective on DNN Accelerators by Performing Holistic Analysis Across Paradigms

Traditional computers with von Neumann architecture are unable to meet the latency and scalability challenges of Deep Neural Network (DNN) workloads. Various DNN accelerators based on Conventional compute Hardware Accelerator (CHA),…

Hardware Architecture · Computer Science 2022-08-11 Tom Glint, Chandan Kumar Jha, Manu Awasthi, Joycee Mekie

Near-Data Processing for Differentiable Machine Learning Models

Near-data processing (NDP) refers to augmenting memory or storage with processing power. Despite its potential for accelerating computing and reducing power requirements, only limited progress has been made in popularizing NDP for various…

Hardware Architecture · Computer Science 2017-05-01 Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon

LRMP: Layer Replication with Mixed Precision for Spatial In-memory DNN Accelerators

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC…

Hardware Architecture · Computer Science 2023-12-07 Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan

Memory Efficient Neural Processes via Constant Memory Attention Block

Neural Processes (NPs) are popular meta-learning methods for efficiently modelling predictive uncertainty. Recent state-of-the-art methods, however, leverage expensive attention mechanisms, limiting their applications, particularly in…

Machine Learning · Computer Science 2024-05-28 Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed

A Survey of Near-Data Processing Architectures for Neural Networks

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-01-01 Liu Ke, Udit Gupta, Carole-Jean Wu, Benjamin Youngjae Cho, Mark Hempstead, Brandon Reagen, Xuan Zhang, David Brooks, Vikas Chandra, Utku Diril, Amin Firoozshahian, Kim Hazelwood, Bill Jia, Hsien-Hsin S. Lee, Meng Li, Bert Maher, Dheevatsa Mudigere, Maxim Naumov, Martin Schatz, Mikhail Smelyanskiy, Xiaodong Wang

MDMLP: Image Classification from Scratch on Small Datasets with MLP

The attention mechanism has become a go-to technique for natural language processing and computer vision tasks. Recently, the MLP-Mixer and other MLP-based architectures, based simply on multi-layer perceptrons (MLPs), are also powerful…

Computer Vision and Pattern Recognition · Computer Science 2022-05-31 Tian Lv, Chongyang Bai, Chaojie Wang

MSPT: Efficient Large-Scale Physical Modeling via Parallelized Multi-Scale Attention

A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. We introduce the…

Machine Learning · Computer Science 2026-03-10 Pedro M. P. Curvo, Jan-Willem van de Meent, Maksim Zhdanov

A Memory-Efficient Framework for Deformable Transformer with Neural Architecture Search

Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wendong Mao, Mingfan Zhao, Jianfeng Guan, Qiwei Dong, Zhongfeng Wang

CDMPP: A Device-Model Agnostic Framework for Latency Prediction of Tensor Programs

Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph-…

Machine Learning · Computer Science 2023-11-20 Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

NasZip: Software and Hardware Co-Design to Accelerate Approximate Nearest Neighbor Search with DIMM-Based Near-Data Processing

As large language models (LLMs) continue to advance, retrieval-augmented generation (RAG) has become the key mechanism for expanding model knowledge and reducing hallucinations. Central to RAG is approximate nearest neighbor search (ANNS),…

Hardware Architecture · Computer Science 2026-05-22 Cheng Zou, Shuo Yang, Chen Nie, Yu Zou, Yu He, Chao Jiang, Limin Xiao, Weifeng Zhang, Zhezhi He

MultiScale Probability Map guided Index Pooling with Attention-based learning for Road and Building Segmentation

Efficient road and building footprint extraction from satellite images is predominant in many remote sensing applications. However, precise segmentation map extraction is quite challenging due to the diverse building structures camouflaged…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Shirsha Bose, Ritesh Sur Chowdhury, Debabrata Pal, Shivashish Bose, Biplab Banerjee, Subhasis Chaudhuri

An Energy-Efficient Near-Data Processing Accelerator for DNNs that Optimizes Data Accesses

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall…

Hardware Architecture · Computer Science 2023-10-30 Bahareh Khabbazan, Marc Riera, Antonio González

NMP-PaK: Near-Memory Processing Acceleration of Scalable De Novo Genome Assembly

De novo assembly enables investigations of unknown genomes, paving the way for personalized medicine and disease management. However, it faces immense computational challenges arising from the excessive data volumes and algorithmic…

Hardware Architecture · Computer Science 2025-05-14 Heewoo Kim, Sanjay Sri Vallabh Singapuram, Haojie Ye, Joseph Izraelevitz, Trevor Mudge, Ronald Dreslinski, Nishil Talati

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes

Determinantal point processes (DPPs) have attracted significant attention in machine learning for their ability to model subsets drawn from a large item collection. Recent work shows that nonsymmetric DPP (NDPP) kernels have significant…

Machine Learning · Computer Science 2021-04-14 Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder, Jiayi Huang, Sungkeun Kim, Abdullah Muzahid, Dylan Siegers, Chia-Che Tsai, Eun Jung Kim

Near-Precise Parameter Approximation for Multiple Multiplications on A Single DSP Block

A multiply-accumulate (MAC) operation is the main computation unit for DSP applications. DSP blocks are one of the efficient solutions to implement MACs in FPGAs. However, since the DSP blocks have wide multiplier and adder blocks, MAC…

Hardware Architecture · Computer Science 2021-10-26 Ercan Kalali, Rene van Leuken

NDSEARCH: Accelerating Graph-Traversal-Based Approximate Nearest Neighbor Search through Near Data Processing

Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector database and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval augmented…

Hardware Architecture · Computer Science 2024-05-30 Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai "Helen" Li, Yiran Chen