Related papers: Accelerating Multi-Scale Deformable Attention Usin…

200 papers

Multi-scale deformable attention (MSDA) is a flexible and powerful feature extraction mechanism for visual tasks, but its random-access grid sampling strategy poses significant optimization challenges, especially on domain-specific…

Performance · Computer Science 2025-05-21 Chenghuan Huang, Zhigeng Xu, Chong Sun, Chen Li, Ziyang Ma

Multi-scale deformable attention (MSDeformAttn) has emerged as a key mechanism in various vision tasks, demonstrating explicit superiority attributed to multi-scale grid-sampling. However, this newly introduced operator incurs irregular…

Hardware Architecture · Computer Science 2024-03-19 Yansong Xu, Dongxu Lyu, Zhenyu Li, Zilong Wang, Yuzhou Chen, Gang Wang, Zhican Wang, Haomin Li, Guanghui He

Traditional computers with von Neumann architecture are unable to meet the latency and scalability challenges of Deep Neural Network (DNN) workloads. Various DNN accelerators based on Conventional compute Hardware Accelerator (CHA),…

Hardware Architecture · Computer Science 2022-08-11 Tom Glint, Chandan Kumar Jha, Manu Awasthi, Joycee Mekie

Near-data processing (NDP) refers to augmenting memory or storage with processing power. Despite its potential for accelerating computation and reducing power requirements, only limited progress has been made in popularizing NDP for various…

Hardware Architecture · Computer Science 2017-05-01 Hyeokjun Choe, Seil Lee, Hyunha Nam, Seongsik Park, Seijoon Kim, Eui-Young Chung, Sungroh Yoon

In-memory computing (IMC) with non-volatile memories (NVMs) has emerged as a promising approach to address the rapidly growing computational demands of Deep Neural Networks (DNNs). Mapping DNN layers spatially onto NVM-based IMC…

Hardware Architecture · Computer Science 2023-12-07 Abinand Nallathambi, Christin David Bose, Wilfried Haensch, Anand Raghunathan

Neural Processes (NPs) are popular meta-learning methods for efficiently modelling predictive uncertainty. Recent state-of-the-art methods, however, leverage expensive attention mechanisms, limiting their applications, particularly in…

Machine Learning · Computer Science 2024-05-28 Leo Feng, Frederick Tung, Hossein Hajimirsadeghi, Yoshua Bengio, Mohamed Osama Ahmed

Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von-Neumann architecture. As data movement operations and energy consumption become key…

Hardware Architecture · Computer Science 2021-12-24 Mehdi Hassanpour, Marc Riera, Antonio González

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns…

The attention mechanism has become a go-to technique for natural language processing and computer vision tasks. Recently, the MLP-Mixer and other MLP-based architectures, based simply on multi-layer perceptrons (MLPs), are also powerful…

Computer Vision and Pattern Recognition · Computer Science 2022-05-31 Tian Lv, Chongyang Bai, Chaojie Wang

A key scalability challenge in neural solvers for industrial-scale physics simulations is efficiently capturing both fine-grained local interactions and long-range global dependencies across millions of spatial elements. We introduce the…

Machine Learning · Computer Science 2026-03-10 Pedro M. P. Curvo, Jan-Willem van de Meent, Maksim Zhdanov

Deformable Attention Transformers (DAT) have shown remarkable performance in computer vision tasks by adaptively focusing on informative image regions. However, their data-dependent sampling mechanism introduces irregular memory access…

Computer Vision and Pattern Recognition · Computer Science 2025-07-29 Wendong Mao, Mingfan Zhao, Jianfeng Guan, Qiwei Dong, Zhongfeng Wang

Deep Neural Networks (DNNs) have shown excellent performance in a wide range of machine learning applications. Knowing the latency of running a DNN model or tensor program on a specific device is useful in various tasks, such as DNN graph-…

Machine Learning · Computer Science 2023-11-20 Hanpeng Hu, Junwei Su, Juntao Zhao, Yanghua Peng, Yibo Zhu, Haibin Lin, Chuan Wu

As large language models (LLMs) continue to advance, retrieval-augmented generation (RAG) has become the key mechanism for expanding model knowledge and reducing hallucinations. Central to RAG is approximate nearest neighbor search (ANNS),…

Hardware Architecture · Computer Science 2026-05-22 Cheng Zou, Shuo Yang, Chen Nie, Yu Zou, Yu He, Chao Jiang, Limin Xiao, Weifeng Zhang, Zhezhi He

Efficient road and building footprint extraction from satellite images is predominant in many remote sensing applications. However, precise segmentation map extraction is quite challenging due to the diverse building structures camouflaged…

Computer Vision and Pattern Recognition · Computer Science 2024-01-01 Shirsha Bose, Ritesh Sur Chowdhury, Debabrata Pal, Shivashish Bose, Biplab Banerjee, Subhasis Chaudhuri

The constant growth of DNNs makes them challenging to implement and run efficiently on traditional compute-centric architectures. Some accelerators have attempted to add more compute units and on-chip buffers to solve the memory wall…

Hardware Architecture · Computer Science 2023-10-30 Bahareh Khabbazan, Marc Riera, Antonio González

De novo assembly enables investigations of unknown genomes, paving the way for personalized medicine and disease management. However, it faces immense computational challenges arising from the excessive data volumes and algorithmic…

Hardware Architecture · Computer Science 2025-05-14 Heewoo Kim, Sanjay Sri Vallabh Singapuram, Haojie Ye, Joseph Izraelevitz, Trevor Mudge, Ronald Dreslinski, Nishil Talati

Determinantal point processes (DPPs) have attracted significant attention in machine learning for their ability to model subsets drawn from a large item collection. Recent work shows that nonsymmetric DPP (NDPP) kernels have significant…

Machine Learning · Computer Science 2021-04-14 Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder, Jiayi Huang, Sungkeun Kim, Abdullah Muzahid, Dylan Siegers, Chia-Che Tsai, Eun Jung Kim

A multiply-accumulate (MAC) operation is the main computation unit for DSP applications. DSP blocks are one of the efficient solutions to implement MACs in FPGAs. However, since the DSP blocks have wide multiplier and adder blocks, MAC…

Hardware Architecture · Computer Science 2021-10-26 Ercan Kalali, Rene van Leuken

Approximate nearest neighbor search (ANNS) is a key retrieval technique for vector databases and many data center applications, such as person re-identification and recommendation systems. It is also fundamental to retrieval augmented…

Hardware Architecture · Computer Science 2024-05-30 Yitu Wang, Shiyu Li, Qilin Zheng, Linghao Song, Zongwang Li, Andrew Chang, Hai "Helen" Li, Yiran Chen