Related papers: Deep Recommender Models Inference: Automatic Asymm…

200 papers

Deep learning recommendation models (DLRM) rely on large embedding tables to manage categorical sparse features. Expanding such embedding tables can significantly enhance model performance, but at the cost of increased GPU/CPU/memory usage.…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-07-01 Qinlong Wang, Tingfeng Lan, Yinghao Tang, Ziling Huang, Yiheng Du, Haitao Zhang, Jian Sha, Hui Lu, Yuanchun Zhou, Ke Zhang, Mingjie Tang

Deep learning recommendation models have grown to the terabyte scale. Traditional serving schemes, which load entire models onto a single server, are unable to support this scale. One approach to support this scale is with distributed serving,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-11-13 Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead

Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due…

Information Retrieval · Computer Science 2024-10-10 Sitian Chen, Haobin Tan, Amelie Chi Zhou, Yusen Li, Pavan Balaji

Personalized recommendation is a ubiquitous application on the internet, with many industries and hyperscalers extensively leveraging Deep Learning Recommendation Models (DLRMs) for their personalization needs (like ad serving or movie…

Hardware Architecture · Computer Science 2024-10-30 Rishabh Jain, Vivek M. Bhasi, Adwait Jog, Anand Sivasubramaniam, Mahmut T. Kandemir, Chita R. Das

Deep learning-based recommendation systems (e.g., DLRMs) are widely used AI models to provide high-quality personalized recommendations. Training data used for modern recommendation systems commonly includes categorical features taking on…

Information Retrieval · Computer Science 2026-01-06 Gopi Krishna Jha, Anthony Thomas, Nilesh Jain, Sameh Gobriel, Tajana Rosing, Ravi Iyer

The increasing complexity of deep learning recommendation models (DLRM) has led to a growing need for large-scale distributed systems that can efficiently train on vast amounts of data. In DLRM, the sparse embedding table is a crucial…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-08-07 Xin Zhang, Quanyu Zhu, Liangbei Xu, Zain Huda, Wang Zhou, Jin Fang, Dennis van der Staay, Yuxi Hu, Jade Nie, Jiyan Yang, Chunzhi Yang

During the last two years, the goal of many researchers has been to squeeze the last bit of performance out of HPC systems for AI tasks. Often this discussion is held in the context of how fast ResNet50 can be trained. Unfortunately,…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-12 Dhiraj Kalamkar, Evangelos Georganas, Sudarshan Srinivasan, Jianping Chen, Mikhail Shiryaev, Alexander Heinecke

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit entirely in GPU memory. We propose a GPU-based software cache approach to dynamically manage…

Information Retrieval · Computer Science 2022-08-11 Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The…

Distributed systems can be found in various applications, e.g., in robotics or autonomous driving, to achieve higher flexibility and robustness. Thereby, data flow centric applications such as Deep Neural Network (DNN) inference benefit…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-10-14 Fabian Kreß, El Mahdi El Annabi, Tim Hotfilter, Julian Hoefer, Tanja Harbaum, Juergen Becker

Personalized recommendation systems leverage deep learning models and account for the majority of data center AI cycles. Their performance is dominated by memory-bound sparse embedding operations with unique irregular memory access patterns…

The deployment of large-scale models, such as large language models (LLMs), incurs substantial costs due to their computational demands. To mitigate these costs and address challenges related to scalability and data security, there is a…

With the rapid growth of Internet services, recommendation systems play a central role in delivering personalized content. Faced with massive user requests and complex model architectures, the key challenge for real-time recommendation…

Information Retrieval · Computer Science 2025-08-14 Junli Shao, Jing Dong, Dingzhou Wang, Kowei Shih, Dannier Li, Chengrui Zhou

Fully Homomorphic Encryption (FHE) allows for computation directly on encrypted data and enables privacy-preserving neural inference in the cloud. Prior work has focused on models with dense inputs (e.g., CNNs), with less attention given to…

Cryptography and Security · Computer Science 2026-02-23 Karthik Garimella, Austin Ebel, Gabrielle De Micheli, Brandon Reagen

Deep Learning (DL) has become a cornerstone of many everyday applications that we now rely on. However, making sure that the DL model uses the underlying hardware efficiently takes a lot of effort. Knowledge about…

Performance · Computer Science 2023-03-22 Karthick Panner Selvam, Mats Brorsson

Deep learning (DL) has been widely adopted in recent years, but it is a compute-intensive method. Therefore, scientists have proposed diverse optimizations to accelerate its predictions for end-user applications. However, no single inference…

Machine Learning · Computer Science 2022-10-11 Pierrick Pochelu

The deployment of large-scale models, such as large language models (LLMs) and sophisticated image generation systems, incurs substantial costs due to their computational demands. To mitigate these costs and address challenges related to…

Machine Learning · Computer Science 2024-10-30 Yuzhe Yang, Yipeng Du, Ahmad Farhan, Claudio Angione, Yue Zhao, Harry Yang, Fielding Johnston, James Buban, Patrick Colangelo

Deep learning for recommendation data is one of the most pervasive and challenging AI workloads in recent times. State-of-the-art recommendation models are among the largest models, matching the likes of GPT-3 and Switch Transformer.…

Information Retrieval · Computer Science 2022-01-25 Aditya Desai, Li Chou, Anshumali Shrivastava

DL inference queries play an important role in diverse internet services, and a large fraction of datacenter cycles is spent on processing DL inference queries. Specifically, the matrix-matrix multiplication (GEMM) operations of…

Hardware Architecture · Computer Science 2020-12-02 Benjamin Y. Cho, Jeageun Jung, Mattan Erez

Deep Learning Recommendation Models (DLRM) are widespread, account for a considerable data center footprint, and grow by more than 1.5x per year. With model size soon to be in the terabyte range, leveraging Storage Class Memory (SCM) for…
