Related papers: Mem-Rec: Memory Efficient Recommendation System us…

MP-Rec: Hardware-Software Co-Design to Enable Multi-Path Recommendation

Deep learning recommendation systems serve personalized content under diverse tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation models rely on terabyte-scale embedding tables to learn user…

Hardware Architecture · Computer Science 2023-02-22 Samuel Hsia, Udit Gupta, Bilge Acun, Newsha Ardalani, Pan Zhong, Gu-Yeon Wei, David Brooks, Carole-Jean Wu

EncodeRec: An Embedding Backbone for Recommendation Systems

Recent recommender systems increasingly leverage embeddings from large pre-trained language models (PLMs). However, such embeddings exhibit two key limitations: (1) PLMs are not explicitly optimized to produce structured and discriminative…

Computation and Language · Computer Science 2026-01-19 Guy Hadad, Neomi Rabaev, Bracha Shapira

MicroRec: Efficient Recommendation Inference by Hardware and Data Structure Solutions

Deep neural networks are widely used in personalized recommendation systems. Unlike regular DNN inference workloads, recommendation inference is memory-bound due to the many random memory accesses needed to look up the embedding tables. The…

Hardware Architecture · Computer Science 2021-02-22 Wenqi Jiang, Zhenhao He, Shuai Zhang, Thomas B. Preußer, Kai Zeng, Liang Feng, Jiansong Zhang, Tongxuan Liu, Yong Li, Jingren Zhou, Ce Zhang, Gustavo Alonso

Random Offset Block Embedding Array (ROBE) for CriteoTB Benchmark MLPerf DLRM Model: 1000× Compression and 3.1× Faster Inference

Deep learning for recommendation data is one of the most pervasive and challenging AI workloads in recent times. State-of-the-art recommendation models are among the largest models, matching the likes of GPT-3 and Switch Transformer.…

Information Retrieval · Computer Science 2022-01-25 Aditya Desai, Li Chou, Anshumali Shrivastava

Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory

Deep learning recommendation models (DLRMs) are widely used in industry, and their memory capacity requirements reach the terabyte scale. Tiered memory architectures provide a cost-effective solution but introduce challenges in…

Performance · Computer Science 2025-11-12 Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li

TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models

The memory capacity of embedding tables in deep learning recommendation models (DLRMs) is increasing dramatically from tens of GBs to TBs across the industry. Given the fast growth in DLRMs, novel solutions are urgently needed, in order to…

Machine Learning · Computer Science 2021-01-29 Chunxing Yin, Bilge Acun, Xing Liu, Carole-Jean Wu

A Universal Framework for Compressing Embeddings in CTR Prediction

Accurate click-through rate (CTR) prediction is vital for online advertising and recommendation systems. Recent deep learning advancements have improved the ability to capture feature interactions and understand user interests. However,…

Information Retrieval · Computer Science 2025-02-24 Kefan Wang, Hao Wang, Kenan Song, Wei Guo, Kai Cheng, Zhi Li, Yong Liu, Defu Lian, Enhong Chen

FELRec: Efficient Handling of Item Cold-Start With Dynamic Representation Learning in Recommender Systems

Recommender systems suffer from the cold-start problem whenever a new user joins the platform or a new item is added to the catalog. To address item cold-start, we propose to replace the embedding layer in sequential recommenders with a…

Information Retrieval · Computer Science 2024-10-02 Kuba Weimann, Tim O. F. Conrad

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the…

Machine Learning · Computer Science 2020-06-30 Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

SCRec: A Scalable Computational Storage System with Statistical Sharding and Tensor-train Decomposition for Recommendation Models

Deep Learning Recommendation Models (DLRMs) play a crucial role in delivering personalized content across web applications such as social networking and video streaming. However, with improvements in performance, the parameter size of DLRMs…

Hardware Architecture · Computer Science 2025-04-02 Jinho Yang, Ji-Hoon Kim, Joo-Young Kim

RA-Rec: An Efficient ID Representation Alignment Framework for LLM-based Recommendation

Large language models (LLMs) have recently emerged as a powerful tool for a variety of natural language processing tasks, prompting a surge of work combining LLMs with recommendation systems, termed LLM-based RS. Current approaches generally…

Information Retrieval · Computer Science 2024-03-20 Xiaohan Yu, Li Zhang, Xin Zhao, Yue Wang, Zhongrui Ma

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit entirely in GPU memory. We propose a GPU-based software cache approach to dynamically manage…

Information Retrieval · Computer Science 2022-08-11 Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Deep Recommender Models Inference: Automatic Asymmetric Data Flow Optimization

Deep Recommender Models (DLRMs) inference is a fundamental AI workload accounting for more than 79% of the total AI workload in Meta's data centers. DLRMs' performance bottleneck is found in the embedding layers, which perform many random…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-07-03 Giuseppe Ruggeri, Renzo Andri, Daniele Jahier Pagliari, Lukas Cavigelli

Mixed-Precision Embedding Using a Cache

In recommendation systems, practitioners have observed that increasing the number of embedding tables and their sizes often leads to significant improvements in model performance. Given this and the business importance of these models to major…

Machine Learning · Computer Science 2020-10-26 Jie Amy Yang, Jianyu Huang, Jongsoo Park, Ping Tak Peter Tang, Andrew Tulloch

Learning Compressed Embeddings for On-Device Inference

In deep learning, embeddings are widely used to represent categorical entities such as words, apps, and movies. An embedding layer maps each entity to a unique vector, causing the layer's memory requirement to be proportional to the number…

Machine Learning · Computer Science 2022-03-22 Niketan Pansare, Jay Katukuri, Aditya Arora, Frank Cipollone, Riyaaz Shaik, Noyan Tokgozoglu, Chandru Venkataraman

MemRec: Collaborative Memory-Augmented Agentic Recommender System

The evolution of recommender systems has shifted from traditional collaborative filtering to LLM-based agentic systems, which rely on semantic user and item memories to make predictions. However, existing agents maintain these memories in…

Information Retrieval · Computer Science 2026-04-29 Weixin Chen, Yuhan Zhao, Jingyuan Huang, Zihe Ye, Clark Mingxuan Ju, Tong Zhao, Neil Shah, Li Chen, Yongfeng Zhang

LLM2Rec: Large Language Models Are Powerful Embedding Models for Sequential Recommendation

Sequential recommendation aims to predict users' future interactions by modeling collaborative filtering (CF) signals from historical behaviors of similar users or items. Traditional sequential recommenders predominantly rely on ID-based…

Information Retrieval · Computer Science 2025-06-30 Yingzhi He, Xiaohao Liu, An Zhang, Yunshan Ma, Tat-Seng Chua

RecShard: Statistical Feature-Based Memory Optimization for Industry-Scale Neural Recommendation

We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement technique for deep learning recommendation models (DLRMs). RecShard is designed based on two key observations. First, not all EMBs are equal, nor all rows…

Machine Learning · Computer Science 2022-01-26 Geet Sethi, Bilge Acun, Niket Agarwal, Christos Kozyrakis, Caroline Trippel, Carole-Jean Wu

Efficient Large-Scale Cross-Domain Sequential Recommendation with Dynamic State Representations

Recently, autoregressive recommendation models (ARMs), such as Meta's HSTU model, have emerged as a major breakthrough over traditional Deep Learning Recommendation Models (DLRMs), exhibiting the highly sought-after scaling law behaviour.…

Information Retrieval · Computer Science 2025-08-29 Manuel V. Loureiro, Steven Derby, Aleksei Medvedev, Alejandro Ariza-Casabona, Gonzalo Fiz Pontiveros, Tri Kurniawan Wijaya

Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer

Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed…

Machine Learning · Computer Science 2021-08-27 Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian Xu, Bo Zheng