Related papers: Binary Code based Hash Embedding for Web-scale App…

Compositional Embeddings Using Complementary Partitions for Memory-Efficient Recommendation Systems

Modern deep learning-based recommendation systems exploit hundreds to thousands of different categorical features, each with millions of different categories ranging from clicks to posts. To respect the natural diversity within the…

Machine Learning · Computer Science 2020-06-30 Hao-Jun Michael Shi, Dheevatsa Mudigere, Maxim Naumov, Jiyan Yang

Learning to Embed Categorical Features without Embedding Tables for Recommendation

Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each…

Machine Learning · Computer Science 2021-06-08 Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, Ed H. Chi

Search Efficient Binary Network Embedding

Traditional network embedding primarily focuses on learning a continuous vector representation for each node, preserving network structure and/or node content information, such that off-the-shelf machine learning algorithms can be easily…

Social and Information Networks · Computer Science 2023-01-02 Daokun Zhang, Jie Yin, Xingquan Zhu, Chengqi Zhang

Deep Feature Embedding for Tabular Data

Tabular data learning has extensive applications in deep learning but its existing embedding techniques are limited in numerical and categorical features such as the inability to capture complex relationships and engineering. This paper…

Machine Learning · Computer Science 2024-09-02 Yuqian Wu, Hengyi Luo, Raymond S. T. Lee

Efficient end-to-end learning for quantizable representations

Embedding representation learning via neural networks is at the core foundation of modern similarity based search. While much effort has been put in developing algorithms for learning binary hamming code representations for search…

Machine Learning · Computer Science 2018-06-13 Yeonwoo Jeong, Hyun Oh Song

Binary Embedding-based Retrieval at Tencent

Large-scale embedding-based retrieval (EBR) is the cornerstone of search-related industrial applications. Given a user query, the system of EBR aims to identify relevant information from a large corpus of documents that may be tens or…

Information Retrieval · Computer Science 2023-02-20 Yukang Gan, Yixiao Ge, Chang Zhou, Shupeng Su, Zhouchuan Xu, Xuyuan Xu, Quanchao Hui, Xiang Chen, Yexin Wang, Ying Shan

A Frequency-aware Software Cache for Large Recommendation System Embeddings

Deep learning recommendation models (DLRMs) have been widely applied in Internet companies. The embedding tables of DLRMs are too large to fit on GPU memory entirely. We propose a GPU-based software cache approach to dynamically manage…

Information Retrieval · Computer Science 2022-08-11 Jiarui Fang, Geng Zhang, Jiatong Han, Shenggui Li, Zhengda Bian, Yongbin Li, Jin Liu, Yang You

Learning to Collide: Recommendation System Model Compression with Learned Hash Functions

A key characteristic of deep recommendation models is the immense memory requirements of their embedding tables. These embedding tables can often reach hundreds of gigabytes which increases hardware requirements and training cost. A common…

Information Retrieval · Computer Science 2022-03-31 Benjamin Ghaemmaghami, Mustafa Ozdal, Rakesh Komuravelli, Dmitriy Korchev, Dheevatsa Mudigere, Krishnakumar Nair, Maxim Naumov

Discrete Hashing with Deep Neural Network

This paper addresses the problem of learning binary hash codes for large scale image search by proposing a novel hashing method based on deep neural network. The advantage of our deep model over previous deep model used in hashing is that…

Computer Vision and Pattern Recognition · Computer Science 2015-08-31 Thanh-Toan Do, Anh-Zung Doan, Ngai-Man Cheung

Mixed-Precision Embedding Using a Cache

In recommendation systems, practitioners observed that increase in the number of embedding tables and their sizes often leads to significant improvement in model performances. Given this and the business importance of these models to major…

Machine Learning · Computer Science 2020-10-26 Jie Amy Yang, Jianyu Huang, Jongsoo Park, Ping Tak Peter Tang, Andrew Tulloch

Embedding Feature Selection for Large-scale Hierarchical Classification

Large-scale Hierarchical Classification (HC) involves datasets consisting of thousands of classes and millions of training instances with high-dimensional features posing several big data challenges. Feature selection that aims to select…

Machine Learning · Computer Science 2017-06-07 Azad Naik, Huzefa Rangwala

Unified Embedding: Battle-Tested Feature Representations for Web-Scale ML Systems

Learning high-quality feature embeddings efficiently and effectively is critical for the performance of web-scale machine learning systems. A typical model ingests hundreds of features with vocabularies on the order of millions to billions…

Machine Learning · Computer Science 2024-06-19 Benjamin Coleman, Wang-Cheng Kang, Matthew Fahrbach, Ruoxi Wang, Lichan Hong, Ed H. Chi, Derek Zhiyuan Cheng

Learning Effective and Efficient Embedding via an Adaptively-Masked Twins-based Layer

Embedding learning for categorical features is crucial for the deep learning-based recommendation models (DLRMs). Each feature value is mapped to an embedding vector via an embedding learning process. Conventional methods configure a fixed…

Machine Learning · Computer Science 2021-08-27 Bencheng Yan, Pengjie Wang, Kai Zhang, Wei Lin, Kuang-Chih Lee, Jian Xu, Bo Zheng

Learning Compressed Embeddings for On-Device Inference

In deep learning, embeddings are widely used to represent categorical entities such as words, apps, and movies. An embedding layer maps each entity to a unique vector, causing the layer's memory requirement to be proportional to the number…

Machine Learning · Computer Science 2022-03-22 Niketan Pansare, Jay Katukuri, Aditya Arora, Frank Cipollone, Riyaaz Shaik, Noyan Tokgozoglu, Chandru Venkataraman

Supervised Hashing Using Graph Cuts and Boosted Decision Trees

Embedding image features into a binary Hamming space can improve both the speed and accuracy of large-scale query-by-example image retrieval systems. Supervised hashing aims to map the original features to compact binary codes in a manner…

Machine Learning · Computer Science 2016-11-17 Guosheng Lin, Chunhua Shen, Anton van den Hengel

Beyond Embeddings: Interpretable Feature Extraction for Binary Code Similarity

Binary code similarity detection is a core task in reverse engineering. It supports malware analysis and vulnerability discovery by identifying semantically similar code in different contexts. Modern methods have progressed from manually…

Artificial Intelligence · Computer Science 2025-09-30 Charles E. Gagnon, Steven H. H. Ding, Philippe Charland, Benjamin C. M. Fung

Bilinear Supervised Hashing Based on 2D Image Features

Hashing has been recognized as an efficient representation learning method to effectively handle big data due to its low computational complexity and memory cost. Most of the existing hashing methods focus on learning the low-dimensional…

Computer Vision and Pattern Recognition · Computer Science 2019-01-08 Yujuan Ding, Wai Kueng Wong, Zhihui Lai, Zheng Zhang

Projection Bank: From High-dimensional Data to Medium-length Binary Codes

Recently, very high-dimensional feature representations, e.g., Fisher Vector, have achieved excellent performance for visual recognition and retrieval. However, these lengthy representations always cause extremely heavy computational and…

Computer Vision and Pattern Recognition · Computer Science 2015-09-17 Li Liu, Mengyang Yu, Ling Shao

Unsupervised Deep Hashing for Large-scale Visual Search

Learning based hashing plays a pivotal role in large-scale visual search. However, most existing hashing algorithms tend to learn shallow models that do not seek representative binary codes. In this paper, we propose a novel hashing…

Computer Vision and Pattern Recognition · Computer Science 2018-04-26 Zhaoqiang Xia, Xiaoyi Feng, Jinye Peng, Abdenour Hadid

MTrainS: Improving DLRM training efficiency using heterogeneous memories

Recommendation models are very large, requiring terabytes (TB) of memory during training. In pursuit of better quality, the model size and complexity grow over time, which requires additional training data to avoid overfitting. This model…

Information Retrieval · Computer Science 2023-05-03 Hiwot Tadese Kassa, Paul Johnson, Jason Akers, Mrinmoy Ghosh, Andrew Tulloch, Dheevatsa Mudigere, Jongsoo Park, Xing Liu, Ronald Dreslinski, Ehsan K. Ardestani