Related papers: Consistent Text Categorization using Data Augmenta…

Text Classification for Predicting Multi-level Product Categories

In an online shopping platform, a detailed classification of the products facilitates user navigation. It also helps online retailers keep track of the price fluctuations in a certain industry or special discounts on a specific product…

Information Retrieval · Computer Science 2021-09-07 Hadi Jahanshahi, Ozan Ozyegen, Mucahit Cevik, Beste Bulut, Deniz Yigit, Fahrettin F. Gonen, Ayşe Başar

Large Scale Product Categorization using Structured and Unstructured Attributes

Product categorization using text data for eCommerce is a very challenging extreme classification problem with several thousands of classes and several millions of products to classify. Even though multi-class text classification is a well…

Information Retrieval · Computer Science 2019-03-12 Abhinandan Krishnan, Abilash Amarthaluri

Compositional Generalization for Multi-label Text Classification: A Data-Augmentation Approach

Despite significant advancements in multi-label text classification, the ability of existing models to generalize to novel and seldom-encountered complex concepts, which are compositions of elementary ones, remains underexplored. This…

Computation and Language · Computer Science 2023-12-21 Yuyang Chai, Zhuang Li, Jiahui Liu, Lei Chen, Fei Li, Donghong Ji, Chong Teng

Text-Based Product Matching -- Semi-Supervised Clustering Approach

Matching identical products present in multiple product feeds constitutes a crucial element of many tasks of e-commerce, such as comparing product offerings, dynamic price optimization, and selecting the assortment personalized for the…

Databases · Computer Science 2024-02-16 Alicja Martinek, Szymon Łukasik, Amir H. Gandomi

LLM-Enhanced Reranking for Complementary Product Recommendation

Complementary product recommendation, which aims to suggest items that are used together to enhance customer value, is a crucial yet challenging task in e-commerce. While existing graph neural network (GNN) approaches have made significant…

Information Retrieval · Computer Science 2025-12-02 Zekun Xu, Yudi Zhang

Augmenting the User-Item Graph with Textual Similarity Models

This paper introduces a simple and effective form of data augmentation for recommender systems. A paraphrase similarity model is applied to widely available textual data, such as reviews and product descriptions, yielding new semantic…

Computation and Language · Computer Science 2021-09-21 Federico López, Martin Scholz, Jessica Yung, Marie Pellat, Michael Strube, Lucas Dixon

Product Classification in E-Commerce using Distributional Semantics

Product classification is the task of automatically predicting a taxonomy path for a product in a predefined taxonomy hierarchy given a textual product description or title. For efficient product classification we require a suitable…

Artificial Intelligence · Computer Science 2016-07-26 Vivek Gupta, Harish Karnick, Ashendra Bansal, Pradhuman Jhala

A Survey on Data Augmentation for Text Classification

Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization…

Computation and Language · Computer Science 2022-09-09 Markus Bayer, Marc-André Kaufhold, Christian Reuter

Towards More Relevant Product Search Ranking Via Large Language Models: An Empirical Study

Training Learning-to-Rank models for e-commerce product search ranking can be challenging due to the lack of a gold standard of ranking relevance. In this paper, we decompose ranking relevance into content-based and engagement-based…

Information Retrieval · Computer Science 2024-09-27 Qi Liu, Atul Singh, Jingbo Liu, Cun Mu, Zheng Yan

Automated Query-Product Relevance Labeling using Large Language Models for E-commerce Search

Accurate query-product relevance labeling is indispensable to generate ground truth dataset for search ranking in e-commerce. Traditional approaches for annotating query-product pairs rely on human-based labeling services, which is…

Information Retrieval · Computer Science 2025-02-27 Jayant Sachdev, Sean D Rosario, Abhijeet Phatak, He Wen, Swati Kirti, Chittaranjan Tripathy

An Integrated Approach for Improving Brand Consistency of Web Content: Modeling, Analysis and Recommendation

A consumer-dependent (business-to-consumer) organization tends to present itself as possessing a set of human qualities, which is termed as the brand personality of the company. The perception is impressed upon the consumer through the…

Computation and Language · Computer Science 2021-08-17 Soumyadeep Roy, Shamik Sural, Niyati Chhaya, Anandhavelu Natarajan, Niloy Ganguly

Knowledge Distillation based Contextual Relevance Matching for E-commerce Product Search

Online relevance matching is an essential task of e-commerce product search to boost the utility of search engines and ensure a smooth user experience. Previous work adopts either classical relevance matching models or Transformer-style…

Information Retrieval · Computer Science 2022-10-05 Ziyang Liu, Chaokun Wang, Hao Feng, Lingfei Wu, Liqun Yang

Learning to Compose Domain-Specific Transformations for Data Augmentation

Data augmentation is a ubiquitous technique for increasing the size of labeled training sets by leveraging task-specific data transformations that preserve class labels. While it is often easy for domain experts to specify individual…

Machine Learning · Statistics 2018-12-10 Alexander J. Ratner, Henry R. Ehrenberg, Zeshan Hussain, Jared Dunnmon, Christopher Ré

Retrieval-augmented Multi-label Text Classification

Multi-label text classification (MLC) is a challenging task in settings of large label sets, where label support follows a Zipfian distribution. In this paper, we address this problem through retrieval augmentation, aiming to improve the…

Computation and Language · Computer Science 2023-05-23 Ilias Chalkidis, Yova Kementchedjhieva

Multi-output Headed Ensembles for Product Item Classification

In this paper, we revisit the problem of product item classification for large-scale e-commerce catalogs. The taxonomy of e-commerce catalogs consists of thousands of genres to which are assigned items that are uploaded by merchants on a…

Machine Learning · Computer Science 2023-08-01 Hotaka Shiokawa, Pradipto Das, Arthur Toth, Justin Chiu

A Multi-task Learning Framework for Product Ranking with BERT

Product ranking is a crucial component for many e-commerce services. One of the major challenges in product search is the vocabulary mismatch between query and products, which may be a larger vocabulary gap problem compared to other…

Information Retrieval · Computer Science 2022-04-04 Xuyang Wu, Alessandro Magnani, Suthee Chaidaroon, Ajit Puthenputhussery, Ciya Liao, Yi Fang

LLM-Based Robust Product Classification in Commerce and Compliance

Product classification is a crucial task in international trade, as compliance regulations are verified and taxes and duties are applied based on product categories. Manual classification of products is time-consuming and error-prone, and…

Computation and Language · Computer Science 2024-10-16 Sina Gholamian, Gianfranco Romani, Bartosz Rudnikowicz, Stavroula Skylaki

Cross Encoding as Augmentation: Towards Effective Educational Text Classification

Text classification in education, usually called auto-tagging, is the automated process of assigning relevant tags to educational content, such as questions and textbooks. However, auto-tagging suffers from a data scarcity problem, which…

Computation and Language · Computer Science 2023-06-01 Hyun Seung Lee, Seungtaek Choi, Yunsung Lee, Hyeongdon Moon, Shinhyeok Oh, Myeongho Jeong, Hyojun Go, Christian Wallraven

Improve Text Classification Accuracy with Intent Information

Text classification, a core component of task-oriented dialogue systems, attracts continuous research from both the research and industry community, and has resulted in tremendous progress. However, existing method does not consider the use…

Computation and Language · Computer Science 2022-12-16 Yifeng Xie

A Data-Centric Approach to Multilingual E-Commerce Product Search: Case Study on Query-Category and Query-Item Relevance

Multilingual e-commerce search suffers from severe data imbalance across languages, label noise, and limited supervision for low-resource languages--challenges that impede the cross-lingual generalization of relevance models despite the…

Information Retrieval · Computer Science 2025-10-27 Yabo Yin, Yang Xi, Jialong Wang, Shanqi Wang, Jiateng Hu