Related papers: Data Optimisation for a Deep Learning Recommender …

Studying the Impact of Data Disclosure Mechanism in Recommender Systems via Simulation

Recently, privacy issues in web services that rely on users' personal data have raised great attention. Unlike existing privacy-preserving technologies such as federated learning and differential privacy, we explore another way to mitigate…

Information Retrieval · Computer Science 2022-10-21 Ziqian Chen, Fei Sun, Yifan Tang, Haokun Chen, Jinyang Gao, Bolin Ding

Metric Optimization and Mainstream Bias Mitigation in Recommender Systems

The first part of this thesis focuses on maximizing the overall recommendation accuracy. This accuracy is usually evaluated with some user-oriented metric tailored to the recommendation scenario, but because recommendation is usually…

Information Retrieval · Computer Science 2023-11-14 Roger Zhe Li

Partially Synthetic Data for Recommender Systems: Prediction Performance and Preference Hiding

This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item…

Information Retrieval · Computer Science 2020-08-11 Manel Slokom, Martha Larson, Alan Hanjalic

Synthetic Data and Simulators for Recommendation Systems: Current State and Future Directions

Synthetic data and simulators have the potential to markedly improve the performance and robustness of recommendation systems. These approaches have already had a beneficial impact in other machine-learning driven fields. We identify and…

Information Retrieval · Computer Science 2021-12-22 Adam Lesnikowski, Gabriel de Souza Pereira Moreira, Sara Rabhi, Karl Byleen-Higley

Representation Matters: Assessing the Importance of Subgroup Allocations in Training Data

Collecting more diverse and representative training data is often touted as a remedy for the disparate performance of machine learning predictors across subpopulations. However, a precise framework for understanding how dataset properties…

Machine Learning · Computer Science 2021-06-08 Esther Rolf, Theodora Worledge, Benjamin Recht, Michael I. Jordan

A Proposal to Study "Is High Quality Data All We Need?"

Even though deep neural models have achieved superhuman performance on many popular benchmarks, they have failed to generalize to OOD or adversarial datasets. Conventional approaches aimed at increasing robustness include developing…

Machine Learning · Computer Science 2022-03-15 Swaroop Mishra, Anjana Arunkumar

A Framework for Undergraduate Data Collection Strategies for Student Support Recommendation Systems in Higher Education

Understanding which student support strategies mitigate dropout and improve student retention is an important part of modern higher educational research. One of the largest challenges institutions of higher learning currently face is the…

Information Retrieval · Computer Science 2022-10-20 Herkulaas MvE Combrink, Vukosi Marivate, Benjamin Rosman

What Data is Really Necessary? A Feasibility Study of Inference Data Minimization for Recommender Systems

Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a…

Machine Learning · Computer Science 2025-09-01 Jens Leysen, Marco Favier, Bart Goethals

Quality of Data in Machine Learning

A common assumption exists according to which machine learning models improve their performance when they have more data to learn from. In this study, the authors wished to clarify the dilemma by performing an empirical experiment utilizing…

Machine Learning · Computer Science 2021-12-20 Antti Kariluoto, Arto Pärnänen, Joni Kultanen, Jukka Soininen, Pekka Abrahamsson

The Unreasonable Effectiveness of Data for Recommender Systems

In recommender systems, collecting, storing, and processing large-scale interaction data is increasingly costly in terms of time, energy, and computation, yet it remains unclear when additional data stops providing meaningful gains. This…

Information Retrieval · Computer Science 2026-04-10 Youssef Abdou

Predictive accuracy of recommender algorithms

Recommender systems present a customized list of items based upon user or item characteristics with the objective of reducing a large number of possible choices to a smaller ranked set most likely to appeal to the user. A variety of…

Information Retrieval · Computer Science 2024-07-02 William Noffsinger

Infusing Collaborative Recommenders with Distributed Representations

Recommender systems assist users in navigating complex information spaces and focus their attention on the content most relevant to their needs. Often these systems rely on user activity or descriptions of the content. Social annotation…

Information Retrieval · Computer Science 2016-08-24 Greg Zanotti, Miller Horvath, Lucas Nunes Barbosa, Venkata Trinadh Kumar Gupta Immedisetty, Jonathan Gemmell

A Novel Metric for Measuring Data Quality in Classification Applications (extended version)

Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…

Machine Learning · Computer Science 2023-12-14 Jouseau Roxane, Salva Sébastien, Samir Chafik

The Limits of Preference Data for Post-Training

Recent progress in strengthening the capabilities of large language models has stemmed from applying reinforcement learning to domains with automatically verifiable outcomes. A key question is whether we can similarly use RL to optimize for…

Machine Learning · Computer Science 2025-05-27 Eric Zhao, Jessica Dai, Pranjal Awasthi

Data Poisoning Attacks to Deep Learning Based Recommender Systems

Recommender systems play a crucial role in helping users to find their interested information in various web services such as Amazon, YouTube, and Google News. Various recommender systems, ranging from neighborhood-based,…

Cryptography and Security · Computer Science 2021-01-11 Hai Huang, Jiaming Mu, Neil Zhenqiang Gong, Qi Li, Bin Liu, Mingwei Xu

Understanding the Impact of Sampling Quality in Direct Preference Optimization

We study how data of higher quality can be leveraged to improve performance in Direct Preference Optimization (DPO), aiming to understand its impact on DPO training dynamics. Our analyses show that both the solution space and the…

Machine Learning · Computer Science 2025-10-14 Kyung Rok Kim, Yumo Bai, Chonghuan Wang, Guanting Chen

Efficient Dataset Selection for Continual Adaptation of Generative Recommenders

Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection…

Information Retrieval · Computer Science 2026-04-10 Cathy Jiao, Juan Elenter, Praveen Ravichandran, Bernd Huber, Joseph Cauteruccio, Todd Wasson, Timothy Heath, Chenyan Xiong, Mounia Lalmas, Paul Bennett

Top-N Recommendation with Counterfactual User Preference Simulation

Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. Traditional models usually motivate themselves by designing complex or tailored architectures…

Information Retrieval · Computer Science 2021-09-14 Mengyue Yang, Quanyu Dai, Zhenhua Dong, Xu Chen, Xiuqiang He, Jun Wang

The Data Minimization Principle in Machine Learning

The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has…

Machine Learning · Computer Science 2024-05-31 Prakhar Ganesh, Cuong Tran, Reza Shokri, Ferdinando Fioretto

The Potential Benefits of Filtering Versus Hyper-Parameter Optimization

The quality of an induced model by a learning algorithm is dependent on the quality of the training data and the hyper-parameters supplied to the learning algorithm. Prior work has shown that improving the quality of the training data…

Machine Learning · Statistics 2014-03-14 Michael R. Smith, Tony Martinez, Christophe Giraud-Carrier