Related papers: Data Optimisation for a Deep Learning Recommender …
Recently, privacy issues in web services that rely on users' personal data have raised great attention. Unlike existing privacy-preserving technologies such as federated learning and differential privacy, we explore another way to mitigate…
The first part of this thesis focuses on maximizing the overall recommendation accuracy. This accuracy is usually evaluated with some user-oriented metric tailored to the recommendation scenario, but because recommendation is usually…
This paper demonstrates the potential of statistical disclosure control for protecting the data used to train recommender systems. Specifically, we use a synthetic data generation approach to hide specific information in the user-item…
Synthetic data and simulators have the potential to markedly improve the performance and robustness of recommendation systems. These approaches have already had a beneficial impact in other machine-learning driven fields. We identify and…
Collecting more diverse and representative training data is often touted as a remedy for the disparate performance of machine learning predictors across subpopulations. However, a precise framework for understanding how dataset properties…
Even though deep neural models have achieved superhuman performance on many popular benchmarks, they have failed to generalize to OOD or adversarial datasets. Conventional approaches aimed at increasing robustness include developing…
Understanding which student support strategies mitigate dropout and improve student retention is an important part of modern higher educational research. One of the largest challenges institutions of higher learning currently face is the…
Data minimization is a legal principle requiring personal data processing to be limited to what is necessary for a specified purpose. Operationalizing this principle for recommender systems, which rely on extensive personal data, remains a…
A common assumption exists according to which machine learning models improve their performance when they have more data to learn from. In this study, the authors wished to clarify the dilemma by performing an empirical experiment utilizing…
In recommender systems, collecting, storing, and processing large-scale interaction data is increasingly costly in terms of time, energy, and computation, yet it remains unclear when additional data stops providing meaningful gains. This…
Recommender systems present a customized list of items based upon user or item characteristics with the objective of reducing a large number of possible choices to a smaller ranked set most likely to appeal to the user. A variety of…
Recommender systems assist users in navigating complex information spaces and focus their attention on the content most relevant to their needs. Often these systems rely on user activity or descriptions of the content. Social annotation…
Data quality is a key element for building and optimizing good learning models. Despite many attempts to characterize data quality, there is still a need for rigorous formalization and an efficient measure of the quality from available…
Recent progress in strengthening the capabilities of large language models has stemmed from applying reinforcement learning to domains with automatically verifiable outcomes. A key question is whether we can similarly use RL to optimize for…
Recommender systems play a crucial role in helping users to find their interested information in various web services such as Amazon, YouTube, and Google News. Various recommender systems, ranging from neighborhood-based,…
We study how data of higher quality can be leveraged to improve performance in Direct Preference Optimization (DPO), aiming to understand its impact on DPO training dynamics. Our analyses show that both the solution space and the…
Recommendation systems must continuously adapt to evolving user behavior, yet the volume of data generated in large-scale streaming environments makes frequent full retraining impractical. This work investigates how targeted data selection…
Top-N recommendation, which aims to learn user ranking-based preference, has long been a fundamental problem in a wide range of applications. Traditional models usually motivate themselves by designing complex or tailored architectures…
The principle of data minimization aims to reduce the amount of data collected, processed or retained to minimize the potential for misuse, unauthorized access, or data breaches. Rooted in privacy-by-design principles, data minimization has…
The quality of an induced model by a learning algorithm is dependent on the quality of the training data and the hyper-parameters supplied to the learning algorithm. Prior work has shown that improving the quality of the training data…