Related papers: On Sampling Collaborative Filtering Datasets

SVP-CF: Selection via Proxy for Collaborative Filtering Data

We study the practical consequences of dataset sampling strategies on the performance of recommendation algorithms. Recommender systems are generally trained and evaluated on samples of larger datasets. Samples are often taken in a naive or…

Information Retrieval · Computer Science 2021-07-13 Noveen Sachdeva, Carole-Jean Wu, Julian McAuley

Optimal Dataset Size for Recommender Systems: Evaluating Algorithms' Performance via Downsampling

This thesis investigates dataset downsampling as a strategy to optimize energy efficiency in recommender systems while maintaining competitive performance. With increasing dataset sizes posing computational and environmental challenges,…

Information Retrieval · Computer Science 2025-02-17 Ardalan Arabzadeh

Dataset Pruning in RecSys and ML: Best Practice or Mal-Practice?

Offline evaluations in recommender system research depend heavily on datasets, many of which are pruned, such as the widely used MovieLens collections. This thesis examines the impact of data pruning - specifically, removing users with…

Information Retrieval · Computer Science 2025-10-17 Leonie Winter

Federated Learning under Importance Sampling

Federated learning encapsulates distributed learning strategies that are managed by a central unit. Since it relies on using a selected number of agents at each iteration, and since each agent, in turn, taps into its local data, it is only…

Machine Learning · Computer Science 2020-12-15 Elsa Rizk, Stefan Vlaski, Ali H. Sayed

On Sampling Strategies for Neural Network-based Collaborative Filtering

Recent advances in neural networks have inspired people to design hybrid recommendation algorithms that can incorporate both (1) user-item interaction information and (2) content information including image, audio, and text. Despite their…

Machine Learning · Computer Science 2017-06-27 Ting Chen, Yizhou Sun, Yue Shi, Liangjie Hong

Model-specific Data Subsampling with Influence Functions

Model selection requires repeatedly evaluating models on a given dataset and measuring their relative performances. In modern applications of machine learning, the models being considered are increasingly more expensive to evaluate and the…

Machine Learning · Computer Science 2020-10-21 Anant Raj, Cameron Musco, Lester Mackey, Nicolo Fusi

Green Recommender Systems: Optimizing Dataset Size for Energy-Efficient Algorithm Performance

As recommender systems become increasingly prevalent, the environmental impact and energy efficiency of training large-scale models have come under scrutiny. This paper investigates the potential for energy-efficient algorithm performance…

Information Retrieval · Computer Science 2024-11-06 Ardalan Arabzadeh, Tobias Vente, Joeran Beel

Improving Performance of a Group of Classification Algorithms Using Resampling and Feature Selection

In recent years the importance of finding a meaningful pattern from huge datasets has become more challenging. Data miners try to adopt innovative methods to face this problem by applying feature selection methods. In this paper we propose…

Machine Learning · Computer Science 2014-03-11 Mehdi Naseriparsa, Amir-masoud Bidgoli, Touraj Varaee

DsDm: Model-Aware Dataset Selection with Datamodels

When selecting data for training large-scale models, standard practice is to filter for examples that match human notions of data quality. Such filtering yields qualitatively clean datapoints that intuitively should improve model behavior.…

Machine Learning · Computer Science 2024-01-24 Logan Engstrom, Axel Feldmann, Aleksander Madry

Energy Efficient Sampling Policies for Edge Computing Feedback Systems

We study the problem of finding efficient sampling policies in an edge-based feedback system, where sensor samples are offloaded to a back-end server that processes them and generates feedback to a user. Sampling the system at maximum…

Information Theory · Computer Science 2023-02-07 Vishnu Narayanan Moothedath, Jaya Prakash Champati, James Gross

A Distributed Collaborative Filtering Algorithm Using Multiple Data Sources

Collaborative Filtering (CF) is one of the most commonly used recommendation methods. CF consists in predicting whether, or how much, a user will like (or dislike) an item by leveraging the knowledge of the user's preferences as well as…

Information Retrieval · Computer Science 2018-07-17 Mohamed Reda Bouadjenek, Esther Pacitti, Maximilien Servajean, Florent Masseglia, Amr El Abbadi

Recommending Learning Algorithms and Their Associated Hyperparameters

The success of machine learning on a given task depends on, among other things, which learning algorithm is selected and its associated hyperparameters. Selecting an appropriate learning algorithm and setting its hyperparameters for a given…

Machine Learning · Computer Science 2014-07-09 Michael R. Smith, Logan Mitchell, Christophe Giraud-Carrier, Tony Martinez

Feature Selection with Annealing for Computer Vision and Big Data Learning

Many computer vision and medical imaging problems are faced with learning from large-scale datasets, with millions of observations and features. In this paper we propose a novel efficient learning scheme that tightens a sparsity constraint…

Machine Learning · Statistics 2017-02-07 Adrian Barbu, Yiyuan She, Liangjing Ding, Gary Gramajo

Data Sampling Affects the Complexity of Online SGD over Dependent Data

Conventional machine learning applications typically assume that data samples are independently and identically distributed (i.i.d.). However, practical scenarios often involve a data-generating process that produces highly dependent data…

Machine Learning · Computer Science 2022-04-04 Shaocong Ma, Ziyi Chen, Yi Zhou, Kaiyi Ji, Yingbin Liang

Accelerating Machine Learning Algorithms with Adaptive Sampling

The era of huge data necessitates highly efficient machine learning algorithms. Many common machine learning algorithms, however, rely on computationally intensive subroutines that are prohibitively expensive on large datasets. Oftentimes,…

Machine Learning · Computer Science 2023-09-26 Mo Tiwari

Comparison of the Efficiency of Different Algorithms on Recommendation System Design: a Case Study

By the growing trend of online shopping and e-commerce websites, recommendation systems have gained more importance in recent years in order to increase the sales ratios of companies. Different algorithms on recommendation systems are used…

Information Retrieval · Computer Science 2017-01-19 Gürkan Alpaslan

Untangling the Effects of Down-Sampling and Selection in Genetic Programming

Genetic programming systems often use large training sets to evaluate the quality of candidate solutions for selection, which is often computationally expensive. Down-sampling training sets has long been used to decrease the computational…

Neural and Evolutionary Computing · Computer Science 2024-08-02 Ryan Boldi, Ashley Bao, Martin Briesch, Thomas Helmuth, Dominik Sobania, Lee Spector, Alexander Lalejini

Performance Comparisons of Reinforcement Learning Algorithms for Sequential Experimental Design

Recent developments in sequential experimental design look to construct a policy that can efficiently navigate the design space, in a way that maximises the expected information gain. Whilst there is work on achieving tractable policies for…

Machine Learning · Computer Science 2025-08-20 Yasir Zubayr Barlas, Kizito Salako

Analyzing the Capabilities of Nature-inspired Feature Selection Algorithms in Predicting Student Performance

Predicting student performance is key in leveraging effective pre-failure interventions for at-risk students. As educational data grows larger, more effective means of analyzing student data in a timely manner are needed in order to provide…

Machine Learning · Computer Science 2023-10-10 Thomas Trask

Sequential Nature of Recommender Systems Disrupts the Evaluation Process

Datasets are often generated in a sequential manner, where the previous samples and intermediate decisions or interventions affect subsequent samples. This is especially prominent in cases where there are significant human-AI interactions,…

Information Retrieval · Computer Science 2022-05-30 Ali Shirali