Related papers: Collaborative Filtering via High-Dimensional Regre…
We present ImplicitSLIM, a novel unsupervised learning approach for sparse high-dimensional data, with applications to collaborative filtering. Sparse linear methods (SLIM) and their variations show outstanding performance, but they are…
Recently, linear regression models, such as EASE and SLIM, have shown to often produce rather competitive results against more sophisticated deep learning models. On the other side, the (weighted) matrix factorization approaches have been…
This paper studies model selection consistency for high dimensional sparse regression when data exhibits both cross-sectional and serial dependency. Most commonly-used model selection methods fail to consistently recover the true model when…
Combining simple elements from the literature, we define a linear model that is geared toward sparse data, in particular implicit feedback data for recommender systems. We show that its training objective has a closed-form solution, and…
We provide a novel -- and to the best of our knowledge, the first -- algorithm for high dimensional sparse regression with constant fraction of corruptions in explanatory and/or response variables. Our algorithm recovers the true sparse…
Motivation: The high dimensionality of genomic data calls for the development of specific classification methodologies, especially to prevent over-optimistic predictions. This challenge can be tackled by compression and variable selection,…
The application of machine learning techniques to large-scale personalized recommendation problems is a challenging task. Such systems must make sense of enormous amounts of implicit feedback in order to understand user preferences across…
As the size of datasets used in statistical learning continues to grow, distributed training of models has attracted increasing attention. These methods partition the data and exploit parallelism to reduce memory and runtime, but suffer…
We present a new class of methods for high-dimensional nonparametric regression and classification called sparse additive models (SpAM). Our methods combine ideas from sparse linear modeling and additive nonparametric regression. We derive…
Sparse modelling or model selection with categorical data is challenging even for a moderate number of variables, because one parameter is roughly needed to encode one category or level. The Group Lasso is a well known efficient algorithm…
We study the high-dimensional linear regression problem with categorical predictors that have many levels. We propose a new estimation approach, which performs model compression via two mechanisms by simultaneously encouraging (a)…
We propose a method to reconstruct and cluster incomplete high-dimensional data lying in a union of low-dimensional subspaces. Exploring the sparse representation model, we jointly estimate the missing data while imposing the intrinsic…
Analysis of high-dimensional data has led to increased interest in both single index models (SIMs) and the best-subset selection. SIMs provide an interpretable and flexible modeling framework for high-dimensional data, while the best-subset…
We consider high-dimensional binary classification by sparse logistic regression. We propose a model/feature selection procedure based on penalized maximum likelihood with a complexity penalty on the model size and derive the non-asymptotic…
The two primary approaches for high-dimensional regression problems are sparse methods (e.g., best subset selection, which uses the L0-norm in the penalty) and ensemble methods (e.g., random forests). Although sparse methods typically yield…
Preference elicitation is an active learning approach to tackle the cold-start problem of recommender systems. Roughly speaking, new users are asked to rate some carefully selected items in order to compute appropriate recommendations for…
Sampling strategies have been widely applied in many recommendation systems to accelerate model learning from implicit feedback data. A typical strategy is to draw negative instances with uniform distribution, which however will severely…
Federated learning has become a popular tool in the big data era nowadays. It trains a centralized model based on data from different clients while keeping data decentralized. In this paper, we propose a federated sparse sliced inverse…
We propose a novel, efficient approach for distributed sparse learning in high-dimensions, where observations are randomly partitioned across machines. Computationally, at each round our method only requires the master machine to solve a…
Recent work has focused on the problem of conducting linear regression when the number of covariates is very large, potentially greater than the sample size. To facilitate this, one useful tool is to assume that the model can be well…