Related papers: Sparse Multiple Kernel Learning: Support Identific…
Feature selection is one of the most decisive tools in understanding data and machine learning models. Among other methods, sparsity induced by $L^{1}$ penalty is one of the simplest and best studied approaches to this problem. Although…
For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done…
The problem of multiple kernel learning based on penalized empirical risk minimization is discussed. The complexity penalty is determined jointly by the empirical $L_2$ norms and the reproducing kernel Hilbert space (RKHS) norms induced by…
We consider the problem of simultaneously learning to linearly combine a very large number of kernels and learn a good predictor based on the learnt kernel. When the number of kernels $d$ to be combined is very large, multiple kernel…
In this paper, we study the problem of sparse multiple kernel learning (MKL), where the goal is to efficiently learn a combination of a fixed small number of kernels from a large pool that could lead to a kernel classifier with a small…
In complex visual recognition tasks it is typical to adopt multiple descriptors, that describe different aspects of the images, for obtaining an improved recognition performance. Descriptors that have diverse forms can be fused into a…
Sparse model selection is ubiquitous from linear regression to graphical models where regularization paths, as a family of estimators upon the regularization parameter varying, are computed when the regularization parameter is unknown or…
We consider the two-group classification problem and propose a kernel classifier based on the optimal scoring framework. Unlike previous approaches, we provide theoretical guarantees on the expected risk consistency of the method. We also…
By removing irrelevant and redundant features, feature selection aims to find a good representation of the original features. With the prevalence of unlabeled data, unsupervised feature selection has been proven effective in alleviating the…
Simultaneous feature selection and non-linear function estimation is challenging in modeling, especially in high-dimensional settings where the number of variables exceeds the available sample size. In this article, we investigate the…
Kernel means are frequently used to represent probability distributions in machine learning problems. In particular, the well known kernel density estimator and the kernel mean embedding both have the form of a kernel mean. Unfortunately,…
The transcriptomics of cancer tumors are characterized with tens of thousands of gene expression features. Patient prognosis or tumor stage can be assessed by machine learning techniques like supervised classification tasks given a gene…
Multitask learning can be effective when features useful in one task are also useful for other tasks, and the group lasso is a standard method for selecting a common subset of features. In this paper, we are interested in a less restrictive…
A particularly interesting instance of supervised learning with kernels is when each training example is associated with two objects, as in pairwise classification (Brunner et al., 2012), and in supervised learning of preference relations…
In the field of data mining, how to deal with high-dimensional data is an inevitable problem. Unsupervised feature selection has attracted more and more attention because it does not rely on labels. The performance of spectral-based…
Clustering, a fundamental activity in unsupervised learning, is notoriously difficult when the feature space is high-dimensional. Fortunately, in many realistic scenarios, only a handful of features are relevant in distinguishing clusters.…
Feature selection and regularization are becoming increasingly prominent tools in the efforts of the reinforcement learning (RL) community to expand the reach and applicability of RL. One approach to the problem of feature selection is to…
Within the statistical and machine learning literature, regularization techniques are often used to construct sparse (predictive) models. Most regularization strategies only work for data where all predictors are treated identically, such…
In high-dimensional settings, sparse structures are critical for efficiency in term of memory and computation complexity. For a linear system, to find the sparsest solution provided with an over-complete dictionary of features directly is…
We provide a methodology for learning sparse statistical models that use as features all possible multiplicative interactions among an underlying atomic set of features. While the resulting optimization problems are exponentially sized, our…