Related papers: Min-Max Kernels

Hashing Algorithms for Large-Scale Learning

In this paper, we first demonstrate that b-bit minwise hashing, whose estimators are positive definite kernels, can be naturally integrated with learning algorithms such as SVM and logistic regression. We adopt a simple scheme to transform…

Machine Learning · Statistics 2011-06-07 Ping Li , Anshumali Shrivastava , Joshua Moore , Arnd Christian Konig

b-Bit Minwise Hashing for Large-Scale Linear SVM

In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no loss of accuracy. Theoretically, we…

Machine Learning · Computer Science 2015-03-19 Ping Li , Joshua Moore , Christian Konig
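
This minimal sketch shows how b-bit hashed data can be fed to a linear SVM. It assumes the k minwise hash values of a set have already been computed. Each value is reduced to its lowest b bits and expanded into a one-hot block of length 2^b. A set therefore becomes a sparse binary vector of length k·2^b with exactly k ones. The function name `expand_bbit` is hypothetical. The code illustrates the general expansion idea and does not reproduce the paper's implementation.

```python
import numpy as np

def expand_bbit(minhash_values, b):
    """Expand k minwise hash values into a binary vector of length k * 2**b.

    Hypothetical helper: only the lowest b bits of each value are kept, and
    block j of the output holds a single 1 at the position given by those bits.
    """
    codes = np.asarray(minhash_values, dtype=np.int64) & ((1 << b) - 1)
    k = codes.size
    block = 1 << b
    x = np.zeros(k * block, dtype=np.float32)
    x[np.arange(k) * block + codes] = 1.0  # one active entry per hash value
    return x
```

Each block contains exactly one 1. The inner product of two expanded vectors therefore equals the number of hash positions whose lowest b bits agree, so a linear kernel on these features is proportional to the empirical b-bit collision rate.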

Accurate Estimators for Improving Minwise Hashing and b-Bit Minwise Hashing

Minwise hashing is the standard technique in the context of search and databases for efficiently estimating set (e.g., high-dimensional 0/1 vector) similarities. Recently, b-bit minwise hashing was proposed which significantly improves upon…

Machine Learning · Statistics 2011-08-04 Ping Li , Christian Konig

Theory of the GMM Kernel

We develop some theoretical results for a robust similarity measure named "generalized min-max" (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via…

Methodology · Statistics 2016-08-02 Ping Li , Cun-Hui Zhang
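
The GMM similarity can be written down directly. This sketch follows the GMM definition. Each entry of a real-valued vector is split into its positive and negative parts, which yields a nonnegative vector with twice the dimension. The min-max similarity, sum(min)/sum(max), is then applied to the transformed vectors. The function names are illustrative. The sketch also assumes the vectors have at least one nonzero entry, so the denominator is positive.

```python
import numpy as np

def gmm_transform(u):
    """Interleave positive and negative parts: entry i -> (max(u_i, 0), max(-u_i, 0))."""
    u = np.asarray(u, dtype=float)
    out = np.zeros(2 * u.size)
    out[0::2] = np.maximum(u, 0.0)
    out[1::2] = np.maximum(-u, 0.0)
    return out

def min_max(x, y):
    """Min-max similarity of two nonnegative vectors."""
    return np.minimum(x, y).sum() / np.maximum(x, y).sum()

def gmm(u, v):
    """Generalized min-max similarity of two real-valued vectors."""
    return min_max(gmm_transform(u), gmm_transform(v))
```

For nonnegative data the transform only adds zero entries, so GMM reduces to the plain min-max kernel named in this page's title.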

b-Bit Minwise Hashing

This paper establishes the theoretical framework of b-bit minwise hashing. The original minwise hashing method has become a standard technique for estimating set similarity (e.g., resemblance) with applications in information retrieval,…

Data Structures and Algorithms · Computer Science 2009-10-20 Ping Li , Arnd Christian Konig
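
This small sketch shows the b-bit resemblance estimator. It makes two assumptions. First, it uses explicit random permutations, which is practical only for a modest dimension D. Second, it uses the sparse-data approximation, in which the estimator (P̂ − C1)/(1 − C2) has both correction constants reduced to 2^-b. The exact constants depend on the set sizes relative to D, so this simplified estimator is slightly biased when the sets are not sparse. The function names are hypothetical.

```python
import numpy as np

def minhash_signatures(sets, D, k, seed=0):
    """Minwise hashing with k explicit random permutations of {0, ..., D-1}.

    Returns an int array of shape (len(sets), k) holding min(pi_j(S)) per set S.
    """
    rng = np.random.default_rng(seed)
    perms = np.stack([rng.permutation(D) for _ in range(k)])  # shape (k, D)
    sigs = np.empty((len(sets), k), dtype=np.int64)
    for i, s in enumerate(sets):
        idx = np.fromiter(s, dtype=np.int64)
        sigs[i] = perms[:, idx].min(axis=1)  # minimum permuted index per permutation
    return sigs

def bbit_resemblance(sig1, sig2, b):
    """Estimate resemblance from the lowest b bits, using the sparse approximation C = 2**-b."""
    mask = (1 << b) - 1
    p_hat = np.mean((sig1 & mask) == (sig2 & mask))  # empirical b-bit collision rate
    c = 2.0 ** (-b)
    return (p_hat - c) / (1.0 - c)
```

As a usage example, the sets `set(range(0, 40))` and `set(range(20, 60))` share 20 of 60 elements, so their true resemblance is 1/3. With D=2000, k=500 and b=2, the estimate should land near that value.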

Kernel Representation and Similarity Measure for Incomplete Data

Measuring similarity between incomplete data is a fundamental challenge in web mining, recommendation systems, and user behavior analysis. Traditional approaches either discard incomplete data or perform imputation as a preprocessing step,…

Machine Learning · Computer Science 2025-10-16 Yang Cao , Sikun Yang , Kai He , Wenjun Ma , Ming Liu , Yujiu Yang , Jian Weng

A Comparison Study of Nonlinear Kernels

In this paper, we compare 5 different nonlinear kernels: min-max, RBF, fRBF (folded RBF), acos, and acos-$\chi^2$, on a wide range of publicly available datasets. The proposed fRBF kernel performs very similarly to the RBF kernel. Both RBF…

Machine Learning · Statistics 2016-03-22 Ping Li

On b-bit min-wise hashing for large-scale regression and classification with sparse data

Large-scale regression problems where both the number of variables, $p$, and the number of observations, $n$, may be large and in the order of millions or more, are becoming increasingly common. Typically the data are sparse: only a…

Statistics Theory · Mathematics 2018-02-27 Rajen D. Shah , Nicolai Meinshausen

Randomized Kernel Methods for Least-Squares Support Vector Machines

The least-squares support vector machine is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the least-squares support vector machine classifier. The…

Machine Learning · Computer Science 2017-03-24 M. Andrecut

Object Proposal with Kernelized Partial Ranking

Object proposals are an ensemble of bounding boxes with high potential to contain objects. In order to determine a small set of proposals with a high recall, a common scheme is extracting multiple features followed by a ranking algorithm…

Computer Vision and Pattern Recognition · Computer Science 2017-05-19 Jing Wang , Jie Shen , Ping Li

Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has…

Machine Learning · Computer Science 2019-03-12 Zhao Kang , Yiwei Lu , Yuanzhang Su , Changsheng Li , Zenglin Xu

Adaptive Explicit Kernel Minkowski Weighted K-means

The K-means algorithm is among the most commonly used data clustering methods. However, regular K-means can only be applied in the input space and is applicable only when clusters are linearly separable. The kernel K-means, which extends…

Machine Learning · Computer Science 2020-12-08 Amir Aradnia , Maryam Amir Haeri , Mohammad Mehdi Ebadzadeh

DartMinHash: Fast Sketching for Weighted Sets

Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set $x \in \mathbb{R}_{\geq 0}^{d}$ and…

Data Structures and Algorithms · Computer Science 2020-05-26 Tobias Christiani

Kernelized Classification in Deep Networks

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science 2021-03-22 Sadeep Jayasumana , Srikumar Ramalingam , Sanjiv Kumar

Scalable Kernel Clustering: Approximate Kernel k-means

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real-world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta , Rong Jin , Timothy C. Havens , Anil K. Jain

Kernel k-Means, By All Means: Algorithms and Strong Consistency

Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from…

Machine Learning · Statistics 2020-11-13 Debolina Paul , Saptarshi Chakraborty , Swagatam Das , Jason Xu

Memory and Computation-Efficient Kernel SVM via Binary Embedding and Ternary Model Coefficients

Kernel approximation is widely used to scale up kernel SVM training and prediction. However, the memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices such as…

Machine Learning · Computer Science 2020-10-07 Zijian Lei , Liang Lan

Nystrom Method for Approximating the GMM Kernel

The GMM (generalized min-max) kernel was recently proposed (Li, 2016) as a measure of data similarity and was demonstrated to be effective in machine learning tasks. In order to use the GMM kernel for large-scale datasets, the prior work resorted…

Machine Learning · Statistics 2016-07-13 Ping Li
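
This is a generic Nyström sketch applied to the GMM kernel. It assumes landmarks are sampled uniformly without replacement and that numerically zero eigenvalues of the landmark kernel are discarded. It is not necessarily the exact variant studied in the paper, and all names are hypothetical. The features satisfy Z Zᵀ = K_nm K_mm⁺ K_mn, which is the standard Nyström approximation of the full kernel matrix.

```python
import numpy as np

def gmm_transform(X):
    """Split each row into positive and negative parts (nonnegative, 2x columns)."""
    X = np.asarray(X, dtype=float)
    return np.hstack([np.maximum(X, 0.0), np.maximum(-X, 0.0)])

def minmax_kernel_matrix(A, B):
    """Min-max kernel between rows of nonnegative A (n, d) and B (m, d).

    Uses O(n * m * d) memory via broadcasting; fine for a sketch, not for large n.
    """
    mins = np.minimum(A[:, None, :], B[None, :, :]).sum(axis=2)
    maxs = np.maximum(A[:, None, :], B[None, :, :]).sum(axis=2)
    return mins / maxs

def nystrom_features(X, m, seed=0, eps=1e-10):
    """Nystrom features Z with Z @ Z.T approximating the GMM kernel matrix.

    Landmarks are m rows sampled uniformly without replacement (an assumption).
    """
    Xt = gmm_transform(X)
    rng = np.random.default_rng(seed)
    idx = rng.choice(Xt.shape[0], size=m, replace=False)
    L = Xt[idx]
    K_mm = minmax_kernel_matrix(L, L)   # (m, m) landmark kernel
    K_nm = minmax_kernel_matrix(Xt, L)  # (n, m) data-to-landmark kernel
    w, V = np.linalg.eigh(K_mm)
    keep = w > eps                      # drop numerically zero eigenvalues
    # Z = K_nm V_k diag(w_k)^(-1/2), so Z Z^T = K_nm K_mm^+ K_mn
    return K_nm @ (V[:, keep] / np.sqrt(w[keep]))
```

A linear classifier trained on Z then approximates training with the GMM kernel. Its cost grows with the number of landmarks m rather than with the full n × n kernel matrix.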

Large-scale Kernel-based Feature Extraction via Budgeted Nonlinear Subspace Tracking

Kernel-based methods enjoy powerful generalization capabilities in handling a variety of learning tasks. When such methods are provided with sufficient training data, broadly-applicable classes of nonlinear functions can be approximated…

Machine Learning · Statistics 2017-12-29 Fatemeh Sheikholeslami , Dimitris Berberidis , Georgios B. Giannakis

Engineering a Simplified 0-Bit Consistent Weighted Sampling

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used, and…

Machine Learning · Statistics 2018-10-24 Edward Raff , Jared Sylvester , Charles Nicholas
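
For reference, here is a compact sketch of Ioffe's ICWS, which the abstract describes as the seminal approach for real-valued data. It also covers the 0-bit variant, which keeps only the selected index k*. The code follows the usual ICWS recipe: r and c are drawn from Gamma(2, 1) and beta from Uniform(0, 1), with one draw per (hash function, dimension) shared by all vectors. It is not the simplified algorithm proposed in this paper, and the function names are hypothetical.

```python
import numpy as np

def icws_params(d, k, seed=0):
    """Random variables for k hash functions over d dimensions (shared by all vectors)."""
    rng = np.random.default_rng(seed)
    r = rng.gamma(2.0, 1.0, size=(k, d))
    c = rng.gamma(2.0, 1.0, size=(k, d))
    beta = rng.uniform(0.0, 1.0, size=(k, d))
    return r, c, beta

def icws_sample(x, params):
    """Return (k_star, t_star) for each of the k hash functions.

    Assumes x is a nonnegative vector with at least one positive entry.
    Keeping only k_star gives the 0-bit CWS variant.
    """
    r, c, beta = params
    x = np.asarray(x, dtype=float)
    nz = np.flatnonzero(x > 0)
    r_, c_, b_ = r[:, nz], c[:, nz], beta[:, nz]
    t = np.floor(np.log(x[nz]) / r_ + b_)
    log_y = r_ * (t - b_)
    log_a = np.log(c_) - log_y - r_     # log of a = c / (y * exp(r))
    j = np.argmin(log_a, axis=1)        # winning nonzero dimension per hash
    rows = np.arange(log_a.shape[0])
    return nz[j], t[rows, j].astype(np.int64)
```

Under ICWS, the fraction of hash functions on which two vectors produce the same (k*, t*) pair estimates their min-max (generalized Jaccard) similarity. The 0-bit variant compares k* alone. Dropping t* can only raise the collision rate, so the 0-bit estimate is an approximation.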