Related papers: Min-Max Kernels

200 papers

In this paper, we first demonstrate that b-bit minwise hashing, whose estimators are positive definite kernels, can be naturally integrated with learning algorithms such as SVM and logistic regression. We adopt a simple scheme to transform…

Machine Learning · Statistics 2011-06-07 Ping Li, Anshumali Shrivastava, Joshua Moore, Arnd Christian Konig

In this paper, we propose to (seamlessly) integrate b-bit minwise hashing with linear SVM to substantially improve the training (and testing) efficiency using much smaller memory, with essentially no loss of accuracy. Theoretically, we…

Machine Learning · Computer Science 2015-03-19 Ping Li, Joshua Moore, Christian Konig

Minwise hashing is the standard technique in the context of search and databases for efficiently estimating set (e.g., high-dimensional 0/1 vector) similarities. Recently, b-bit minwise hashing was proposed which significantly improves upon…

Machine Learning · Statistics 2011-08-04 Ping Li, Christian Konig

We develop some theoretical results for a robust similarity measure named "generalized min-max" (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via…

Methodology · Statistics 2016-08-02 Ping Li, Cun-Hui Zhang

This paper establishes the theoretical framework of b-bit minwise hashing. The original minwise hashing method has become a standard technique for estimating set similarity (e.g., resemblance) with applications in information retrieval,…

Data Structures and Algorithms · Computer Science 2009-10-20 Ping Li, Arnd Christian Konig

Measuring similarity between incomplete data is a fundamental challenge in web mining, recommendation systems, and user behavior analysis. Traditional approaches either discard incomplete data or perform imputation as a preprocessing step,…

Machine Learning · Computer Science 2025-10-16 Yang Cao, Sikun Yang, Kai He, Wenjun Ma, Ming Liu, Yujiu Yang, Jian Weng

In this paper, we compare 5 different nonlinear kernels: min-max, RBF, fRBF (folded RBF), acos, and acos-$\chi^2$, on a wide range of publicly available datasets. The proposed fRBF kernel performs very similarly to the RBF kernel. Both RBF…

Machine Learning · Statistics 2016-03-22 Ping Li

Large-scale regression problems where both the number of variables, $p$, and the number of observations, $n$, may be large and in the order of millions or more, are becoming increasingly more common. Typically the data are sparse: only a…

Statistics Theory · Mathematics 2018-02-27 Rajen D. Shah, Nicolai Meinshausen

The least-squares support vector machine is a frequently used kernel method for non-linear regression and classification tasks. Here we discuss several approximation algorithms for the least-squares support vector machine classifier. The…

Machine Learning · Computer Science 2017-03-24 M. Andrecut

Object proposals are an ensemble of bounding boxes with high potential to contain objects. In order to determine a small set of proposals with a high recall, a common scheme is extracting multiple features followed by a ranking algorithm…

Computer Vision and Pattern Recognition · Computer Science 2017-05-19 Jing Wang, Jie Shen, Ping Li

Data similarity is a key concept in many data-driven applications. Many algorithms are sensitive to similarity measures. To tackle this fundamental problem, automatic learning of similarity information from data via self-expression has…

Machine Learning · Computer Science 2019-03-12 Zhao Kang, Yiwei Lu, Yuanzhang Su, Changsheng Li, Zenglin Xu

The K-means algorithm is among the most commonly used data clustering methods. However, the regular K-means can only be applied in the input space and it is applicable when clusters are linearly separable. The kernel K-means, which extends…

Machine Learning · Computer Science 2020-12-08 Amir Aradnia, Maryam Amir Haeri, Mohammad Mehdi Ebadzadeh

Weighted minwise hashing is a standard dimensionality reduction technique with applications to similarity search and large-scale kernel machines. We introduce a simple algorithm that takes a weighted set $x \in \mathbb{R}_{\geq 0}^{d}$ and…

Data Structures and Algorithms · Computer Science 2020-05-26 Tobias Christiani

We propose a kernelized classification layer for deep networks. Although conventional deep networks introduce an abundance of nonlinearity for representation (feature) learning, they almost universally use a linear classifier on the learned…

Machine Learning · Computer Science 2021-03-22 Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Kernel-based clustering algorithms have the ability to capture the non-linear structure in real world data. Among various kernel-based clustering algorithms, kernel k-means has gained popularity due to its simple iterative nature and ease…

Computer Vision and Pattern Recognition · Computer Science 2014-02-18 Radha Chitta, Rong Jin, Timothy C. Havens, Anil K. Jain

Kernel $k$-means clustering is a powerful tool for unsupervised learning of non-linearly separable data. Since the earliest attempts, researchers have noted that such algorithms often become trapped by local minima arising from…

Machine Learning · Statistics 2020-11-13 Debolina Paul, Saptarshi Chakraborty, Swagatam Das, Jason Xu

Kernel approximation is widely used to scale up kernel SVM training and prediction. However, the memory and computation costs of kernel approximation models are still too high if we want to deploy them on memory-limited devices such as…

Machine Learning · Computer Science 2020-10-07 Zijian Lei, Liang Lan

The GMM (generalized min-max) kernel was recently proposed (Li, 2016) as a measure of data similarity and was demonstrated effective in machine learning tasks. In order to use the GMM kernel for large-scale datasets, the prior work resorted…

Machine Learning · Statistics 2016-07-13 Ping Li

Kernel-based methods enjoy powerful generalization capabilities in handling a variety of learning tasks. When such methods are provided with sufficient training data, broadly-applicable classes of nonlinear functions can be approximated…

Machine Learning · Statistics 2017-12-29 Fatemeh Sheikholeslami, Dimitris Berberidis, Georgios B. Giannakis

The Min-Hashing approach to sketching has become an important tool in data analysis, information retrieval, and classification. To apply it to real-valued datasets, the ICWS algorithm has become a seminal approach that is widely used, and…

Machine Learning · Statistics 2018-10-24 Edward Raff, Jared Sylvester, Charles Nicholas