Related papers: Efficient Geometric-based Computation of the Strin…

A framework for space-efficient string kernels

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a…

Data Structures and Algorithms · Computer Science · 2015-02-24 · Djamal Belazzougui, Fabio Cunial

Efficient Global String Kernel with Random Features: Beyond Counting Substructures

Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the…

Machine Learning · Computer Science · 2019-12-02 · Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal

GaKCo: a Fast GApped k-mer string Kernel using COunting

String Kernel (SK) techniques, especially those using gapped $k$-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slowly when we…

Machine Learning · Computer Science · 2017-09-19 · Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, Yanjun Qi

Space-efficient Feature Maps for String Alignment Kernels

String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications.…

Machine Learning · Computer Science · 2019-11-15 · Yasuo Tabei, Yoshihiro Yamanishi, Rasmus Pagh

Parallel Tree Kernel Computation

Tree kernels are fundamental tools that have been leveraged in many applications, particularly those based on machine learning for Natural Language Processing tasks. In this paper, we devise a parallel implementation of the sequential…

Computation and Language · Computer Science · 2023-05-16 · Souad Taouti, Hadda Cherroun, Djelloul Ziadi

A Geometric Algorithm for Scalable Multiple Kernel Learning

We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex…

Machine Learning · Computer Science · 2014-03-18 · John Moeller, Parasaran Raman, Avishek Saha, Suresh Venkatasubramanian

Sparse Multiple Kernel Learning with Geometric Convergence Rate

In this paper, we study the problem of sparse multiple kernel learning (MKL), where the goal is to efficiently learn a combination of a fixed small number of kernels from a large pool that could lead to a kernel classifier with a small…

Machine Learning · Computer Science · 2013-02-05 · Rong Jin, Tianbao Yang, Mehrdad Mahdavi

Tensor Sketch: Fast and Scalable Polynomial Kernel Approximation

Approximation of non-linear kernels using random feature maps has become a powerful technique for scaling kernel methods to large datasets. We propose $\textit{Tensor Sketch}$, an efficient random feature map for approximating polynomial…

Data Structures and Algorithms · Computer Science · 2025-05-20 · Ninh Pham, Rasmus Pagh

Kernel Density Estimation through Density Constrained Near Neighbor Search

In this paper we revisit the kernel density estimation problem: given a kernel $K(x, y)$ and a dataset of $n$ points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query $q$, a…

Data Structures and Algorithms · Computer Science · 2020-11-16 · Moses Charikar, Michael Kapralov, Navid Nouri, Paris Siminelakis

Efficient Approximation Algorithms for String Kernel Based Sequence Classification

Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…

Data Structures and Algorithms · Computer Science · 2017-12-13 · Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

Linear Time Kernel Matrix Approximation via Hyperspherical Harmonics

We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning. Our approach pairs a novel automatically constructed analytic expansion of the underlying kernel function…

Machine Learning · Computer Science · 2022-02-09 · John Paul Ryan, Anil Damle

Kernel Clustering with Sigmoid-based Regularization for Efficient Segmentation of Sequential Data

Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A…

Machine Learning · Computer Science · 2022-06-23 · Tung Doan, Atsuhiro Takasu

Scalable Signature Kernel Computations for Long Time Series via Local Neumann Series Expansions

The signature kernel is a recent state-of-the-art tool for analyzing high-dimensional sequential data, valued for its theoretical guarantees and strong empirical performance. In this paper, we present a novel method for efficiently…

Numerical Analysis · Mathematics · 2025-11-12 · Matthew Tamayo-Rios, Alexander Schell, Rima Alaifari

Distributed Adaptive Sampling for Kernel Matrix Approximation

Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least…

Machine Learning · Statistics · 2018-03-28 · Daniele Calandriello, Alessandro Lazaric, Michal Valko

Kernel Regression with Sparse Metric Learning

Kernel regression is a popular non-parametric fitting technique. It aims at learning a function which estimates the targets for test inputs as precisely as possible. Generally, the function value for a test input is estimated by a weighted…

Machine Learning · Computer Science · 2017-12-27 · Rongqing Huang, Shiliang Sun

Large-scale Kernel-based Feature Extraction via Budgeted Nonlinear Subspace Tracking

Kernel-based methods enjoy powerful generalization capabilities in handling a variety of learning tasks. When such methods are provided with sufficient training data, broadly-applicable classes of nonlinear functions can be approximated…

Machine Learning · Statistics · 2017-12-29 · Fatemeh Sheikholeslami, Dimitris Berberidis, Georgios B. Giannakis

Fast Computation of Subpath Kernel for Trees

The kernel method is a potential approach to analyzing structured data such as sequences, trees, and graphs; however, unordered trees have not been investigated extensively. Kimura et al. (2011) proposed a kernel function for unordered…

Data Structures and Algorithms · Computer Science · 2012-06-22 · Daisuke Kimura, Hisashi Kashima

Hierarchically Compositional Kernels for Scalable Nonparametric Learning

We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where…

Machine Learning · Computer Science · 2017-08-15 · Jie Chen, Haim Avron, Vikas Sindhwani

Randomized kernels for large scale Earth observation applications

Land cover classification of the new image sources has also turned out to be a complex problem requiring a large amount of memory and processing time. In order to cope with these problems, statistical learning has greatly helped in…

Machine Learning · Computer Science · 2020-12-08 · Adrián Pérez-Suay, Julia Amorós-López, Luis Gómez-Chova, Valero Laparra, Jordi Muñoz-Marí, Gustau Camps-Valls

$LCSk$++: Practical similarity metric for long strings

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With the ever-increasing size of strings occurring in practice, e.g. large genomes of plants…

Data Structures and Algorithms · Computer Science · 2019-08-27 · Filip Pavetić, Goran Žužić, Mile Šikić