Related papers: Efficient Geometric-based Computation of the Strin…

200 papers

String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient, or incur large slowdowns. We show that a…

Data Structures and Algorithms · Computer Science 2015-02-24 Djamal Belazzougui, Fabio Cunial

Analysis of large-scale sequential data has been one of the most crucial tasks in areas such as bioinformatics, text, and audio mining. Existing string kernels, however, either (i) rely on local features of short substructures in the…

Machine Learning · Computer Science 2019-12-02 Lingfei Wu, Ian En-Hsu Yen, Siyu Huo, Liang Zhao, Kun Xu, Liang Ma, Shouling Ji, Charu Aggarwal

String Kernel (SK) techniques, especially those using gapped $k$-mers as features (gk), have obtained great success in classifying sequences like DNA, protein, and text. However, the state-of-the-art gk-SK runs extremely slow when we…

Machine Learning · Computer Science 2017-09-19 Ritambhara Singh, Arshdeep Sekhon, Kamran Kowsari, Jack Lanchantin, Beilun Wang, Yanjun Qi

String kernels are attractive data analysis tools for analyzing string data. Among them, alignment kernels are known for their high prediction accuracies in string classifications when tested in combination with SVM in various applications.…

Machine Learning · Computer Science 2019-11-15 Yasuo Tabei, Yoshihiro Yamanishi, Rasmus Pagh

Tree kernels are fundamental tools that have been leveraged in many applications, particularly those based on machine learning for Natural Language Processing tasks. In this paper, we devise a parallel implementation of the sequential…

Computation and Language · Computer Science 2023-05-16 Souad Taouti, Hadda Cherroun, Djelloul Ziadi

We present a geometric formulation of the Multiple Kernel Learning (MKL) problem. To do so, we reinterpret the problem of learning kernel weights as searching for a kernel that maximizes the minimum (kernel) distance between two convex…

Machine Learning · Computer Science 2014-03-18 John Moeller, Parasaran Raman, Avishek Saha, Suresh Venkatasubramanian

In this paper, we study the problem of sparse multiple kernel learning (MKL), where the goal is to efficiently learn a combination of a fixed small number of kernels from a large pool that could lead to a kernel classifier with a small…

Machine Learning · Computer Science 2013-02-05 Rong Jin, Tianbao Yang, Mehrdad Mahdavi

Approximation of non-linear kernels using random feature maps has become a powerful technique for scaling kernel methods to large datasets. We propose $\textit{Tensor Sketch}$, an efficient random feature map for approximating polynomial…

Data Structures and Algorithms · Computer Science 2025-05-20 Ninh Pham, Rasmus Pagh

In this paper we revisit the kernel density estimation problem: given a kernel $K(x, y)$ and a dataset of $n$ points in high dimensional Euclidean space, prepare a data structure that can quickly output, given a query $q$, a…

Data Structures and Algorithms · Computer Science 2020-11-16 Moses Charikar, Michael Kapralov, Navid Nouri, Paris Siminelakis

Sequence classification algorithms, such as SVM, require a definition of distance (similarity) measure between two sequences. A commonly used notion of similarity is the number of matches between $k$-mers ($k$-length subsequences) in the…

Data Structures and Algorithms · Computer Science 2017-12-13 Muhammad Farhan, Juvaria Tariq, Arif Zaman, Mudassir Shabbir, Imdad Ullah Khan

We propose a new technique for constructing low-rank approximations of matrices that arise in kernel methods for machine learning. Our approach pairs a novel automatically constructed analytic expansion of the underlying kernel function…

Machine Learning · Computer Science 2022-02-09 John Paul Ryan, Anil Damle

Kernel segmentation aims at partitioning a data sequence into several non-overlapping segments that may have nonlinear and complex structures. In general, it is formulated as a discrete optimization problem with combinatorial constraints. A…

Machine Learning · Computer Science 2022-06-23 Tung Doan, Atsuhiro Takasu

The signature kernel is a recent state-of-the-art tool for analyzing high-dimensional sequential data, valued for its theoretical guarantees and strong empirical performance. In this paper, we present a novel method for efficiently…

Numerical Analysis · Mathematics 2025-11-12 Matthew Tamayo-Rios, Alexander Schell, Rima Alaifari

Most kernel-based methods, such as kernel or Gaussian process regression, kernel PCA, ICA, or $k$-means clustering, do not scale to large datasets, because constructing and storing the kernel matrix $\mathbf{K}_n$ requires at least…

Machine Learning · Statistics 2018-03-28 Daniele Calandriello, Alessandro Lazaric, Michal Valko

Kernel regression is a popular non-parametric fitting technique. It aims at learning a function which estimates the targets for test inputs as precisely as possible. Generally, the function value for a test input is estimated by a weighted…

Machine Learning · Computer Science 2017-12-27 Rongqing Huang, Shiliang Sun

Kernel-based methods enjoy powerful generalization capabilities in handling a variety of learning tasks. When such methods are provided with sufficient training data, broadly-applicable classes of nonlinear functions can be approximated…

Machine Learning · Statistics 2017-12-29 Fatemeh Sheikholeslami, Dimitris Berberidis, Georgios B. Giannakis

The kernel method is a potential approach to analyzing structured data such as sequences, trees, and graphs; however, unordered trees have not been investigated extensively. Kimura et al. (2011) proposed a kernel function for unordered…

Data Structures and Algorithms · Computer Science 2012-06-22 Daisuke Kimura, Hisashi Kashima

We propose a novel class of kernels to alleviate the high computational cost of large-scale nonparametric learning with kernel methods. The proposed kernel is defined based on a hierarchical partitioning of the underlying data domain, where…

Machine Learning · Computer Science 2017-08-15 Jie Chen, Haim Avron, Vikas Sindhwani

Land cover classification of new image sources has also turned out to be a complex problem requiring large amounts of memory and processing time. In order to cope with these problems, statistical learning has greatly helped in…

In this paper we present $LCSk$++: a new metric for measuring the similarity of long strings, and provide an algorithm for its efficient computation. With ever increasing size of strings occurring in practice, e.g. large genomes of plants…

Data Structures and Algorithms · Computer Science 2019-08-27 Filip Pavetić, Goran Žužić, Mile Šikić