A framework for space-efficient string kernels
Abstract
String kernels are typically used to compare genome-scale sequences whose length makes alignment impractical, yet their computation is based on data structures that are either space-inefficient or incur large slowdowns. We show that a number of exact string kernels, like the $k$-mer kernel, the substrings kernels, a number of length-weighted kernels, the minimal absent words kernel, and kernels with Markovian corrections, can all be computed in $O(nd)$ time and in $o(n)$ bits of space in addition to the input, using just a rangeDistinct data structure on the Burrows-Wheeler transform of the input strings, which takes $O(d)$ time per element in its output. The same bounds hold for a number of measures of compositional complexity based on multiple values of $k$, like the $k$-mer profile and the $k$-th order empirical entropy, and for calibrating the value of $k$ using the data.
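For readers unfamiliar with the simplest kernel named in the abstract, the following minimal sketch computes a k-mer kernel as the inner product of two k-mer count vectors. It is a naive hash-table baseline for illustration only, not the Burrows-Wheeler-based method of the paper, and it uses space proportional to the number of distinct k-mers. The function names `kmer_profile` and `kmer_kernel` are hypothetical and do not come from the paper.

```python
from collections import Counter


def kmer_profile(s: str, k: int) -> Counter:
    """Count every length-k substring of s (hypothetical helper name)."""
    # range() is empty when k > len(s), so the profile is then empty.
    return Counter(s[i:i + k] for i in range(len(s) - k + 1))


def kmer_kernel(s: str, t: str, k: int) -> int:
    """Unnormalized k-mer kernel: dot product of the two k-mer count vectors.

    Naive baseline, assumed here only for illustration; it does not achieve
    the space bounds stated in the abstract.
    """
    ps, pt = kmer_profile(s, k), kmer_profile(t, k)
    # Iterate over the smaller profile; Counter returns 0 for missing keys.
    if len(pt) < len(ps):
        ps, pt = pt, ps
    return sum(count * pt[kmer] for kmer, count in ps.items())


if __name__ == "__main__":
    print(kmer_kernel("ACGTACGT", "CGTACG", 3))
```

A normalized similarity can be derived from this value, for example by dividing by the square root of the product of the two self-kernels; that choice of normalization is likewise an assumption of this sketch.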
Cite
@article{arxiv.1502.06370,
title = {A framework for space-efficient string kernels},
author = {Djamal Belazzougui and Fabio Cunial},
journal = {arXiv preprint arXiv:1502.06370},
year = {2015}
}