On Approximating String Selection Problems with Outliers
Abstract
Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is: given a set S of same-length strings and a parameter d, find a string x that maximizes the number of "non-outliers", that is, strings of S within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of the input strings such that the number of non-identical positions (columns in which the chosen strings do not all agree) is at most k; we show it has no PTAS unless P=NP. We also observe that Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems that arise when working with biological data, but we show that finding an approximate solution is computationally difficult.
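To make the two objective functions concrete, here is a minimal Python sketch of both problems as stated in the abstract. It is an illustration only, not the paper's method: the function names, the explicit `alphabet` parameter, and the exhaustive search are all assumptions introduced here, and the search takes exponential time, which fits the hardness results above.

```python
from itertools import combinations, product


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def count_close(strings, x, d):
    """Close to Most Strings objective: strings within Hamming distance d of x."""
    return sum(hamming(s, x) <= d for s in strings)


def close_to_most_strings_bruteforce(strings, d, alphabet):
    """Try every candidate x over the alphabet (hypothetical helper, exponential time)."""
    length = len(strings[0])
    best_x, best_count = None, -1
    for candidate in product(alphabet, repeat=length):
        x = "".join(candidate)
        count = count_close(strings, x, d)
        if count > best_count:
            best_x, best_count = x, count
    return best_x, best_count


def bad_columns(strings):
    """Number of positions (columns) where the given strings do not all agree."""
    return sum(len(set(column)) > 1 for column in zip(*strings))


def most_strings_few_bad_columns_bruteforce(strings, k):
    """Largest subset with at most k bad columns (hypothetical helper, exponential time)."""
    for size in range(len(strings), 0, -1):
        for subset in combinations(strings, size):
            if bad_columns(subset) <= k:
                return list(subset)
    return []
```

For example, with the assumed input `["AAAA", "AAAT", "AATT", "GGGG"]`, the string `GGGG` acts as an outlier in both formulations: no center with d = 1 can cover it together with the others, and any subset containing it has four bad columns.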
Cite
@article{arxiv.1202.2820,
title = {On Approximating String Selection Problems with Outliers},
author = {Christina Boucher and Gad M. Landau and Avivit Levy and David Pritchard and Oren Weimann},
journal = {arXiv preprint arXiv:1202.2820},
year = {2012}
}