On Approximating String Selection Problems with Outliers

Christina Boucher; Gad M. Landau; Avivit Levy; David Pritchard; Oren Weimann

Data Structures and Algorithms 2012-02-14 v1

Abstract

Many problems in bioinformatics are about finding strings that approximately represent a collection of given strings. We look at more general problems where some input strings can be classified as outliers. The Close to Most Strings problem is, given a set S of same-length strings and a parameter d, to find a string x that maximizes the number of "non-outliers" within Hamming distance d of x. We prove this problem has no PTAS unless ZPP=NP, correcting a decade-old mistake. The Most Strings with Few Bad Columns problem is to find a maximum-size subset of input strings so that the number of non-identical positions is at most k; we show it has no PTAS unless P=NP. We also observe that Closest to k Strings has no EPTAS unless W[1]=FPT. In sum, outliers help model problems associated with using biological data, but we show that finding an approximate solution to these problems is computationally difficult.
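
To make the two objective functions in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the function names (hamming, close_count, bad_columns, brute_force_close_to_most) are hypothetical, and the exhaustive search is an illustration only, since it enumerates every candidate string and so runs in time exponential in the string length.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two same-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(ca != cb for ca, cb in zip(a, b))


def close_count(x, strings, d):
    """Close to Most Strings objective: how many input strings
    lie within Hamming distance d of the candidate x."""
    return sum(hamming(x, s) <= d for s in strings)


def bad_columns(strings):
    """Most Strings with Few Bad Columns constraint: the number of
    positions at which the chosen strings are not all identical."""
    return sum(len(set(column)) > 1 for column in zip(*strings))


def brute_force_close_to_most(strings, d, alphabet):
    """Illustrative exhaustive search (assumption: tiny inputs only).
    Tries every string over `alphabet` of the input length and keeps
    the first one that maximizes close_count."""
    length = len(strings[0])
    best_x, best = None, -1
    for candidate in product(alphabet, repeat=length):
        x = "".join(candidate)
        count = close_count(x, strings, d)
        if count > best:
            best_x, best = x, count
    return best_x, best


if __name__ == "__main__":
    S = ["000", "001", "010", "111"]
    print(brute_force_close_to_most(S, 1, "01"))
    print(bad_columns(["000", "001", "010"]))
```

In the small example, "000" and "111" differ in three positions, so no single string can be within distance 1 of both; at least one input string must be treated as an outlier.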

Keywords

string algorithms, approximation algorithm

Cite

@article{arxiv.1202.2820,
  title  = {On Approximating String Selection Problems with Outliers},
  author = {Christina Boucher and Gad M. Landau and Avivit Levy and David Pritchard and Oren Weimann},
  journal= {arXiv preprint arXiv:1202.2820},
  year   = {2012}
}
