
Finding Sequence Features in Tissue-specific Sequences

Genomics 2007-05-23 v1

Abstract

Discovering the motifs that underlie gene expression is a challenging problem. Some of these motifs are binding sites of known transcription factors, but sequence inspection often provides valuable clues and can even lead to the discovery of novel motifs whose function in gene expression is uncharacterized. Adding to the complexity of tissue-specific gene expression, several motifs are putatively responsible for expression in a given cell type. This has important implications for understanding fundamental biological processes such as development and disease progression. In this work, we present an approach to the principled selection of motifs (not necessarily transcription factor binding sites) and examine its application to several questions in current bioinformatics research. This work makes two main contributions: first, we introduce a new metric for variable selection during classification; second, we investigate the problem of finding specific sequence motifs that underlie tissue-specific gene expression. Using this metric in conjunction with an SVM classifier, we identify these motifs and discover several novel motifs that have not yet been attributed any particular functional role (e.g., as transcription factor binding sites, TFBS). We hypothesize that the discovery of these motifs will enable large-scale investigation of the tissue-specific regulatory potential of any conserved sequence element identified in genome-wide studies. Finally, we propose that this framework is useful not only for discovering discriminatory motifs but also for examining the role of any motif of choice in the co-regulation or co-expression of gene groups.
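The abstract does not spell out the new variable-selection metric, so the sketch below covers only the step that would precede any such selection: representing each sequence as a vector of motif occurrence counts that a classifier (for example, an SVM) can consume. The function names, the example motifs, and the choice of overlapping exact-match counting are illustrative assumptions, not the authors' method.

```python
def motif_count_vector(seq: str, motifs: list[str]) -> list[int]:
    """Count overlapping, exact-match occurrences of each motif in seq.

    Hypothetical helper: exact string matching is an assumption; real motif
    scans often use position weight matrices instead.
    """
    seq = seq.upper()
    counts = []
    for motif in motifs:
        m = len(motif.upper())
        # Slide a window of the motif's length across the sequence.
        hits = sum(1 for i in range(len(seq) - m + 1) if seq[i:i + m] == motif.upper())
        counts.append(hits)
    return counts


def feature_matrix(seqs: list[str], motifs: list[str]) -> list[list[int]]:
    """One row per sequence, one column per motif (hypothetical helper)."""
    return [motif_count_vector(s, motifs) for s in seqs]


if __name__ == "__main__":
    motifs = ["GATA", "TATA", "CACGTG"]  # illustrative motifs only
    print(motif_count_vector("ggataGATAtata", motifs))  # [2, 2, 0]
```

Each row of the resulting matrix describes one sequence; a variable-selection metric would then rank the columns (motifs) by how well they separate tissue-specific sequences from background sequences before or during classifier training.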

Cite

@article{arxiv.q-bio/0702022,
  title  = {Finding Sequence Features in Tissue-specific Sequences},
  author = {Arvind Rao and Alfred O. Hero and David J. States and James Douglas Engel},
  journal= {arXiv preprint arXiv:q-bio/0702022},
  year   = {2007}
}

Comments

11 pages, 9 figures