Integrated Characterization of Transcriptional Upstream Sequences by Statistical Consensus Patterns and Experimental Data

Wataru Fujibuchi
Minoru Kanehisa

Institute for Chemical Research, Kyoto University


Abstract

Transcription is a key step in gene expression, and it is important to clarify how it is controlled by DNA sequences. However, so many parameters must be taken into account that it is difficult to handle them all at once by experimental methods alone. To overcome this problem, we have been developing a computer system that integrates a wide range of information on transcription.

We first collected promoter sequences from the EMBL database (release 31) and aligned them at the transcription initiation site. To extract consensus patterns with positional constraints, we calculated the frequency of occurrence of every pattern at each window position across all sequences. Allowing for base substitutions, we grouped similar patterns together and filtered them by a threshold to construct a consensus pattern index.
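The counting, grouping, and filtering steps above can be sketched as follows. This is only an illustrative sketch, not the system described here: the pattern length `k`, the window size `window`, the substitution tolerance `max_mismatch`, and the threshold `min_fraction` are all assumed parameters, since the abstract does not state their values. The sequences are assumed to be already aligned so that the same index corresponds to the same offset from the transcription initiation site.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of base substitutions between two patterns of equal length."""
    return sum(x != y for x, y in zip(a, b))


def positional_occurrences(seqs, k, window):
    """Map window -> pattern -> set of sequence indices containing it there.

    A pattern is assigned to the window that contains its start position.
    """
    occ = defaultdict(lambda: defaultdict(set))
    for idx, seq in enumerate(seqs):
        for i in range(len(seq) - k + 1):
            occ[i // window][seq[i:i + k]].add(idx)
    return occ


def consensus_index(seqs, k=6, window=5, max_mismatch=1, min_fraction=0.75):
    """Build a {(window, pattern): fraction of supporting sequences} index.

    For each seed pattern, sequences that contain any pattern within
    max_mismatch substitutions in the same window count as support.
    Seeds below min_fraction are filtered out (assumed threshold rule).
    """
    occ = positional_occurrences(seqs, k, window)
    index = {}
    for w, patterns in occ.items():
        for seed in patterns:
            supporters = set()
            for other, ids in patterns.items():
                if hamming(seed, other) <= max_mismatch:
                    supporters |= ids
            fraction = len(supporters) / len(seqs)
            if fraction >= min_fraction:
                index[(w, seed)] = fraction
    return index
```

Counting supporting sequences rather than raw occurrences keeps a pattern that repeats within one promoter from inflating its frequency; this is a design choice of the sketch, not something the abstract specifies.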
In addition to the consensus information, we used TFD (Transcription Factor Database) and EPD (Eukaryotic Promoter Database) as experimental data to identify the binding sites of transcription factors and the type of RNA polymerase, respectively.
With these statistical and experimental data, we retrieved promoter sequences and represented each of them as a set of significant features. This representation is not only useful for suggesting many parameters worth considering, but should also make it possible to search for similar promoters from a functional viewpoint.
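One way to picture such a feature-set representation and a similarity search over it is sketched below. The abstract does not say how similarity is measured; the Jaccard index used here is an assumption chosen for simplicity, and the feature tuples and promoter names in the usage note are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar(query, promoters, top=3):
    """Rank the other promoters by feature-set similarity to the query.

    promoters maps a promoter name to a set of hashable features, e.g.
    ("consensus", window, pattern), ("factor", name), ("polymerase", type).
    """
    scores = [
        (name, jaccard(promoters[query], features))
        for name, features in promoters.items()
        if name != query
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top]
```

For example, with hypothetical promoters `A` and `B` that share a polymerase type and one factor binding site but differ in one consensus pattern, `most_similar("A", promoters)` would rank `B` above a promoter that shares no features with `A`.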