Sequence Analysis of Human Genome: Prediction of Protein Coding Region in DNA Sequences

Kotoko Nakata[1]
Toru Yoshino[2,3]
Yasushi Kubota[2]
Akinori Sarai[3]

[1]Division of Chem-Bio Informatics
National Institute of Hygienic Sciences
[2]NOVA, Inc., Biosystem Laboratory
[3]RIKEN, Tsukuba Life Science Center

Abstract

As the international Human Genome projects proceed, large numbers of nucleic acid sequences are determined daily. The problems that follow are predicting the locations of protein-coding regions and of specific functional regions. A general method based on the statistical technique of discriminant analysis, combined with neural network theory and the analysis of characteristic sequence features, was developed previously [1]. That method was applied to the prediction of splice junctions in mRNA sequences and of functional regions in DNA and amino acid sequences. We improved the prediction reliability by using an updated database, considering many more sequence features, and including a multi-layer neural network trained with the back-propagation algorithm. We also developed a support system for this program that runs under X Windows on a SUN workstation. With it, the results of an ongoing analysis are easily visualized and the discriminant variables can be selected effectively. The prediction reliability is higher than that of the previous method, and the probability that a sequence lies in a coding region is shown for each algorithm.
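
As an illustration of the general idea only, the following minimal sketch shows how a linear discriminant over simple sequence features could yield a coding-region probability. It is not the method of this work. The two features (in-frame stop codon fraction and G+C content at the third codon position), the nearest-centroid form of the discriminant (which assumes equal, spherical class covariances), the function names, the `scale` parameter, and the toy training sequences are all assumptions introduced for this example. The resulting probability is uncalibrated.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq, frame=0):
    """Split a DNA string into complete codons, starting at the given frame."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def features(seq):
    """Two illustrative features (assumed, not taken from the paper):
    fraction of in-frame stop codons, and G+C fraction at codon position 3."""
    cs = codons(seq)
    n = len(cs)
    stop_frac = sum(c in STOP_CODONS for c in cs) / n
    gc3 = sum(c[2] in "GC" for c in cs) / n
    return [stop_frac, gc3]

def mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[k] for v in vectors) / n for k in range(len(vectors[0]))]

def fit_discriminant(coding, noncoding):
    """Nearest-centroid linear discriminant: the weight vector is the
    difference of class means, and the boundary passes through their
    midpoint. This simplification assumes equal, spherical covariances."""
    mu_c = mean([features(s) for s in coding])
    mu_n = mean([features(s) for s in noncoding])
    w = [a - b for a, b in zip(mu_c, mu_n)]
    mid = [(a + b) / 2 for a, b in zip(mu_c, mu_n)]
    b = -sum(wk * mk for wk, mk in zip(w, mid))
    return w, b

def coding_probability(seq, w, b, scale=10.0):
    """Map the discriminant score to (0, 1) with a logistic function.
    `scale` is an arbitrary steepness chosen for this sketch."""
    score = sum(wk * xk for wk, xk in zip(w, features(seq))) + b
    return 1.0 / (1.0 + math.exp(-scale * score))

# Invented toy sequences, for illustration only (not real genomic data).
toy_coding = ["ATGGCCGTGCTGAAGGCC", "ATGCTGGACGCGTTCCAG"]
toy_noncoding = ["TAATTATAGATATGAAAT", "ATTTAAATATTTTGATAT"]

w, b = fit_discriminant(toy_coding, toy_noncoding)
for s in ["ATGGAGCTGACCGGC", "TAATTATAGATATGAAAT"]:
    print(s, round(coding_probability(s, w, b), 3))
```

A full discriminant analysis would also estimate the within-class covariance of the features, and a multi-layer network trained by back-propagation would replace the linear score with a learned nonlinear one. The sketch keeps only the step of turning sequence features into a per-sequence probability.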