Construction and analysis of Escherichia coli genome database

Takeshi Itoh[1] (t-ito@bs.aist-nara.ac.jp)
Minoru Yano[1] (m-yano@bs.aist-nara.ac.jp)
Keiko Takemoto[2] (ktakemot@virus.kyoto-u.ac.jp)
Yutaka Akiyama[3] (akiyama@kuicr.kyoto-u.ac.jp)
Hirotada Mori[1] (hmori@gtc.aist-nara.ac.jp)

[1] Nara Institute of Science and Technology, Ikoma 630-01, Japan
[2] Institute for Virus Research, Kyoto University, Kyoto 606-01, Japan
[3] Institute for Chemical Research, Kyoto University, Gokasho, Uji 611, Japan

Abstract

Current techniques make it possible to elucidate the structure of a whole genome. Genome projects are now under way for several species, including C. elegans, yeast, Escherichia coli, Bacillus subtilis, Arabidopsis, rice and human. For Escherichia coli, two large-scale sequencing efforts have emerged: one by the Wisconsin group in the U.S.A. and the other by a collaborative research group in Japan. A non-redundant sequence database is essential not only for running a sequencing project efficiently but also for whole-genome analysis and as a reference for biologists. As one of the research groups in Japan, we determine sequences and maintain a non-redundant DNA sequence database to support the genome project and the analysis of genome structure. At the Genome Workshop meeting in 1993, we reported the construction of an Escherichia coli genome database on the Genomatica system. We have since updated this E. coli genome database by incorporating new E. coli entries from GenBank and sequences from the genome project research groups. The contiguous sequence data were then used to predict possible open reading frames (ORFs). The amino acid sequences translated from these ORFs were subjected to homology analysis against the PIR and SWISSPROT protein databases. The whole set of plausible ORFs was further classified by similarities among the ORFs and among their gene organizations. Such analyses may make it possible to detect chromosomal rearrangements that occurred during the evolution of the genome.
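
The ORF prediction and translation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline, which the abstract does not specify. It assumes a naive six-frame scan that opens an ORF at the first ATG and closes it at the next in-frame stop codon, and it uses the standard genetic code. The names find_orfs, translate and min_codons are hypothetical, and the 100-codon default is an arbitrary illustrative threshold.

```python
from itertools import product

# Assumption: standard genetic code, codons enumerated in TCAG order.
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product("TCAG", repeat=3), _AMINO_ACIDS)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODON = "ATG"  # assumption: alternative starts such as GTG are ignored

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=100):
    """Scan all six reading frames for ATG...stop stretches.

    Returns (strand, frame, start, end, orf) tuples. start and end are
    0-based, end-exclusive offsets on the scanned strand, and the ORF
    string includes its stop codon. min_codons counts codons before
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == START_CODON:
                        start = i  # open an ORF at the first ATG
                elif codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        orfs.append((strand, frame, start, i + 3, s[start:i + 3]))
                    start = None  # close the ORF and look for the next ATG
    return orfs


def translate(orf):
    """Translate an ORF up to, and excluding, the first stop codon."""
    protein = []
    for i in range(0, len(orf) - 2, 3):
        aa = CODON_TABLE.get(orf[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    for strand, frame, start, end, orf in find_orfs("CCATGAAATTTTAGCC", min_codons=3):
        print(strand, frame, start, end, translate(orf))  # prints "+ 2 2 14 MKF"
```

The translated sequences produced this way would be the input to a homology search against a protein database. A real gene finder for E. coli would also need to handle alternative start codons and overlapping candidate ORFs, which this sketch leaves out.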