Systematization of Species-Specific Diversity of Genes in Codon Usage: Comparison of the Diversity Among Bacteria and Prediction of the Protein Production Levels in Cells

Shigehiko Kanaya [1] (kanaya@eie.yz.yamagata-u.ac.jp)
Yoshihiro Kudo [1] (ykudo@eie.yz.yamagata-u.ac.jp)
Shinya Suzuki [1] (a93619@eie.yz.yamagata-u.ac.jp)
Toshimichi Ikemura [2] (tikemura@ddbj.nig.ac.jp)

[1] Department of Electric and Information Engineering, Faculty of Engineering, Yamagata University,
Yonezawa, Yamagata-ken 992, Japan
[2] Department of Evolutionary Genetics, National Institute of Genetics,
and the Graduate University for Advanced Studies,
Mishima, Shizuoka-ken 411, Japan


Abstract

In the present study, we developed a procedure for estimating the species-specific heterogeneity of codon usage among the genes of a single species, which we call diversity in codon usage, and for systematizing species according to this species-specific diversity on the basis of principal component analysis. We quantified differences in this diversity among five species: Escherichia coli (Ec), Salmonella typhimurium (St), Haemophilus influenzae (Hi), Bacillus subtilis (Bs), and Synechocystis sp. (Ss). In all five species, many of the genes involved in the translation process and energy metabolism had positive values (Z1 > 0) on the first principal component (PC1). In Ss, many of the genes involved in the photosynthetic system also had positive Z1-values. These genes are thought to be highly expressed. According to the direction of PC1, the five species were roughly classified into three categories: [Ec, St, Hi], [Ss], and [Bs]. The dendrogram constructed was roughly consistent with the rRNA-based phylogeny, but interesting differences were also observed between the two phylogenetic trees.
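
As a rough illustration of how a Z1-value on PC1 can be obtained from codon usage, the following minimal sketch computes a 64-element codon frequency vector for each gene, centers the vectors across genes, and extracts the first principal component by power iteration. It is not the exact procedure of this study: the use of raw codon frequencies (rather than some normalized measure), the function names codon_frequencies and pc1_scores, and the reference argument used to fix the arbitrary sign of PC1 are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order; this order defines the feature vector.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]


def codon_frequencies(seq):
    """Return the 64-element codon frequency vector of one coding sequence."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts[c] for c in CODONS)  # skips codons with ambiguous bases
    return [counts[c] / total if total else 0.0 for c in CODONS]


def pc1_scores(genes, reference=0, iterations=100):
    """Return the score of each gene on the first principal component (Z1).

    `reference` is a hypothetical index of a gene whose Z1 is forced to be
    non-negative, because the sign of a principal component is arbitrary.
    """
    rows = [codon_frequencies(g) for g in genes]
    n, d = len(rows), len(CODONS)
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    x = [[r[j] - means[j] for j in range(d)] for r in rows]  # centered data

    # Start from the longest centered row: it lies in the row space, unlike a
    # uniform vector, which is orthogonal to every centered frequency vector.
    v = max(x, key=lambda r: sum(a * a for a in r))[:]
    for _ in range(iterations):
        proj = [sum(a * b for a, b in zip(r, v)) for r in x]            # X v
        w = [sum(proj[i] * x[i][j] for i in range(n)) for j in range(d)]  # X^T X v
        norm = sqrt(sum(a * a for a in w))
        if norm == 0.0:  # all genes have identical codon usage
            return [0.0] * n
        v = [a / norm for a in w]

    z = [sum(a * b for a, b in zip(r, v)) for r in x]
    if z[reference] < 0:
        z = [-s for s in z]
    return z


if __name__ == "__main__":
    # Toy sequences only; two genes prefer GCT and two prefer GCA.
    toy_genes = ["GCTGCTGCTGCT", "GCTGCTGCTGCA", "GCAGCAGCAGCA", "GCAGCAGCAGCT"]
    for gene, z1 in zip(toy_genes, pc1_scores(toy_genes)):
        print(gene, round(z1, 4))
```

In this sketch the sign is fixed by a single reference gene; in practice one would orient PC1 using a set of genes believed to be highly expressed, such as those involved in translation, so that Z1 > 0 carries the interpretation given above.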