Comparative Analysis of Amino Acid Sequences based on Rough Sets and Domain Knowledge Hierarchy

S. Tsumoto[1] (tsumoto@tmd.ac.jp)
H. Tanaka[1] (tanaka@tmd.ac.jp)
K. Tsumoto[2]
I. Kumagai[2]

[1] Department of Information Medicine, Medical Research Institute,
Tokyo Medical and Dental University,
1-5-45 Yushima, Bunkyo-ku, Tokyo 113 Japan
[2] Department of Industrial Chemistry, Faculty of Engineering,
The University of Tokyo,
7-3-1 Hongo, Bunkyo-ku, Tokyo 113 Japan

Abstract

Protein structure analysis from DNA sequences is an important and fast-growing area in both computer science and biochemistry. Although interesting approaches have been studied, the characteristics of a protein are very difficult to capture: even a simple protein has a complex combinatorial structure, which makes it very difficult to detect functional components through biochemical experiments. For this reason, almost all problems in this field remain unsolved, and it is very important to develop a system that helps researchers in molecular biology overcome the difficulties caused by combinatorial explosion. In this paper, we propose a system that combines a probabilistic rule induction method with domain knowledge, which we call MOLA-MOLA (Molecular biological data-analyzer and Molecular biological knowledge acquisition tool), in order to remove such hassles from the experimental environments of molecular biologists. We apply this method to a comparative analysis of lysozyme and alpha-lactalbumin, and the results show that it yields, from amino acid sequences, some interesting findings that have not been reported before.