Knowledge Acquisition System BONSAI Garden

Satoru Miyano

Research Institute of Fundamental Information Science
Kyushu University 33, Fukuoka 812, Japan
1-5-1, Chofugaoka, Chofu, Tokyo 182, Japan
e-mail: miyano@rifis.kyushu-u.ac.jp

For knowledge discovery from amino acid sequences of proteins, we have studied a learning model and related algorithmic techniques [3] and have designed a machine discovery system BONSAI [1,2,4] (Fig. 1). When positive and negative examples of sequences are given as input data, BONSAI produces, as a hypothesis, a pair consisting of a decision tree over regular patterns and an alphabet indexing; this hypothesis represents knowledge about the data. The name "BONSAI" comes from the fact that the knowledge (the nature) is expressed as a small tree (a decision tree over regular patterns) in harmony with an alphabet indexing (a pot). This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences. From these experimental results, together with the theoretical foundations, we have recognized that BONSAI has high potential.
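As an illustration only, the following Python sketch shows how a hypothesis of this form could classify a sequence: the alphabet indexing first rewrites the amino acid sequence over a smaller alphabet, and the decision tree then tests regular patterns against the rewritten sequence. The particular indexing (a rough hydrophobic/other split into "1" and "0"), the pattern "x1111y", the convention that lowercase letters are variables matching nonempty strings, and all function and class names are assumptions made for this sketch; they are not hypotheses or code produced by BONSAI.

```python
import re
from dataclasses import dataclass
from typing import Dict, Union

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Assumed grouping for illustration; BONSAI searches for an indexing itself.
HYDROPHOBIC = set("AVLIMFWC")


def make_indexing() -> Dict[str, str]:
    """Alphabet indexing: map each amino acid to a symbol of a small alphabet."""
    return {a: ("1" if a in HYDROPHOBIC else "0") for a in AMINO_ACIDS}


def apply_indexing(seq: str, indexing: Dict[str, str]) -> str:
    """Rewrite a sequence symbol by symbol using the indexing."""
    return "".join(indexing[c] for c in seq)


def matches(pattern: str, text: str) -> bool:
    """Test whether text belongs to the language of a regular pattern.

    Lowercase letters are variables (each may occur at most once) and match
    any nonempty string; all other symbols are constants.
    """
    variables = [ch for ch in pattern if ch.islower()]
    if len(variables) != len(set(variables)):
        raise ValueError("a regular pattern uses each variable at most once")
    regex = "".join(".+" if ch.islower() else re.escape(ch) for ch in pattern)
    return re.fullmatch(regex, text) is not None


@dataclass
class Leaf:
    label: bool  # True = positive class, False = negative class


@dataclass
class Node:
    pattern: str                # regular pattern tested at this node
    yes: Union["Node", Leaf]    # subtree taken when the pattern matches
    no: Union["Node", Leaf]     # subtree taken otherwise


def classify(tree: Union[Node, Leaf], seq: str, indexing: Dict[str, str]) -> bool:
    """Classify a raw amino acid sequence with the pair (tree, indexing)."""
    indexed = apply_indexing(seq, indexing)
    while isinstance(tree, Node):
        tree = tree.yes if matches(tree.pattern, indexed) else tree.no
    return tree.label


# Hypothetical one-node tree: positive if four consecutive "1" symbols occur
# strictly inside the indexed sequence.
example_tree = Node("x1111y", Leaf(True), Leaf(False))
```

With these assumed definitions, classify(example_tree, "GSAVLIMKD", make_indexing()) indexes the sequence to "001111100" and follows the "yes" branch, while a sequence without such a run follows the "no" branch.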

On the other hand, when several kinds of sequences are mixed in the data, i.e., when the data are a hodgepodge of sequences, it is not reasonable to explain the data by a single hypothesis produced by BONSAI. To cope with such a situation, we have designed a system BONSAI Garden that runs several BONSAI's in parallel. The target of BONSAI Garden is the following:

BONSAI Garden consists of a coordinator called a Gardener and a number of BONSAI's. The Gardener and the BONSAI's run in parallel over a network for classification of data and knowledge discovery. By specifying the task of the Gardener, we can design various BONSAI Gardens. BONSAI Garden is realized on a network of workstations. This talk presents the design concept developed for BONSAI Garden together with the background theory and ideas in BONSAI.

Although no mathematical theorems are provided for BONSAI Garden, experimental results show interesting behavior of BONSAI Garden, and we believe that it could serve as one of the prototypes of knowledge acquisition systems for molecular biology.

References

[1] Arikawa, S., Kuhara, S., Miyano, S., Mukouchi, Y., Shinohara, A., and Shinohara, T., A machine discovery of a negative motif from amino acid sequences by decision trees over regular patterns, New Generation Computing 11 (1993) 361--375.

[2] Arikawa, S., Kuhara, S., Miyano, S., Shinohara, A., and Shinohara, T., A learning algorithm for elementary formal systems and its experiments on identification of transmembrane domains, in Proc. 25th Hawaii International Conference on System Sciences (1992) 675--684.

[3] Miyano, S., Learning theory toward Genome Informatics, in Proc. 4th Workshop on Algorithmic Learning Theory (Lecture Notes in Artificial Intelligence 744) (1993) 19--36.

[4] Shimozono, S., Shinohara, A., Shinohara, T., Miyano, S., Kuhara, S., and Arikawa, S., Knowledge acquisition from amino acid sequences by machine learning system BONSAI, Trans. Information Processing Society of Japan 35 (1994) 2009--2018.