Knowledge Acquisition System BONSAI Garden
Satoru Miyano
Research Institute of Fundamental Information Science
Kyushu University 33, Fukuoka 812, Japan
1-5-1, Chofugaoka, Chofu, Tokyo 182, Japan
e-mail: miyano@rifis.kyushu-u.ac.jp
For knowledge discovery from amino acid sequences of proteins,
we have studied a learning model and related algorithmic
techniques [3]
and have designed a machine discovery system, BONSAI [1,2,4] (Fig. 1).
When positive and negative examples of sequences are given as input data,
BONSAI [HICSS26] produces, as a hypothesis representing knowledge about
the data, a pair consisting of a decision tree over regular patterns
and an alphabet indexing.
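To make the form of such a hypothesis concrete, the following Python sketch classifies a sequence by first applying an alphabet indexing and then walking a decision tree whose internal nodes test patterns. It is a minimal illustration under stated assumptions, not BONSAI's implementation: the three-symbol grouping of amino acids, the example tree, and the names `make_indexing`, `apply_indexing`, `classify`, `Node`, and `Leaf` are all hypothetical. Each node also tests only whether a fixed string occurs as a substring of the indexed sequence, a simplified special case assumed here, whereas regular patterns in general are richer.

```python
from dataclasses import dataclass
from typing import Dict, Union

# Hypothetical alphabet indexing: each amino-acid letter is mapped to one
# symbol of a smaller alphabet. This grouping is illustrative only and is
# NOT an indexing reported for BONSAI.
GROUPS: Dict[str, str] = {
    "0": "AVLIMFWC",  # roughly hydrophobic residues
    "1": "GSTYPH",
    "2": "DEKRNQ",    # roughly charged/polar residues
}

def make_indexing(groups: Dict[str, str]) -> Dict[str, str]:
    """Invert {symbol: letters} into {letter: symbol}."""
    indexing: Dict[str, str] = {}
    for symbol, letters in groups.items():
        for letter in letters:
            indexing[letter] = symbol
    return indexing

def apply_indexing(seq: str, indexing: Dict[str, str]) -> str:
    """Rewrite an amino-acid sequence over the smaller alphabet."""
    return "".join(indexing[c] for c in seq)

@dataclass
class Leaf:
    label: str  # class of the sequence, e.g. "positive" or "negative"

@dataclass
class Node:
    # Simplification (assumption): the node "matches" when this fixed
    # string occurs anywhere in the indexed sequence.
    pattern: str
    yes: "Tree"  # subtree taken when the pattern matches
    no: "Tree"   # subtree taken otherwise

Tree = Union[Leaf, Node]

def classify(tree: Tree, indexed: str) -> str:
    """Walk from the root to a leaf and return the leaf's label."""
    while isinstance(tree, Node):
        tree = tree.yes if tree.pattern in indexed else tree.no
    return tree.label

# Hypothetical hypothesis: a (decision tree, alphabet indexing) pair.
INDEXING = make_indexing(GROUPS)
TREE: Tree = Node("0000",
                  yes=Node("222", yes=Leaf("negative"), no=Leaf("positive")),
                  no=Leaf("negative"))

if __name__ == "__main__":
    for seq in ["LLVIAG", "DEKAG", "LLVIDEK"]:
        print(seq, classify(TREE, apply_indexing(seq, INDEXING)))
```

For example, "LLVIAG" is indexed to "000001", which contains "0000" but not "222", so it is labeled positive. Keeping the indexing separate from the tree mirrors the pairing described above: the tree is read over the small indexed alphabet rather than over the 20 amino acids.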
The name "BONSAI" comes from the fact that the knowledge (the nature)
is expressed as a small tree (a decision tree over regular patterns)
in harmony with an alphabet indexing (a pot).
This system has succeeded in discovering
reasonable knowledge about transmembrane domain sequences and
signal peptide sequences.
These experimental results, together with the theoretical foundations,
have convinced us that BONSAI has high potential.
On the other hand, when several kinds of sequences are mixed in the data,
i.e., when the data are a hodgepodge of sequences,
it is not reasonable to explain the data by a single hypothesis produced
by BONSAI. To cope with such situations,
we have designed a system, BONSAI Garden, that runs several
BONSAIs in parallel. The target of BONSAI Garden is the
following:
- The system should be able to handle a hodgepodge of data and/or noisy
  data, so that it classifies the data into some number of classes
  of sequences and simultaneously finds, for each of these classes, a
  hypothesis that explains the sequences in the class.
BONSAI Garden consists of a coordinator called the Gardener and
some number of BONSAIs. The Gardener and the BONSAIs
run in parallel over a network to classify the data and discover
knowledge.
By specifying the task of the Gardener, we can design various BONSAI
Gardens. BONSAI Garden is implemented on a network of workstations.
This talk presents the design concept developed
for BONSAI Garden together with the background theory and ideas in BONSAI.
Although no mathematical theorems are provided for BONSAI Garden,
experimental results show interesting behavior of BONSAI Garden,
and we believe that it could serve as one of the prototypes of
knowledge acquisition systems for molecular biology.
References
[1] Arikawa, S., Kuhara, S., Miyano, S., Mukouchi, Y., Shinohara, A., and
Shinohara, T., A machine discovery of a negative motif from amino acid
sequences by decision trees over regular patterns,
New Generation Computing 11 (1993) 361-375.
[2] Arikawa, S., Kuhara, S., Miyano, S., Shinohara, A., and Shinohara, T.,
A learning algorithm for elementary formal systems and its experiments on
identification of transmembrane domains,
in Proc. 25th Hawaii International Conference on System Sciences
(1992) 675-684.
[3] Miyano, S., Learning theory toward Genome Informatics,
in Proc. 4th Workshop on Algorithmic Learning Theory (Lecture Notes in
Artificial Intelligence 744) (1993) 19-36.
[4] Shimozono, S., Shinohara, A., Shinohara, T., Miyano, S., Kuhara, S., and
Arikawa, S., Knowledge acquisition from amino acid sequences by machine
learning system BONSAI,
Trans. Information Processing Society of Japan 35 (1994) 2009-2018.