Finding Minimal Multiple Generalization over
Regular Patterns with Alphabet Indexing

Michiyo Yamaguchi [1] (michiyo@donald.ai.kyutech.ac.jp)
Shinichi Shimozono [2] (sin@ces.kyutech.ac.jp)
Takeshi Shinohara [1] (shino@donald.ai.kyutech.ac.jp)

[1] Department of Artificial Intelligence
[2] Department of Control Engineering and Science
Kyushu Institute of Technology
Kawazu 680-4, Iizuka 820, Japan


Abstract

We propose a learning algorithm that discovers, from biosequences, a motif represented by patterns together with an alphabet indexing. Using only positive examples and with the help of an alphabet indexing, the algorithm finds k regular patterns that form a k-minimal multiple generalization (k-mmg for short). Computational results for transmembrane domains indicate that the combination of k-mmg and alphabet indexing works quite successfully. We also introduce a partial alphabet indexing, which transforms symbols depending on their position in the sequence.
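To make the two ingredients concrete, the following is a minimal sketch, not the authors' implementation. It assumes a hypothetical binary alphabet indexing that maps the 20 amino acids onto {0, 1} by a rough hydrophobic/other split chosen only for illustration, and it represents a regular pattern as a list of constant strings and variables, where each variable occurs at most once and is replaced by a non-empty string. The function `covers` (a hypothetical helper) checks only the covering requirement that every indexed example belongs to the union of the pattern languages; it neither searches for the patterns nor tests minimality.

```python
import re
from typing import Dict, List, Optional, Sequence

# Hypothetical alphabet indexing (an illustrative assumption, not one found by
# the algorithm): hydrophobic residues -> "1", all other residues -> "0".
HYDROPHOBIC = set("AILMFVWC")
INDEXING: Dict[str, str] = {
    aa: ("1" if aa in HYDROPHOBIC else "0") for aa in "ACDEFGHIKLMNPQRSTVWY"
}

# A variable token. In a regular pattern each variable occurs at most once,
# so variables need no names.
VAR = None


def apply_indexing(seq: str, indexing: Dict[str, str]) -> str:
    """Transform a sequence symbol by symbol through the alphabet indexing."""
    return "".join(indexing[c] for c in seq)


def pattern_to_regex(pattern: Sequence[Optional[str]]) -> "re.Pattern[str]":
    """Compile a regular pattern (constants and VAR tokens) into a regex.

    Variables are substituted by non-empty strings, hence ".+";
    constant strings are matched literally.
    """
    parts = [".+" if tok is VAR else re.escape(tok) for tok in pattern]
    return re.compile("".join(parts))


def covers(patterns: List[Sequence[Optional[str]]], examples: List[str]) -> bool:
    """True if every example matches (in full) at least one pattern."""
    compiled = [pattern_to_regex(p) for p in patterns]
    return all(any(r.fullmatch(e) for r in compiled) for e in examples)


if __name__ == "__main__":
    # Two toy protein fragments, indexed into binary strings.
    raw = ["LLIVAG", "DKLLLE"]
    indexed = [apply_indexing(s, INDEXING) for s in raw]
    print(indexed)
    # One pattern  x 111 y  covers both indexed examples.
    print(covers([[VAR, "111", VAR]], indexed))
```

With the indexing above, `"LLIVAG"` becomes `"111110"` and `"DKLLLE"` becomes `"001110"`, and the single pattern with the constant `111` between two variables covers both. Compiling to a regular expression is just a convenient way to test membership here, since a regular pattern with non-empty substitutions denotes a regular language.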