Hidden Markov Model to Extract Leucine Zipper Motif

Yukiko Fujiwara[1] (yukiko@csl.cl.nec.co.jp)
Minoru Asogawa[2] (asogawa@csl.cl.nec.co.jp)
Akihiko Konagaya[2] (konagaya@csl.cl.nec.co.jp)

[1] C&C Research Laboratories, NEC Corporation
4-1-1, Miyazaki, Miyamaeku, Kawasaki, Kanagawa 216, Japan
[2] Massively Parallel Systems NEC Laboratory, RWCP (Real World Computing Partnership)
c/o C&C Research Laboratories, NEC Corporation
4-1-1, Miyazaki, Miyamaeku, Kawasaki, Kanagawa 216, Japan


Abstract

To represent motifs of amino acid sequences with high accuracy using Hidden Markov Models (HMMs), the HMM topology must be specified according to the characteristics of the motif. For this purpose, the "iterative duplication method", which learns an optimal HMM topology, was developed. In this method, a small fully-connected HMM is gradually expanded by splitting states and deleting transitions. However, the method did not uniquely determine which state to split, because it randomly selected one of the most highly connected states. To determine the splitting state, we improve the iterative duplication method: the improved method selects the most ambiguous state for splitting. Because this ambiguity is computed from the transition probabilities and observation distributions, the splitting state is uniquely determined. The improved method additionally considers the deletion of negligible states. In an experiment, an HMM for the leucine zipper motif is obtained using the improved method, and its prediction accuracy is 96.48 percent. This is compared with the HMM obtained by the previous method and with the HMM obtained by the fully-connected HMM estimation method, whose accuracies are 95.85 percent and 95.22 percent, respectively.
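The abstract does not define how a state's ambiguity is measured. The following minimal sketch assumes, purely for illustration, that ambiguity is the sum of the Shannon entropies of a state's outgoing transition distribution and its observation distribution; the functions `entropy`, `ambiguity`, and `select_splitting_state` and the matrices `A` and `B` are hypothetical names, not the authors' implementation. It shows only how a deterministic choice of splitting state could replace the random choice among highly connected states.

```python
import math
from typing import List, Sequence


def entropy(dist: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a discrete distribution; zero entries contribute nothing."""
    return -sum(p * math.log(p) for p in dist if p > 0.0)


def ambiguity(trans_row: Sequence[float], emit_row: Sequence[float]) -> float:
    # ASSUMED measure (not from the paper): a state is more ambiguous when its
    # outgoing transitions and its observation distribution are both spread out.
    return entropy(trans_row) + entropy(emit_row)


def select_splitting_state(A: List[List[float]], B: List[List[float]]) -> int:
    """Return the index of the state with the highest assumed ambiguity.

    A[i][j] is the probability of a transition from state i to state j;
    B[i][k] is the probability that state i emits symbol k (e.g. an amino acid).
    Ties are broken by the lowest state index, so the choice is deterministic.
    """
    scores = [ambiguity(A[i], B[i]) for i in range(len(A))]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy two-state model over a four-symbol alphabet.
    A = [[0.5, 0.5], [0.9, 0.1]]
    B = [[0.25, 0.25, 0.25, 0.25], [1.0, 0.0, 0.0, 0.0]]
    print(select_splitting_state(A, B))
```

In this toy model, state 0 has uniform transitions and uniform emissions, so it scores higher than the nearly deterministic state 1 and would be chosen for splitting.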