Are the Hidden Markov Models Promising in Protein Research?

Kiyoshi Asai[1] (asai@etl.go.jp)
Hidetoshi Tanaka[2] (htanaka@icot.or.jp)
Katunobu Itou[1] (kito@etl.go.jp)
Kentaro Onizuka[2] (onizuka@icot.or.jp)

[1]Electrotechnical Laboratory (ETL)
Umezono, Tsukuba, Japan 305
[2]Institute for New Generation Computer Technology (ICOT)
1-4-28 Mita, Minato-ku, Tokyo, Japan 108

Abstract

The Hidden Markov Model (HMM), a type of stochastic model (signal source), is becoming popular in molecular biology. An HMM consists of 'hidden' states, state-transition probabilities and output distributions. Because there are known algorithms for training HMMs as stochastic representations of the training data, HMMs are widely used in pattern recognition, especially in speech recognition.
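To make the three ingredients (hidden states, state-transition probabilities, output distributions) concrete, here is a minimal sketch, assuming a discrete-output HMM, of the forward algorithm, which computes the probability that the model generates a given symbol sequence. The two states `H` and `C`, the reduced hydrophobic/polar alphabet `h`/`p`, and every probability below are hypothetical illustration values, not parameters from any protein study.

```python
from typing import Dict, List, Sequence

# Hypothetical toy model: two hidden states and a two-symbol alphabet
# ('h' = hydrophobic, 'p' = polar). Values are for illustration only.
STATES: List[str] = ["H", "C"]
START: Dict[str, float] = {"H": 0.5, "C": 0.5}
TRANS: Dict[str, Dict[str, float]] = {
    "H": {"H": 0.8, "C": 0.2},
    "C": {"H": 0.3, "C": 0.7},
}
EMIT: Dict[str, Dict[str, float]] = {
    "H": {"h": 0.7, "p": 0.3},
    "C": {"h": 0.2, "p": 0.8},
}


def forward_likelihood(
    obs: Sequence[str],
    states: List[str],
    start: Dict[str, float],
    trans: Dict[str, Dict[str, float]],
    emit: Dict[str, Dict[str, float]],
) -> float:
    """Return P(obs | model) for a discrete HMM using the forward algorithm."""
    if not obs:
        return 1.0  # the empty sequence is generated with probability 1
    # alpha[s] = P(o_1 .. o_t, hidden state at time t = s)
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Sum over all predecessor states, then emit the next symbol.
        alpha = {
            s: sum(alpha[r] * trans[r][s] for r in states) * emit[s][symbol]
            for s in states
        }
    # Marginalize over the final hidden state.
    return sum(alpha.values())


if __name__ == "__main__":
    print(forward_likelihood("hphh", STATES, START, TRANS, EMIT))
```

The hidden state path is never observed; the forward recursion sums over all paths in time proportional to the sequence length times the square of the number of states. This unscaled version underflows on long sequences, so practical implementations rescale `alpha` at each step or work in log space; training procedures such as Baum-Welch reuse these forward quantities together with a matching backward pass.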

In the field of protein research, HMMs have been used to represent stochastic motifs of protein sequences, to model the structural patterns of proteins, to predict secondary and higher-level structures, to make multiple sequence alignments, and to classify protein sequences.

In each case, HMM techniques are closely related to conventional methods. An important merit of HMMs is their flexibility as models of protein sequences. Their serious problem is that they need a large amount of training data. In this paper, we give a brief introduction to HMMs, review HMM-related protein research, compare this research with other methods, and discuss the usefulness and further possibilities of HMMs.