Automatic Procedure to Extract Signature Pentapeptides from the Protein Sequence Database

Ikuo Uchiyama (uchiyama@kuicr.kyoto-u.ac.jp)
Atsushi Ogiwara
Zenmei Ohkubo
Minoru Kanehisa

Institute for Chemical Research, Kyoto University,
Uji, Kyoto 611, Japan


Abstract

We describe a method for extracting signature pentapeptides, that is, pentapeptides that are conserved in, and found exclusively in, a group of homologous proteins. The BLAST algorithm is used both to perform the homology search and to count the occurrences of pentapeptide patterns while allowing limited substitutions. For each pentapeptide that appears in a given query sequence, we examine how often it and related pentapeptides occur in the homologous sequences, which are ordered by homology score. By comparing these counts with the frequency in the entire database, we can extract uniquely conserved pentapeptides and, at the same time, group the homologous sequences. The procedure thus automatically identifies the pentapeptides, if any, that are strongly tied to the group. We also discuss the possibility of using our pentapeptide word dictionary to infer protein function.
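
The core comparison, frequency within a homologous group versus frequency in the rest of the database, can be illustrated with a minimal Python sketch. This is not the procedure described above: it uses exact string matching in place of BLAST, so it ignores substitutions and does not order sequences by homology score. The function names, the thresholds min_group_frac and max_outside, and the toy sequences in the usage note are all hypothetical and chosen only for illustration.

```python
from collections import Counter

K = 5  # word length: pentapeptides


def pentapeptides(seq):
    """Return the set of distinct 5-residue words in one sequence."""
    return {seq[i:i + K] for i in range(len(seq) - K + 1)}


def count_sequences_with_word(seqs):
    """For each word, count how many sequences contain it at least once."""
    counts = Counter()
    for s in seqs:
        counts.update(pentapeptides(s))
    return counts


def signature_candidates(group, database, min_group_frac=1.0, max_outside=0):
    """Words present in at least min_group_frac of the group and in at
    most max_outside sequences elsewhere in the database.

    Assumption: `database` contains every sequence of `group`.
    Exact matches only; no substitutions are allowed in this sketch.
    """
    in_group = count_sequences_with_word(group)
    in_db = count_sequences_with_word(database)
    result = []
    for word, g in in_group.items():
        outside = in_db[word] - g  # occurrences outside the group
        if g / len(group) >= min_group_frac and outside <= max_outside:
            result.append(word)
    return sorted(result)
```

As a usage example, take the made-up group ["MKTAYIAKQR", "GGKTAYIAKL", "PKTAYIAWW"] and a database consisting of that group plus ["MSSSSSSSSS", "AYIAKAAAA"]. Calling signature_candidates(group, database) then keeps only the words shared by all three group members and absent elsewhere. A word such as AYIAK, which occurs in two group members but also in an unrelated sequence, is rejected even when min_group_frac is lowered.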