Toshio Shimizu (firstname.lastname@example.org)
Kenta Nakai (email@example.com)
 Faculty of Science, Hirosaki University
3 Bunkyo-cho, Hirosaki 036 Japan
 National Institute for Basic Biology
Myodaiji, Okazaki 444 Japan
How reliable and useful are predictions of transmembrane segments (TMSs)
of membrane proteins from their amino acid sequences?
This question is still under debate.
Kyte and Doolittle proposed a simple scheme for the prediction of TMSs.
It is based on the hydropathy plot and is widely accepted as a basic and
Since then, a large number of more sophisticated
prediction algorithms have been proposed, which are refined variants
of the Kyte-Doolittle approach.
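As an illustration of the hydropathy-plot idea, the following is a minimal sketch, not the implementation evaluated in this work. It averages the Kyte-Doolittle hydropathy values over a sliding window and reports runs of window centres whose average exceeds a cut-off. The window length of 19 residues and the cut-off of 1.6 are commonly quoted settings and are assumptions here, as are the function names `hydropathy_profile` and `candidate_segments`.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Return the mean hydropathy of every full window along the sequence."""
    values = [KD_SCALE[residue] for residue in sequence.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def candidate_segments(sequence, window=19, threshold=1.6):
    """Return (first, last) 0-based window-centre positions of each run of
    consecutive windows whose mean hydropathy is at least `threshold`.

    The window length and threshold are assumed defaults, not values
    taken from the text.
    """
    profile = hydropathy_profile(sequence, window)
    half = window // 2
    segments = []
    run_start = None
    for i, mean_value in enumerate(profile):
        if mean_value >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            segments.append((run_start + half, i - 1 + half))
            run_start = None
    if run_start is not None:
        # The run extends to the last window of the profile.
        segments.append((run_start + half, len(profile) - 1 + half))
    return segments


if __name__ == "__main__":
    # Artificial sequence: a 21-residue leucine stretch flanked by aspartate.
    toy_sequence = "D" * 25 + "L" * 21 + "D" * 25
    print(candidate_segments(toy_sequence))
```

The toy sequence is artificial and serves only to exercise the code. On real proteins, the number of segments reported depends strongly on the chosen window length and cut-off.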
Although these methods have been considered to give rather good results,
they are still unable to predict the number and positions of
TMSs precisely; they often give totally different predictions,
in particular for proteins having many TMSs.
One reason for this situation is the low quality of
the information on TMSs in general amino acid sequence databases.
The information in the SWISS-PROT database, for example, is
mostly based not on experimental evidence but on predicted models, and
databases often do not state explicitly whether the data come from
experiments or calculations. Higher-quality information on TMSs,
drawn only from experimental evidence, is essential to evaluate existing
prediction methods more precisely and to develop an algorithm overcoming
We have collected 128 references reporting the membrane topology of
proteins, and are continuing our efforts to triple this number. From
these, we selected 54 topology models that are based, at least partially,
on experimental evidence. Combining these data with the sequence
information from the SWISS-PROT database, we have constructed a membrane
protein database in the form of a relational database. The current version
includes 54 proteins, which are classified into 3 groups (eukaryotic
proteins, prokaryotic proteins, and proteins with non-helical segments),
as shown in Figure 1. Using this database, we evaluated the predictive
performance of the algorithms of the following authors: Eisenberg; Klein,
Kanehisa and DeLisi (KKD method); von Heijne (TopPred method); and Persson
and Argos. The KKD method and the TopPred method predicted the exact
number of TMSs for 59% and 67% of the proteins in our database,
respectively. These values could be increased to 63% and 74% by optimizing
the respective parameter values. The KKD method tends to predict fewer
TMSs than the correct number, while the TopPred method shows the opposite
tendency. We are now testing our previous idea of using different cut-off
parameters for single-TMS proteins and multiple-TMS proteins in the KKD
method, and are also trying to develop a new predictive algorithm by
taking more precise position-dependent information on TMSs into account.
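The kind of comparison reported above, the fraction of proteins for which a method predicts the exact number of TMSs and whether it tends to under- or over-predict, can be sketched as follows. This is a hypothetical illustration: the function name `count_agreement` and the example numbers are invented for the sketch and are not data from our database.

```python
def count_agreement(observed_counts, predicted_counts):
    """Compare predicted and experimentally supported TMS counts per protein.

    Returns the fractions of proteins for which the prediction is exact,
    too low ("under") or too high ("over").
    """
    if len(observed_counts) != len(predicted_counts):
        raise ValueError("both lists must describe the same proteins")
    if not observed_counts:
        raise ValueError("at least one protein is required")
    total = len(observed_counts)
    pairs = list(zip(observed_counts, predicted_counts))
    exact = sum(1 for observed, predicted in pairs if predicted == observed)
    under = sum(1 for observed, predicted in pairs if predicted < observed)
    over = sum(1 for observed, predicted in pairs if predicted > observed)
    return {
        "exact": exact / total,
        "under": under / total,
        "over": over / total,
    }


if __name__ == "__main__":
    # Made-up TMS counts for four imaginary proteins (not real data).
    observed = [1, 7, 12, 4]
    predicted = [1, 6, 12, 5]
    print(count_agreement(observed, predicted))
```

Comparing only the number of segments is the coarsest possible check. A stricter evaluation would also require each predicted segment to overlap the experimentally supported position.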