Date: July 23, 2010
Speaker: Professor Ernst-Walter Knapp, Institute of Chemistry & Biochemistry, Free University of Berlin, Germany
Title: SPARROW, a protein secondary structure predictor: variations on an old theme

Abstract:
Relatively little new work is appearing on this problem today. This is
certainly due in part to the tremendous success of the neural network
‘PSIPRED’ from David Jones. The main innovation of PSIPRED was to use
sequence profiles of a polypeptide sequence as input rather than the
sequence itself. PSIPRED was introduced in 1999 [Jones DT, ‘Protein
secondary structure prediction based on position-specific scoring
matrices’, J. Mol. Biol. 292 (1999) 195-202]. It now achieves an
average accuracy of 82% on the problem with three secondary structure
classes. This performance is difficult to match, let alone surpass.
Hence, many attempts by other researchers may have remained unknown
because they were not successful enough to be published. Another
reason that may have kept researchers from revisiting the problem
could be the belief that prediction is already close to the limit of
what can be achieved from protein sequence alone. Although very little
theoretical progress is visible, the demand for reliable protein
secondary structure predictors among researchers working in structural
biology (crystallographers and NMR spectroscopists) has increased
enormously. We therefore considered it worthwhile to tackle the
problem one more time.
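The 82% figure is a per-residue accuracy over three classes, the measure usually called Q3: the percentage of residues whose predicted class matches the observed one. The short sketch below shows that computation; the one-letter codes ('H' helix, 'E' strand, 'C' coil/other), the function name `q3`, and the example strings are illustrative assumptions, not part of PSIPRED or SPARROW.

```python
def q3(predicted: str, observed: str) -> float:
    """Percentage of residues whose predicted three-state class matches the observed one.

    Assumes one letter per residue: 'H' (helix), 'E' (strand), 'C' (coil/other).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where prediction and observation agree.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


# Hypothetical chain of 10 residues, 8 of them predicted correctly.
print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```

Over a test set, the count can be pooled over all residues or Q3 can be computed per chain and then averaged; which convention lies behind a quoted figure matters when comparing methods.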
I will start with a survey of previous work on predicting protein
secondary structure. Then I will introduce our machine learning method
‘SPARROW’, which is based on scoring functions that are linear in
parameter space and quadratic in sequence or structure space,
respectively. We used two different approaches.

The first approach considers the three-class problem (α-helix,
β-strand, and coil or other). It is based on a three-level hierarchy
with three scalar scoring functions at each level. Each scoring
function discriminates one class from the other two. At the first
level, the scoring functions take sequence profiles as input, thus
establishing sequence-structure correlations. At the second level, the
scoring functions take as input the first-level results within a
sequence window, thus describing structure-structure correlations. As
a final step, we used a neural network similar to the one in the
PSIPRED approach.

The second approach uses a vector-valued scoring function that can
tackle an arbitrary multiclass problem directly. We apply it not only
to the three-class problem but also to the eight secondary structure
classes of the DSSP program by Kabsch and Sander. Again, we use three
hierarchy levels, as in the first approach. With these methods we
succeeded in matching the performance of PSIPRED in predicting
secondary structure, with the addition that the more general
multiclass problem is solved as well.
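A scoring function "linear in parameter space and quadratic in sequence or structure space" can be written as s(x) = ⟨p, φ(x)⟩, where φ(x) collects a constant, every input value x_i, and every product x_i·x_j (i ≤ j). The sketch below applies three such one-vs-rest functions first to windows of per-residue input rows and then to windows of the first-level scores. It is a hypothetical illustration, not SPARROW's code: the helper names, the window half-widths, the zero padding at chain ends, the toy 3-column input (standing in for a real 20-column profile), and the hand-set weights are all assumptions, no training is shown, and the final neural-network stage is omitted.

```python
from itertools import combinations_with_replacement
from typing import Sequence

CLASSES = ("H", "E", "C")  # helix, strand, coil/other (three-class problem)


def quadratic_features(x: Sequence[float]) -> list[float]:
    """phi(x) = [1, x_i for all i, x_i * x_j for all i <= j]; quadratic in x."""
    phi = [1.0, *x]
    phi.extend(x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2))
    return phi


def n_params(n_inputs: int) -> int:
    """Length of phi(x) for an input of n_inputs values."""
    return 1 + n_inputs + n_inputs * (n_inputs + 1) // 2


def score(params: Sequence[float], x: Sequence[float]) -> float:
    """s(x) = <params, phi(x)>: linear in the parameters, quadratic in x."""
    phi = quadratic_features(x)
    if len(params) != len(phi):
        raise ValueError(f"expected {len(phi)} parameters, got {len(params)}")
    return sum(p * f for p, f in zip(params, phi))


def window(rows: Sequence[Sequence[float]], center: int, half: int) -> list[float]:
    """Concatenate rows center-half .. center+half, zero-padding beyond the chain ends."""
    width = len(rows[0])
    out: list[float] = []
    for k in range(center - half, center + half + 1):
        out.extend(rows[k] if 0 <= k < len(rows) else [0.0] * width)
    return out


def level_scores(
    rows: Sequence[Sequence[float]],
    params_per_class: Sequence[Sequence[float]],
    half: int,
) -> list[list[float]]:
    """For each residue, one scalar one-vs-rest score per class from its window."""
    return [
        [score(params, window(rows, i, half)) for params in params_per_class]
        for i in range(len(rows))
    ]


def predict(scores: Sequence[Sequence[float]]) -> str:
    """Assign each residue the class with the highest score (first one on ties)."""
    return "".join(CLASSES[max(range(len(s)), key=lambda c: s[c])] for s in scores)


def linear_only(n_inputs: int, weights: dict[int, float]) -> list[float]:
    """Hypothetical toy parameters: only the linear terms are non-zero."""
    params = [0.0] * n_params(n_inputs)
    for i, w in weights.items():
        params[1 + i] = w
    return params


# Toy input: 4 residues x 3 columns that already behave like class propensities.
profile = [
    [0.9, 0.1, 0.0],
    [0.4, 0.5, 0.1],
    [0.8, 0.1, 0.1],
    [0.0, 0.1, 0.9],
]
# Level 1 (half=0): class c simply reads its own column (toy sequence-structure map).
level1_params = [linear_only(3, {c: 1.0}) for c in range(3)]
# Level 2 (half=1): class c sums its level-1 score over a 3-residue window.
level2_params = [linear_only(9, {c: 1.0, 3 + c: 1.0, 6 + c: 1.0}) for c in range(3)]

level1 = level_scores(profile, level1_params, half=0)
level2 = level_scores(level1, level2_params, half=1)
print(predict(level1))  # HEHC
print(predict(level2))  # HHHC: the isolated E is absorbed by its helical neighbors
```

Because every score is a dot product with a fixed feature map, the parameters could be fitted with any linear method. Stacking the three parameter vectors as rows of one matrix gives a single vector-valued score W·φ(x), which is one possible reading of the second approach; how SPARROW actually parameterizes and trains that function for the eight DSSP classes is not described in this abstract.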