Aki HASEGAWA (haseg-ak@gold.cs.uec.ac.jp)
Yasuo UEMURA (uemura-y@gold.cs.uec.ac.jp)
Satoshi KOBAYASHI (satoshi@cs.uec.ac.jp)
Takashi YOKOMORI (yokomori@cs.uec.ac.jp)
Department of Computer Science and Information Mathematics,
University of Electro-Communications
1-5-1 Chofugaoka, Chofu, Tokyo 182, Japan
It is of great significance to develop an efficient software system
for higher-level structural prediction in RNA/protein sequences.
Speaking of RNA secondary structure prediction, it is inevitably
required that a prediction system must have an ability to deal with
so-called "pseudoknot" structures, one of the most typical and
important constructs found in vivo, while no effective system is yet
reported for predicting RNA secondary structures involving in pseudoknots.
We are developing prediction systems for RNA secondary structures
that can handle pseudo-knots in an elegant manner, where the developing
systems are constructed based on the following two ways.
Prediction System Using Tree Grammars :
In the previous work UHKY:95, we developed a parsing algorithm for RNA
modeling grammars, and applied it to some computational problems concerning
RNA secondary structures.
In this work, we prototype a prediction system for RNA secondary structures
which is based on minimizing the free-energy change associated with base pairings and loop structures. One of the major advantages of this system is its capability of
predicting structures which include pseudoknots.
We evaluate the system by presenting how long sequences it can process and
how long and how much memory it takes to find the solution.
We also make comparisons between biologically observed structures and
prediction results.
Learning Secondary Structures Using Control Sets :
We propose a learning system that acquires the common RNA secondary structures
of the given set of RNA sequences. In order to capture the common biological features of RNA secondary structures, it is possible to make use of derivational information obtained from parsing analysis by the prediction system described above.
More specifically, the key idea of our method is outlined as follows:
Initially we set up the most general tree grammar G0,=,(C,A,{S}),
where every nonterminal node in each tree of C or A is labeled by exactly
one symbol S, and A contains all types of uniquely labeled adjunct trees.
Thus G0 has no restriction on which adjunct trees to use at any
configuration within the derivation. Then, given RNA sequences are parsed by the prediction system (parser), to get the control words (i.e., words over the alphabet of labels of all trees). Our learning system for RNA secondary structures consists of the
following procedures:
1.Set up the most general tree grammar
2 Parse the given sequence with G0 using the parser mentioned above,
3 Construct a set of control words,
4.Learn the control set.
In this manner, given a set of RNA sequences, the learning system produces a control set with which one can construct a prediction system together
with G0. In other words, one may have another type of prediction system using control set.