Evaluation of exon prediction tools using a long DNA sequence

Katsuhiko Murakami (katsu@ims.u-tokyo.ac.jp)[1][2]
Shiho Tsukuni[1]
Toshihisa Takagi (takagi@ims.u-tokyo.ac.jp)[1]
Masahira Hattori[1]

[1] Human Genome Center,
Institute of Medical Science, University of Tokyo
[2] Central Research Laboratory, Hitachi Ltd.


Abstract

We evaluated the ability of two exon prediction programs, GRAIL and FEX, to locate coding regions in a long (about 301k bases) genomic DNA sequence. We performed an experiment to check the correctness of the exon candidates with high scores. FEX was more sensitive but less specific than GRAIL. The number of exons predicted by each tool was much smaller than our simple estimate based on the sequence length. To further reduce the number of unreliable candidates, we propose guidelines for users. With these guidelines, both tools should be more practical even for DNA sequences longer than 100,000 bases.