A large-scale GenBank search of Expressed Sequence Tags using a rapid identity-searching program for DNA sequences

T. Nishikawa (nisikawa@crl.hitachi.co.jp)
K. Nagai (k-nagai@crl.hitachi.co.jp)

Central Research Laboratory, Hitachi, Ltd
1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan


Abstract

We have developed a program for rapid identity searching of DNA sequences that tolerates sequencing error rates of several percent. The program was applied to a large-scale search of Expressed Sequence Tags (ESTs) against GenBank sequences, and error information for the ESTs was obtained from the search results. A total of 15,666 human EST sequences were searched against the primate division of GenBank release 80 in 23.3 hours, only one-thirtieth of the time needed when FASTA is used. A total error rate of 2.45 percent was obtained from the alignments between the ESTs and the primate sequences that satisfied the identity conditions.
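
To put the timing figures in perspective, the arithmetic can be sketched as follows. This is only a back-of-the-envelope calculation from the numbers quoted in the abstract. It assumes the "one-thirtieth" ratio is exactly 30, which the abstract does not state, and the function names are hypothetical, chosen for illustration.

```python
# Figures quoted in the abstract.
NUM_ESTS = 15_666        # human EST query sequences
SEARCH_HOURS = 23.3      # total search time reported for the program
FASTA_RATIO = 30         # ASSUMPTION: "one-thirtieth" treated as exactly 30x


def seconds_per_query(total_hours: float, num_queries: int) -> float:
    """Average wall-clock seconds spent on each query sequence."""
    return total_hours * 3600.0 / num_queries


def estimated_fasta_hours(total_hours: float, ratio: float) -> float:
    """Rough FASTA time implied by the quoted speed ratio (an estimate only)."""
    return total_hours * ratio


if __name__ == "__main__":
    per_est = seconds_per_query(SEARCH_HOURS, NUM_ESTS)
    fasta_hours = estimated_fasta_hours(SEARCH_HOURS, FASTA_RATIO)
    print(f"Average time per EST: {per_est:.2f} s")
    print(f"Implied FASTA time:   {fasta_hours:.0f} h ({fasta_hours / 24:.1f} days)")
```

Under these assumptions the search averages a little over 5 seconds per EST, and the same search with FASTA would take roughly 700 hours, or about 29 days.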