Rapid identity searching program for DNA sequences and its applications to cDNA grouping

T. Nishikawa (nisikawa@crl.hitachi.co.jp)
S. Hiraoka (hiraoka@crl.hitachi.co.jp)
N. Kasahara (kasahara@crl.hitachi.co.jp)
K. Nagai (k-nagai@crl.hitachi.co.jp)

Central Research Laboratory, Hitachi, Ltd
1-280 Higashi-koigakubo, Kokubunji-shi, Tokyo 185, Japan

Abstract

We developed a program that determines whether a query sequence is contained in a database within a permitted matching error rate. The program consists of two steps: bit-table filtration and dynamic programming matching. The bit-table filtration quickly excludes many sequences unrelated to the query while retaining, without omission, every sequence that matches the query within the given error rate. Applied to large-scale human cDNA grouping, the program took only one tenth of the time required by FASTA to group all human cDNA.
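
The two-step scheme can be illustrated with a minimal sketch. The abstract does not specify how the bit table is laid out or how the filter threshold is chosen, so everything below is an assumption made for illustration: one bit per possible k-mer for each database sequence, a threshold taken from the standard q-gram argument (each error can destroy at most k of the query's k-mers), and a semi-global edit-distance computation as the dynamic programming step. All function names and the parameter k are hypothetical, and sequences are assumed to contain only A, C, G, and T.

```python
# Illustrative sketch only; not the authors' implementation.
BASE = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_kmer(kmer):
    """Pack a k-mer into an integer, two bits per base."""
    code = 0
    for ch in kmer:
        code = (code << 2) | BASE[ch]
    return code


def build_bit_table(seq, k):
    """Assumed layout: one bit per possible k-mer, set if it occurs in seq."""
    table = bytearray((4 ** k + 7) // 8)
    for i in range(len(seq) - k + 1):
        c = encode_kmer(seq[i:i + k])
        table[c >> 3] |= 1 << (c & 7)
    return table


def passes_filter(query, table, k, max_errors):
    """Lossless filter: a match with at most max_errors errors keeps at
    least (m - k + 1) - k * max_errors of the query's k-mers intact."""
    threshold = (len(query) - k + 1) - k * max_errors
    if threshold <= 0:
        return True  # the filter cannot exclude anything safely
    hits = 0
    for i in range(len(query) - k + 1):
        c = encode_kmer(query[i:i + k])
        if table[c >> 3] & (1 << (c & 7)):
            hits += 1
    return hits >= threshold


def best_edit_distance(query, target):
    """Smallest edit distance between query and any substring of target."""
    prev = [0] * (len(target) + 1)  # a match may start anywhere in target
    for i in range(1, len(query) + 1):
        cur = [i] + [0] * len(target)
        for j in range(1, len(target) + 1):
            cost = 0 if query[i - 1] == target[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost, prev[j] + 1, cur[j - 1] + 1)
        prev = cur
    return min(prev)  # a match may end anywhere in target


def search(query, database, error_rate, k=4):
    """Return names of database sequences containing the query within
    the permitted error rate; database maps name -> sequence."""
    max_errors = int(error_rate * len(query))
    found = []
    for name, seq in database.items():
        table = build_bit_table(seq, k)  # would be precomputed in practice
        if not passes_filter(query, table, k, max_errors):
            continue  # excluded cheaply, no dynamic programming needed
        if best_edit_distance(query, seq) <= max_errors:
            found.append(name)
    return found


database = {
    "a": "TTTTACGTACGGATCCATTTT",
    "b": "GGGGGGGGGGGGGGGGGGGG",
}
# The query differs from a substring of "a" by one substitution.
print(search("ACGTACGGTTCCA", database, error_rate=0.1))
```

In this sketch the filter only counts k-mer presence, so it may pass sequences that the dynamic programming step later rejects, but under the stated assumptions it never discards a sequence that matches within the error bound.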