A Simple Method for Finding Local Sequence Similarities

Keiichi Nagai[1]
Tetsuo Nishikawa[1]
Hideki Kambara[1]
Toshihisa Takagi[2]

[1]Central Research Laboratory, Hitachi, Ltd.
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185
[2]Human Genome Center, Institute of Medical Science, University of Tokyo
6-4-1, Shirokanedai, Minato-ku, Tokyo 108

Abstract

Alignments produced by conventional database search programs for finding local similarities in protein and DNA sequences, such as programs based on the Smith-Waterman algorithm, FASTA, and BLAST, can contain subregions having high similarity, low similarity, and even no similarity. We propose a simple method for finding significant local sequence similarity regions, in which the alignment of two sequences is graphed as an integrated score calculated along the aligned sequences using the match, mismatch, and gap penalty scores. This method has been used to find local similarity subregions in alignment results obtained by BLAST or the Smith-Waterman algorithm. Potential applications to finding domain structures and characteristic sequence patterns are also shown.
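
The integrated score described above can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the gap character "-", and the score values (match +1, mismatch -1, gap -2) are hypothetical placeholders, and a protein comparison would more plausibly take its match and mismatch scores from a substitution matrix.

```python
from itertools import accumulate

GAP = "-"  # assumed gap character in the aligned strings


def column_scores(aligned_a, aligned_b, match=1, mismatch=-1, gap=-2):
    """Score each alignment column; the default values are illustrative only."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    scores = []
    for a, b in zip(aligned_a, aligned_b):
        if a == GAP or b == GAP:
            scores.append(gap)        # column contains a gap
        elif a == b:
            scores.append(match)      # identical residues
        else:
            scores.append(mismatch)   # differing residues
    return scores


def integrated_scores(aligned_a, aligned_b, **score_params):
    """Running sum of the column scores along the alignment."""
    return list(accumulate(column_scores(aligned_a, aligned_b, **score_params)))


if __name__ == "__main__":
    # Toy alignment: the curve rises over matches and drops at the gap.
    print(integrated_scores("ACGT-A", "ACCTTA"))
```

When the returned list is plotted against alignment position, stretches where the curve rises steadily would correspond to subregions of high similarity, while flat or falling stretches would correspond to low or no similarity.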