DNA sequence comparison based on amino acid similarity

S. Hiraoka (hiraoka@crl.hitachi.co.jp)
K. Nagai (k-nagai@crl.hitachi.co.jp)

Central Research Laboratory, Hitachi, Ltd.
1-280 Higashikoigakubo, Kokubunji-shi, Tokyo 185, Japan

Abstract

DNA databases are growing exponentially. Sequence similarities are often the most valuable information we can get from DNA databases. For protein-coding sequences in particular, comparing the translated sequences gives us clues to protein function. However, gaps in DNA sequences prevent us from translating them directly and force us to compare them at the nucleotide level. We present an algorithm for DNA sequence comparison that translates the sequences in the most reliable way and then compares the translated sequences. The method lets us detect protein sequence similarity in DNA sequences even when the protein sequences they encode are unknown.
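The idea of comparing DNA at the amino acid level can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes gap-free coding sequences, so it only tries the three forward reading frames of each sequence and keeps the frame pair whose translations align best. The gap handling that the abstract describes is exactly what this sketch omits. A gap whose length is not a multiple of three shifts the reading frame, so no single fixed frame translates both sides of it correctly. The scoring is also a stand-in. It uses a hypothetical identity score (match `2`, mismatch `-1`, gap `-2`), whereas a real amino acid similarity measure would use a substitution matrix such as PAM or BLOSUM. The names `translate`, `align_score`, and `best_frame_pair` are illustrative only.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: aa for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str, frame: int) -> str:
    """Translate `dna` from offset `frame` (0, 1 or 2); unrecognized codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(frame, len(dna) - 2, 3)
    )


def align_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score using a toy identity scoring (assumption)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def best_frame_pair(dna1: str, dna2: str) -> tuple:
    """Try all 3 x 3 forward frame pairs; return (score, frame1, frame2) of the best one."""
    return max(
        (align_score(translate(dna1, f1), translate(dna2, f2)), f1, f2)
        for f1 in range(3)
        for f2 in range(3)
    )
```

For example, `best_frame_pair("ATGGCCAAA", "CATGGCCAAA")` returns `(6, 0, 1)`. The one extra leading base in the second sequence puts its coding frame at offset 1, and both sequences then translate to `MAK`. A plain nucleotide comparison would have to absorb that offset as a gap. Here it simply selects a different reading frame.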