Metrics for Protein Sequence Comparison

Tsukasa Sakai (sakai@nibh.go.jp)

Biomolecules Department,
National Institute of Bioscience
and Human-Technology (NIBH),
Higasi 1-1, Tsukuba 305, Japan


Abstract

Substitution odds r(i,j) for amino acid residues can be transformed into similarities s(i,j) by normalizing with the geometric average of the conservative odds r(i,i) and r(j,j). The similarities thus derived for all twenty natural amino acid residues in proteins fall within the range 0 to 1, and each has a complementary dissimilarity. An empirical test has confirmed that this dissimilarity satisfies all metric requirements for a distance between residues. Relative certainty, an identity index calculated from both the similarity and the dissimilarity, can serve as a matching score in protein sequence comparison that is consistent with both measures.
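The normalization can be sketched as s(i,j) = r(i,j) / sqrt(r(i,i) r(j,j)), the geometric average of r(i,i) and r(j,j) being sqrt(r(i,i) r(j,j)). The Python sketch below computes s from an odds matrix, takes the dissimilarity as d(i,j) = 1 - s(i,j) (an assumed reading of "complementary"), and checks the metric axioms: zero self-distance, positive distance between distinct residues, symmetry and the triangle inequality. The 3 x 3 odds matrix and the function names `similarity`, `dissimilarity` and `metric_violations` are hypothetical and purely illustrative. They are not the author's data or test procedure. The relative-certainty score is not sketched because its formula is not given here.

```python
import math

# Hypothetical toy substitution odds for three residues. These values are
# illustrative only and are NOT taken from the paper or any real matrix.
RESIDUES = ["A", "S", "W"]
ODDS = [
    [4.0, 1.5, 0.3],
    [1.5, 3.5, 0.4],
    [0.3, 0.4, 9.0],
]


def similarity(r):
    """s(i,j) = r(i,j) / sqrt(r(i,i) * r(j,j)), i.e. the odds normalized by
    the geometric average of the two conservative odds."""
    n = len(r)
    return [[r[i][j] / math.sqrt(r[i][i] * r[j][j]) for j in range(n)]
            for i in range(n)]


def dissimilarity(s):
    """Assumed complement of the similarity: d(i,j) = 1 - s(i,j)."""
    return [[1.0 - x for x in row] for row in s]


def metric_violations(d, tol=1e-12):
    """Return a list describing every metric axiom that matrix d violates."""
    n = len(d)
    problems = []
    for i in range(n):
        if abs(d[i][i]) > tol:
            problems.append(f"d({i},{i}) != 0")
        for j in range(n):
            if i != j and d[i][j] <= tol:
                problems.append(f"d({i},{j}) not positive")
            if abs(d[i][j] - d[j][i]) > tol:
                problems.append(f"d({i},{j}) != d({j},{i})")
            for k in range(n):
                # Triangle inequality: d(i,k) <= d(i,j) + d(j,k).
                if d[i][k] > d[i][j] + d[j][k] + tol:
                    problems.append(f"triangle fails at ({i},{j},{k})")
    return problems


if __name__ == "__main__":
    s = similarity(ODDS)
    d = dissimilarity(s)
    for name, row in zip(RESIDUES, d):
        print(name, " ".join(f"{x:.3f}" for x in row))
    print("violations:", metric_violations(d) or "none")
```

Checking every triple (i, j, k) costs O(n^3), which is trivial for a 20 x 20 residue matrix, so an exhaustive test like this is practical for the full amino acid alphabet.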