Pattern Matching Algorithms for
Three-Dimensional Protein Structures

Tatsuya Akutsu

Department of Computer Science
Gunma University
1-5-1 Tenjin, Kiryu, Gunma 376, Japan
e-mail: akutsu@cs.gunma-u.ac.jp

The comparison of tertiary (three-dimensional) protein structures plays a very important role in molecular biology. A large number of practical pattern matching algorithms for 3D protein structures have been proposed. However, they do not seem to be sufficient from the viewpoint of the computation time and the robustness against insertions or deletions of sequences. Moreover, they are rather heuristic and then they do not have theoretical proofs for the qualities of the outputs. On the other hand, geometric pattern matching algorithms have been studied in computational geometry, in which there are theoretical proofs for the qualities of the outputs. However, most of them do not seem to be practical since they are too complicated to be implemented and the time complexities are not small. Thus we have been developing algorithms for 3D protein structures that are practical and have theoretical foundations.

In this talk, we focus on two problems: the substructure search problem, and the alignment problem. In the substructure search problem, given a small fragment of 3D structure, all structures that have substructures similar to the fragment are enumerated. It is important for searching 3D protein structure databases since the number of known 3D protein structures exceeds 1000 and it grows year by year. In the alignment problem, given two (or more) 3D structures, structurally equivalent residues are identified. It is important to get good matchings between large protein structures since insertions and deletions of sequences must be considered.

For the substructure search problem, we have developed new hashing techniques akutsu95a. For the alignment problem, we have developed an approximate matching algorithm akutsu94b using a similar technique as in mgo94. These algorithms are practical and have theoretical foundations. In this talk, they are overviewed and compared with other heuristic algorithms. Moreover, we will present another algorithm for the comparison of two 3D protein structures akutsu94c, in which a reduction to the K-link path problem is used, where this problem is well known in computational geometry.

References

[1]T. Akutsu, K. Onizuka and M. Ishikawa, ``New hashing techniques and their application to a protein structure database system,'' Proc. 28th Hawaii International Conference on System Sciences, Vol.5, 197--206, 1995.

[2]T. Akutsu,``Substructure search and alignment algorithms for three-dimensional protein structures,'' Technical Report AL-41-1, Information Processing Society of Japan, 1994.

[3]T. Akutsu and H. Tashimo, ``Representation of a 3D protein structure using a sequence of line segments,'' Proc. FGCS Workshop on Fusion of Molecular Biology and Knowledge Information Processing, 18--20, 1994.

[4]J. S. B. Mitchell, M. T. Goodrich and M. W. Orletzky, ``Practical methods for approximate geometric pattern matching under rigid motions,'' {\em Proc. ACM Symposium on Computational Geometry}, 103-112, 1994.