Multiple sequence alignment using anchor points through generalized dynamic programming

J. Gracy[1] (gracy@embl-heidelberg.de)
J. Sallantin[2] (js@lirmm.fr)

[1] European Molecular Biology Laboratory,
Meyerhofstraße 1 - Postfach 102209 - 69012 Heidelberg - Germany.
[2] Laboratoire d'Informatique, de Robotique
et de Micro-électronique de Montpellier,
161 rue Ada - 34392 Montpellier Cedex 5 - France.


Abstract

A generalization of the dynamic programming algorithm applied to the multiple alignment of protein sequences is proposed. The algorithm has two main procedures: (i) local correspondences between sequences - hereafter called anchor points - are selected according to a criterion that combines local and global simlilarity values, (ii) the alignment is constructed recursively by choosing and linking together the optimal anchor points. This multiple sequence alignment algorithm achieves a good compromise between the O(L^{N}) complexity of the exhaustive dynamic programming approach applied to N sequences of length L and the poor quality of the alignments obtained with methods based on a hierarchical clustering of the sequences.