Finding <I>k</I>-best Alignments of Multiple Sequences

Finding K-best Alignments of Multiple Sequences

Gen Shibayama (gengen@is.s.u-tokyo.ac.jp)
Hiroshi Imai (imai@is.s.u-tokyo.ac.jp)

Department of Information Science
Faculty of Science, Tokyo University
7-3-1 Hongo, Bunkyo-ku, Tokyo 113 Japan

Abstract

Detecting similarities of multiple genome sequences is one of the most important topics in genome informatics. For the purpose of finding such similarities, an alignment with the highest score with respect to some similarity criterion is provided as an output. However, the alignment with the best score is not necessarily the most significant alignment of the sequences from the viewpoint of biology. In this respect, providing suboptimal alignments is very useful.

Since finding an alignment of sequences corresponds to finding a path in some directed acyclic graph, we propose a simple algorithm to enumerate all K-best alignments in order, where K may not necessarily be specified beforehand, by finding the K longest paths in the graph. We further consider finding the subgraph formed by such K longest paths. Several useful approaches to find the optimal paths in a graph are also mentioned.