Date |
May 14, 2007 |
Speaker |
Dr. Koji Tsuda, Max Planck Institute for Biological Cybernetics |
Title |
Mathematical Programming Methods for Learning from Structured Data:
Applications to Molecular QSAR Analysis, HIV Drug Resistance Prediction,
and Image Categorisation |
Abstract |
In learning from structured data such as graphs, trees and itemsets, frequent substructure mining methods are used for deriving features.
However, frequent patterns are not necessarily informative for the given
learning problem. We propose a mathematical programming boosting method
that progressively collects informative patterns. Compared to AdaBoost,
our method can build the prediction rule with fewer iterations. To
apply the boosting method to graph data, a branch-and-bound-type pattern
search algorithm is developed based on the DFS code tree. The
constructed search space is reused in later iterations to minimize the
computation time. Our method can learn more efficiently than the
simpler method based on frequent substructure mining, because the output
labels are used as an extra information source for pruning the search
space. Furthermore, by engineering the mathematical program, a wide
range of machine learning problems can be solved without modifying the
pattern search algorithm. I will show practical applications to graphs
(QSAR) and itemsets (HIV and images).
Joint work with Hiroto Saigo, Sebastian Nowozin, Taku Kudo, Takeaki Uno
|
|