Variation of the Order of Importance in the Base Position for 5'-Splice Site Sequences

Khawaja Sirajuddin (shiraz@muroran-it.ac.jp)
Tomomasa Nagashima
Koichi Ono

Department of Computer Science and Systems Engineering
Muroran Institute of Technology,
27-1, Mizumoto-cho, Muroran 050, Japan.


Abstract

The consensus sequence for the 5'-splice site has been proposed as (AC)AG/GT(AG)AGT, but actual splice site sequences deviate from it to varying degrees. In this paper we analyze various mammalian globin genes using decision tree induction. We find that the prediction rate for discriminating unknown sequences rises as the proportion of false splice site sequences carrying the dinucleotide GT at the 4th and 5th positions increases in the learning data set.
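
To make the position numbering concrete, the following is a minimal Python sketch of the 9-base window implied by the consensus (AC)AG/GT(AG)AGT, in which the dinucleotide GT occupies the 4th and 5th positions. It illustrates only the notation; it is not the decision tree induction used in this paper. The names CONSENSUS, mismatches, has_gt and candidate_windows are hypothetical and introduced here for illustration only.

```python
# Illustrative sketch only: names and structure are assumptions, not the authors' code.
# Each entry lists the bases allowed at positions 1..9 of (AC)AG/GT(AG)AGT.
CONSENSUS = ["AC", "A", "G", "G", "T", "AG", "A", "G", "T"]


def mismatches(site):
    """Return the 1-based positions at which a 9-mer deviates from the consensus."""
    site = site.upper()
    if len(site) != len(CONSENSUS):
        raise ValueError("expected a 9-base sequence")
    return [i + 1 for i, (base, allowed) in enumerate(zip(site, CONSENSUS))
            if base not in allowed]


def has_gt(site):
    """True if the 9-mer carries the dinucleotide GT at positions 4 and 5."""
    return site.upper()[3:5] == "GT"


def candidate_windows(seq):
    """Yield (start_index, 9-mer) for every window with GT at positions 4 and 5.

    Windows that are not true splice sites would be the kind of
    'false splice site sequences with GT' mentioned in the abstract.
    """
    seq = seq.upper()
    for i in range(len(seq) - 8):
        window = seq[i:i + 9]
        if has_gt(window):
            yield i, window
```

For example, mismatches("TTGGTCAGT") reports positions 1, 2 and 6, while has_gt("TTGGTCAGT") is still true: a sequence can carry GT at the 4th and 5th positions and yet depart from the consensus elsewhere.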