Building and Searching DNA Data Base on a Relational Model

Hajime Kitakami
Yukiko Yamazaki
Yoshio Ugawa
Kazuo Ikeo
Naruya Saitou
Yoshio Tateno
Takashi Gojobori

DNA Data Bank of Japan, National Institute of Genetics

Abstract

The DNA database at the DDBJ has been managed in a flat-file system since 1985. The flat-file system is inadequate for building and searching a DNA database whose number of entries is increasing explosively. In collaboration with GenBank staff, we migrated the database from the flat-file system to a relational database system. The schema of the relational database was designed according to the following principles:

(1) Decomposing the DNA data into structured and unstructured data

(2) Partitioning large tables into small tables without update anomalies

(3) Making relationships among tables flexible enough to represent complex data

This schema made it possible to build and search the DNA database with less memory on the relational database system. However, it was implemented as a complex network of about 60 tables, which makes it difficult to query directly with SQL, the search language of the relational database system.

To make SQL easier to use, we defined a simplified schema on top of the existing schema using the view function of the relational database system. The simplified schema consists of the LOCUS, DEFINITION, ACCESSION, KEYWORDS, SOURCE, REFERENCE, FEATURES, ORIGIN, and SEQUENCE tables, which are virtual tables that store no real data. These tables mirror the elements of the traditional DDBJ/EMBL/GenBank data format, which is familiar to biologists who use the flat-file system.
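As a rough illustration of the view mechanism, the sketch below defines two of the virtual tables over a small set of normalized base tables. It uses Python's sqlite3 module only so that it runs anywhere. The base tables (entry, keyword, entry_keyword), their columns, and the sample rows are hypothetical and are not taken from the actual DDBJ schema; only the view names LOCUS and KEYWORDS come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical normalized base tables standing in for the real schema.
CREATE TABLE entry (
    entry_id   INTEGER PRIMARY KEY,
    accession  TEXT NOT NULL UNIQUE,
    locus_name TEXT NOT NULL,
    seq_length INTEGER NOT NULL
);
CREATE TABLE keyword (
    keyword_id INTEGER PRIMARY KEY,
    term       TEXT NOT NULL UNIQUE
);
CREATE TABLE entry_keyword (
    entry_id   INTEGER NOT NULL REFERENCES entry(entry_id),
    keyword_id INTEGER NOT NULL REFERENCES keyword(keyword_id),
    PRIMARY KEY (entry_id, keyword_id)
);

-- Views are virtual tables: they hold no rows of their own.
CREATE VIEW LOCUS AS
    SELECT accession, locus_name, seq_length FROM entry;
CREATE VIEW KEYWORDS AS
    SELECT e.accession, k.term AS keyword
    FROM entry e
    JOIN entry_keyword ek ON ek.entry_id = e.entry_id
    JOIN keyword k ON k.keyword_id = ek.keyword_id;
""")

# Placeholder rows; the accession numbers are invented for the example.
conn.executemany("INSERT INTO entry VALUES (?, ?, ?, ?)",
                 [(1, "ACC0001", "DEMO1", 1200), (2, "ACC0002", "DEMO2", 850)])
conn.executemany("INSERT INTO keyword VALUES (?, ?)",
                 [(1, "globin"), (2, "promoter")])
conn.executemany("INSERT INTO entry_keyword VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 1)])

# The user queries the flat-file-like view, not the underlying join.
rows = conn.execute(
    "SELECT accession, keyword FROM KEYWORDS ORDER BY accession, keyword"
).fetchall()
print(rows)
```

The three-table join is written once in the view definition, so a user who knows the flat-file KEYWORDS line can query it without knowing how the keywords are stored.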

Users can easily join these virtual tables through the attribute that stores accession numbers. With the simplified schema, users can easily write SQL search commands and obtain quick responses to DNA data searches.
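A join on the accession-number attribute might look like the following sketch, again written with Python's sqlite3 module. The view names LOCUS, DEFINITION, and SOURCE come from the text; the column names (accession, locus_name, definition, organism), the base tables, the helper function find_by_organism, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical base tables; the real schema is far larger.
CREATE TABLE entry (
    accession  TEXT PRIMARY KEY,
    locus_name TEXT NOT NULL,
    definition TEXT NOT NULL
);
CREATE TABLE entry_source (
    accession TEXT NOT NULL REFERENCES entry(accession),
    organism  TEXT NOT NULL
);

-- Each view exposes the accession number as its join attribute.
CREATE VIEW LOCUS AS SELECT accession, locus_name FROM entry;
CREATE VIEW DEFINITION AS SELECT accession, definition FROM entry;
CREATE VIEW SOURCE AS SELECT accession, organism FROM entry_source;
""")

# Placeholder rows; accession numbers and definitions are invented.
conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
    ("ACC0001", "DEMO1", "demo gene A"),
    ("ACC0002", "DEMO2", "demo gene B"),
    ("ACC0003", "DEMO3", "demo gene C"),
])
conn.executemany("INSERT INTO entry_source VALUES (?, ?)", [
    ("ACC0001", "Homo sapiens"),
    ("ACC0002", "Mus musculus"),
    ("ACC0003", "Homo sapiens"),
])


def find_by_organism(connection, organism):
    """Return (accession, locus_name, definition) rows for one organism."""
    query = """
        SELECT l.accession, l.locus_name, d.definition
        FROM LOCUS l
        JOIN DEFINITION d ON d.accession = l.accession
        JOIN SOURCE s ON s.accession = l.accession
        WHERE s.organism = ?
        ORDER BY l.accession
    """
    return connection.execute(query, (organism,)).fetchall()


human_entries = find_by_organism(conn, "Homo sapiens")
print(human_entries)
```

Because every view carries the same accession attribute, any combination of the virtual tables can be joined with the same ON clause.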