Genomatica: An integrated management tool for genome sequencing projects

Yutaka Akiyama[1]
Hirotada Mori[2]
Satoru Kuhara[3]
Tetsuya Furukawa[4]
Kenji Satou[5]
Nobuyuki Miyajima[1]
Naoki Ogasawara[6]
Yasufumi Murakami[7]

[1]Institute for Chemical Research, Kyoto University
[2]Institute for Virus Research, Kyoto University
[3]Graduate School of Genetic Resources and Technology, Kyushu University
[4]Computer Center, Kyushu University
[5]Educational Center for Information Processing, Kyushu University
[6]Department of Medicine, Osaka University
[7]RIKEN Tsukuba Life Science Center

E-mail: genomatica@kuicr.kyoto-u.ac.jp (or gentools@kuicr.kyoto-u.ac.jp)


Abstract

Genomatica is an integrated software system that supports the construction and management of a research database organizing the complete genome information of a single organism, from a bacterium to Homo sapiens. Historically, DNA information has been handled with the gene as the unit of sequence, and each sequence has been stored independently in sequence databases such as GenBank, EMBL and DDBJ. As a result, one can obtain little information about the linear arrangement of these sequences along a chromosome. Building a chromosomal sequence database is of great help to researchers working on a specific organism, and is also indispensable for promoting the study of the overall information structure of chromosomes.

Genomatica provides four main facilities. The first supports the procedure of integrating the many dispersed DNA sequences held in the databases with newly sequenced data coming from any laboratory in the sequencing project, and produces the linear arrangement of sequence information for the specified chromosome. Because many DNA entries already in the common databases carry no description of their chromosomal location, the tool can recognize overlapping regions among the sequences in order to construct a correctly ordered set.
The second is a research database for a specific organism, with a graphical interface that gives researchers access to the latest data through very simple operations. The third is a powerful sequence analysis toolbox for the desktop of experimental laboratories. The fourth is a network communication facility for data exchange between project researchers, used in particular for the periodic distribution of the latest version of the developing database, accompanied by the full results of homology searches for all ORF candidates found on the chromosome (searches against the common databases and mutual searches among the ORFs are run regularly on the host machine). Genomatica will be distributed as freeware.
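
As an illustration of the first facility, the idea of recognizing overlapping regions to order sequences along a chromosome can be sketched as follows. This is only a minimal sketch and not the algorithm used in Genomatica, which the abstract does not specify. It assumes exact suffix-prefix matches between fragments on the same strand, and the names overlap_length, merge_fragments and min_overlap are hypothetical. Real data would also need tolerance for sequencing errors, reverse complements and fragments contained within others.

```python
def overlap_length(left, right, min_overlap=3):
    """Return the length of the longest suffix of `left` that is also a
    prefix of `right`, or 0 if it is shorter than `min_overlap`."""
    max_len = min(len(left), len(right))
    for length in range(max_len, min_overlap - 1, -1):
        if left.endswith(right[:length]):
            return length
    return 0


def merge_fragments(fragments, min_overlap=3):
    """Greedily merge the pair of fragments with the longest exact overlap
    until no pair overlaps by at least `min_overlap` bases.

    Assumption: overlaps are exact and all fragments are on the same strand.
    """
    contigs = list(fragments)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, left in enumerate(contigs):
            for j, right in enumerate(contigs):
                if i == j:
                    continue
                length = overlap_length(left, right, min_overlap)
                if length > best_len:
                    best_len, best_i, best_j = length, i, j
        if best_len == 0:
            break  # the remaining fragments cannot be joined
        # Join the two fragments, keeping the shared region only once.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


if __name__ == "__main__":
    # Toy fragments, invented for illustration only.
    print(merge_fragments(["ATGGCGT", "GCGTACC", "ACCTTGA"]))
```

Fragments that share no sufficiently long overlap are left as separate entries, which mirrors the situation where database sequences cannot yet be placed relative to one another.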