The Transformation of GDB

Robert J. Robbins (rrobbins@gdb.org)

Johns Hopkins University

Abstract

The Genome Data Base (GDB) project began in 1989 with support from the Howard Hughes Medical Institute. Since 1 September 1991, the project has been supported with U.S. government funds from the Department of Energy and the National Center for Human Genome Research at NIH. The switch to federal funding brought changes more profound than anyone expected, or even guessed, in 1991.

The primary mission shifted from supporting annual Human Gene Mapping workshops to supporting the international Human Genome Project (HGP).

The database itself changed from a stand-alone system to a component of the information infrastructure of the HGP.

Continuous data entry from sites around the world has replaced annual data entry at large meetings.

Annual publications, such as the HGM reports and the HGML plot books, have been replaced with publication on demand.

A single GDB product has been replaced with a family of services, as ISQL backends, ftp, gopher, and WAIS servers have been added.

Full graphical user interfaces are replacing simple terminal emulation. Dial-up telephone access at 1200 baud has been replaced with direct network connection. GDB now even has its own internet domain, GDB.ORG.

A monolithic software system is being replaced with a modular design.

And all of this has been accompanied by exponential growth in the amount of data being managed.

Despite these changes, GDB's fundamental design and basic data model still reflect their history, tracing back through HGML to word-processing files. The challenge now is to redesign GDB's data model and its systems so that they look to the future, not the past. As more data become available from various organisms, we can imagine the end-game of the HGP and begin designing a system to accommodate what will be found, not what has been found.

The current GDB data model involves several hundred relational tables, organized, from a user's perspective, into several "managers": a locus manager, a probe manager, a map manager, and so on. Users interact with the managers to extract information from the system. Now, with support for publication on demand, users can query the database to identify information of interest and then "subscribe" to it; GDB will send updated versions of the data directly to the user at regular intervals.
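The subscribe-and-resend cycle described above can be illustrated with a minimal sketch. It is not GDB's implementation: the `Subscription` class, the `run_subscriptions` function, the record fields, and the locus names are all hypothetical, chosen only to show a saved query being re-run at a fixed interval.

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]  # hypothetical flat record, e.g. one locus entry


@dataclass
class Subscription:
    user: str
    query: Callable[[Record], bool]  # the saved selection criterion
    interval_days: int               # how often updates are sent
    last_sent: Optional[date] = None


def is_due(sub: Subscription, today: date) -> bool:
    """A subscription is due if never sent or if the interval has elapsed."""
    if sub.last_sent is None:
        return True
    return (today - sub.last_sent).days >= sub.interval_days


def run_subscriptions(
    records: List[Record], subs: List[Subscription], today: date
) -> Dict[str, List[Record]]:
    """Re-run each due query and return the records to send, keyed by user."""
    deliveries: Dict[str, List[Record]] = {}
    for sub in subs:
        if is_due(sub, today):
            deliveries[sub.user] = [r for r in records if sub.query(r)]
            sub.last_sent = today
    return deliveries


# Hypothetical data: placeholder locus names, not real GDB entries.
loci = [
    {"symbol": "LOCUS_A", "chromosome": "7"},
    {"symbol": "LOCUS_B", "chromosome": "X"},
]
chr7 = Subscription("alice", lambda r: r["chromosome"] == "7", interval_days=30)
first = run_subscriptions(loci, [chr7], date(1993, 1, 1))
```

Here the subscription stores the query rather than its results, so each delivery reflects whatever the database holds at that moment. That is one plausible reading of "updated versions of the data"; a real system might instead send only the changes since the previous delivery.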

GDB's support for a directly accessible database "backend" has allowed others to write software that queries GDB data without going through the GDB interface. This has led to the development of local tools for genome centers and of independent third-party interfaces into GDB.

GDB's conceptual challenge will require an answer to the biggest question of all: what is a gene? As HGP findings accumulate, some fundamental genetic ideas seem to derive as much from the methods of classical genetic analysis as from the underlying biology. If our information-management systems cannot accommodate new findings, they will fail.

Although GDB has undergone great transformations, the greatest still lies ahead: building an information-management system to store all the instructions for creating life.