Towards an information management system for genomic objects and processes

Otto Ritter (oritter@dkfz-heidelberg.de)

European Data Resource for Human Genome Research, Department of Molecular Biophysics, German Cancer Research Center (DKFZ), Heidelberg, Germany


Abstract

IGD (the Integrated Genomic Database) is a collaborative project (Martin Bishop of HGMP, London; Steve Bryant of ICRF, Clare Hall; Richard Durbin of MRC, Cambridge; Hans Lehrach of ICRF, London; Victor Markowitz of LBL, Berkeley; Jean Thierry-Mieg of CNRS, Montpellier; and Otto Ritter of DKFZ, Heidelberg) aimed at developing an integrated environment for the representation of genomic data objects, experimental processes, and analysis methods.

The prototype system physically integrates public data from several major databases into a read-only server database. IGD front-end clients then allow scientists to store retrieved data in a local database, where the data can be further edited and merged with experimental data. The global integrated schema is defined in an object modelling language that is independent of the underlying database implementation. Both ACEDB and SYBASE have been used as alternative back-end database systems, and ACEDB is used as the graphical front-end data manager.
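The division between a read-only server database and an editable local database can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `GenomicObject`, `InMemoryBackend`, and `fetch_and_merge`, the example class "Locus", and the use of in-memory dictionaries in place of ACEDB or SYBASE. It is not the IGD schema language or API.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class GenomicObject:
    """Hypothetical schema-level object: class name, identifier, attributes."""
    cls: str
    name: str
    attrs: Dict[str, str] = field(default_factory=dict)


class InMemoryBackend:
    """Stand-in for a back-end database; a real one would wrap ACEDB or SYBASE."""

    def __init__(self, read_only: bool = False) -> None:
        self._store: Dict[Tuple[str, str], GenomicObject] = {}
        self._read_only = read_only

    def load(self, obj: GenomicObject) -> None:
        # Bulk load used when the database is built; ignores the read-only flag.
        self._store[(obj.cls, obj.name)] = obj

    def put(self, obj: GenomicObject) -> None:
        # Client-side writes are refused on a read-only database.
        if self._read_only:
            raise PermissionError("server database is read-only")
        self._store[(obj.cls, obj.name)] = obj

    def get(self, cls: str, name: str) -> GenomicObject:
        return self._store[(cls, name)]


def fetch_and_merge(server, local, cls, name, local_attrs):
    """Copy a public object into the local database, merged with local data."""
    public = server.get(cls, name)
    # Local attributes override public ones; the public object is not modified.
    merged = GenomicObject(cls, name, {**public.attrs, **local_attrs})
    local.put(merged)
    return merged


server = InMemoryBackend(read_only=True)
server.load(GenomicObject("Locus", "EXAMPLE1", {"source": "public"}))
local = InMemoryBackend()
merged = fetch_and_merge(server, local, "Locus", "EXAMPLE1",
                         {"note": "lab annotation"})
```

Because the client code only calls `get` and `put`, either back-end could in principle be swapped in without changing it, which is the point of keeping the schema independent of the database implementation.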

The following databases have already been integrated into IGD: the EMBL Nucleotide Sequence Database, the SWISS-PROT Protein Sequence Database, the Genome Data Base (GDB), the Online Mendelian Inheritance in Man (OMIM) database, the Reference Library Database (RLDB), the UK DNA Probe Bank, and several sets of experimental data.

Different aspects of the integration will be discussed:

    1. Integration of data - design of a comprehensive database for genome-related conceptual and experimental objects, and the import of data from major public databases and experimental resources.

    2. Integration of object and process information - processes, such as laboratory protocols and data analyses, modeled together with 'static' data objects; IGD used locally as a laboratory notebook.

    3. Integration of data and knowledge - a deductive interface to the IGD database to facilitate logic-based analysis of the data and the representation of biological knowledge as well as data integrity rules.

    4. Integration of interfaces to analytical tools - interfaces between the integrated database and major external software packages.

    5. Visual integration of complex data and operations - graphical display of complex objects such as genetic and physical chromosome maps, clone grids, sequence feature maps, etc.

    6. Possible integration of future databases and tools - open and extensible system design.
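
As a rough illustration of point 3, the sketch below applies one deductive rule and one integrity rule to a small set of facts. The facts, the clone and probe names, and both rules are hypothetical and chosen only to show the idea. The abstract does not specify the actual rule language or rules used in IGD.

```python
from itertools import combinations

# Hypothetical facts: (clone, probe) hybridization hits.
hits = {("cloneA", "probe1"), ("cloneB", "probe1"), ("cloneC", "probe2")}
# Hypothetical set of clones registered in the database.
known_clones = {"cloneA", "cloneB"}


def derive_overlaps(hits):
    """Deductive rule: two clones hit by the same probe are inferred to overlap."""
    by_probe = {}
    for clone, probe in hits:
        by_probe.setdefault(probe, set()).add(clone)
    overlaps = set()
    for clones in by_probe.values():
        # Sort so each unordered pair is emitted once, in a stable order.
        for a, b in combinations(sorted(clones), 2):
            overlaps.add((a, b))
    return overlaps


def integrity_violations(hits, known_clones):
    """Integrity rule: every clone named in a hit must be a registered clone."""
    return sorted({clone for clone, _ in hits if clone not in known_clones})


overlaps = derive_overlaps(hits)
violations = integrity_violations(hits, known_clones)
```

Here the derived facts and the rule violations are both computed from the stored data rather than stored themselves, which is what separates the knowledge layer from the data layer.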