The ACEDB genome database: support for large-scale sequencing projects

Richard Durbin (rd@sanger.ac.uk)

MRC LMB, Hills Rd, Cambridge CB2 2QH, UK, and
Sanger Centre, Hinxton, Cambridge CB10 1RQ, UK

Abstract

ACEDB is a novel database system, developed in collaboration with Jean Thierry-Mieg from CNRS, Montpellier, France, that was designed to meet the needs of the C. elegans mapping and sequencing project. We use it both to maintain internal laboratory results, and as a means to distribute information about the C. elegans genome to other workers in the field. It is highly flexible and is being adopted by a number of other genome projects working on a variety of organisms, from bacteria to Arabidopsis to man. It will also provide both a database system and front end for the European Integrated Genome Database.

Over the past few years ACEDB has developed to handle a variety of mapping data for both physical and genetic mapping, and now has integrated tools for assembling both genetic maps from recombination and complementation data, and physical maps from fingerprint overlap data. However, the main focus of this talk will be on developments in ACEDB for managing large-scale sequence data. Facilities developed for this include the ability to scroll smoothly along multi-megabase sequences, an integrated package for gene prediction (based on the GENEFINDER program from Phil Green, Washington University, St. Louis), and tools to display multiple protein sequence alignments, including connections to external public protein databases (developed by Erik Sonnhammer, Sanger Centre).

The application of ACEDB will be illustrated with data from the C. elegans genome project, including in particular over 2 Mb of contiguous genomic sequence from chromosome III. The ACEDB software is available by anonymous ftp from servers in England (131.111.84.1), France (193.49.104.10) and the USA (NCBI, 130.14.20.1).