BRITE: Deductive Database of the Genome and the Biological System Based on Binary Relations
Minoru Kanehisa, Professor, Bioinformatics Center, Institute for Chemical Research, Kyoto University
The accumulating complete genome sequences of many species, from humans to bacteria, are rapidly becoming an information infrastructure for biomedical science in the 21st century.
Genome information is the basis for understanding the principles that govern biological phenomena at different levels, such as the cellular, organism, and biosphere levels, and also for promoting applied research and development in the genome industries.
All the sequence data thus far accumulated are made publicly available in the International Nucleotide Sequence Databases DDBJ/EMBL/GenBank.
However, sequence information alone is not sufficient for understanding the functional meaning and utility of biological systems.
New databases have to be developed based on new concepts and new technologies.
As part of the activities of the Institute for Bioinformatics Research and Development (BIRD), Japan Science and Technology Agency (JST), we are developing a new type of function database called BRITE.
There are different approaches to computerizing functional information. One is sequence annotation, which describes function in standardized text with a controlled vocabulary, often with a hierarchical structuring of knowledge about genes, proteins, and their functions.
A prime example is Gene Ontology, which attempts to maintain a controlled vocabulary that can be applied to different eukaryotic genomes and that can change dynamically with expanding knowledge.
We have proposed and implemented a second approach in KEGG, where a higher-order function (rather than a molecular function) is associated with the structure of a network (graph) of molecular interactions.
For example, a biosynthetic function can be associated with a specific form of enzyme-enzyme relations in the metabolic pathway, and a cellular response to an external stimulus can be associated with a specific form of protein-protein interactions in the signal transduction pathway.
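The idea of attaching a function to a network structure rather than to a single molecule can be sketched in Python. This is a toy illustration, not KEGG's data model: it assumes a directed graph in which an edge from one enzyme to another means the first enzyme's product is the second's substrate, and it treats a biosynthetic function as the existence of a path between two enzymes. The enzyme names, the function label, and the helpers `build_graph`, `find_path`, and `functions_present` are all hypothetical.

```python
from collections import deque

# Hypothetical enzyme-enzyme relations (placeholders, not real pathway data):
# an edge (a, b) means the product of enzyme a is a substrate of enzyme b.
ENZYME_RELATIONS = [
    ("E1", "E2"),
    ("E2", "E3"),
    ("E3", "E4"),
    ("E2", "E5"),
]

# Assumption for illustration: a higher-order function is defined by a path
# from a start enzyme to an end enzyme, not by any single enzyme.
FUNCTION_PATTERNS = {
    "hypothetical biosynthesis X": ("E1", "E4"),
}

def build_graph(edges):
    """Build a directed adjacency list from (source, target) pairs."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    return graph

def find_path(graph, start, goal):
    """Return one shortest directed path from start to goal (BFS), or None."""
    if start not in graph or goal not in graph:
        return None
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start, then reverse.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None

def functions_present(graph, patterns):
    """List the functions whose defining path exists in the graph."""
    return [name for name, (start, goal) in patterns.items()
            if find_path(graph, start, goal)]

graph = build_graph(ENZYME_RELATIONS)
print(find_path(graph, "E1", "E4"))  # ['E1', 'E2', 'E3', 'E4']
print(functions_present(graph, FUNCTION_PATTERNS))  # ['hypothetical biosynthesis X']
```

Breadth-first search is used here only because it returns a shortest path simply; any path-finding method would serve for this illustration.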
Although eventually all cellular functions may be related to networks of molecular interactions, current knowledge about such networks is quite limited.
Thus, to incorporate less organized knowledge about functions, we are developing BRITE, a database of binary relations.
In the KEGG graph, nodes are genes, proteins, or other molecules, but in the BRITE graph (a set of binary relations) other types of nodes may be used.
For example, the sequence-function relation is a binary relation between a molecule node and a function node.
A function term from a controlled vocabulary is therefore an object that takes part in a binary relation, rather than an attribute of a gene or a protein as in Gene Ontology or most other molecular biology databases.
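The difference between treating a function as an attribute and treating it as a node can be sketched in Python. This is a minimal illustration under stated assumptions: the `Node` and `Relation` classes, the `has_function` label, the `related` helper, and the gene and term names are hypothetical and do not describe BRITE's actual schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    """A typed node; the kinds used here are illustrative, not BRITE's schema."""
    kind: str  # e.g. "gene" or "function"
    name: str

@dataclass(frozen=True)
class Relation:
    """A labeled binary relation (edge) between two nodes."""
    source: Node
    label: str
    target: Node

def related(relations, node, label):
    """Return the nodes linked to `node` by `label` edges, in either direction."""
    linked = set()
    for rel in relations:
        if rel.label != label:
            continue
        if rel.source == node:
            linked.add(rel.target)
        elif rel.target == node:
            linked.add(rel.source)
    return linked

# Hypothetical example data: two genes share one function term.
gene_a = Node("gene", "geneA")
gene_b = Node("gene", "geneB")
term_f1 = Node("function", "term F1")

# The function term is itself a node; each sequence-function relation is an
# edge between a molecule node and that function node.
relations = {
    Relation(gene_a, "has_function", term_f1),
    Relation(gene_b, "has_function", term_f1),
}

# Because the term is a first-class node, it can be queried from either end.
print(sorted(n.name for n in related(relations, gene_a, "has_function")))   # ['term F1']
print(sorted(n.name for n in related(relations, term_f1, "has_function")))  # ['geneA', 'geneB']
```

In an attribute-style record the term would live inside each gene entry; here the same `related` call starts from the function node, and that node can take part in further relations like any other.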
In developing the actual database, we are first focusing on two types of binary relations.
One is the gene expression relation, that is, the relation between a transcription factor and a target gene product; the other is the gene-disease relation.
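Both relation types can be held in one store of typed binary relations. The Python sketch below is a hypothetical in-memory store, not BRITE's implementation; the class `BinaryRelationStore`, the relation labels `regulates_expression_of` and `associated_with`, and all identifiers (`TF1`, `geneX`, `diseaseD`, and so on) are placeholders. It shows one kind of query such a store could support: combining the two relation types to find transcription factors that regulate a disease-associated gene.

```python
from collections import defaultdict

class BinaryRelationStore:
    """Minimal in-memory store of typed binary relations (a sketch, not BRITE)."""

    def __init__(self):
        # Index each relation type in both directions so that queries
        # starting from either end do not need a full scan.
        self._forward = defaultdict(lambda: defaultdict(set))
        self._backward = defaultdict(lambda: defaultdict(set))

    def add(self, subject, rel_type, obj):
        """Record the relation subject --rel_type--> obj."""
        self._forward[rel_type][subject].add(obj)
        self._backward[rel_type][obj].add(subject)

    def objects(self, subject, rel_type):
        """Everything that `subject` points to via `rel_type`."""
        return set(self._forward[rel_type].get(subject, ()))

    def subjects(self, obj, rel_type):
        """Everything that points to `obj` via `rel_type`."""
        return set(self._backward[rel_type].get(obj, ()))

store = BinaryRelationStore()
# Gene expression relations (hypothetical): transcription factor -> target.
store.add("TF1", "regulates_expression_of", "geneX")
store.add("TF1", "regulates_expression_of", "geneY")
store.add("TF2", "regulates_expression_of", "geneY")
# Gene-disease relation (hypothetical): gene -> disease.
store.add("geneY", "associated_with", "diseaseD")

# Combine the two relation types: which transcription factors regulate
# a gene associated with diseaseD?
regulators = set()
for gene in store.subjects("diseaseD", "associated_with"):
    regulators |= store.subjects(gene, "regulates_expression_of")
print(sorted(regulators))  # ['TF1', 'TF2']
```

Keeping a forward and a backward index is a design choice for the sketch: each binary relation can then be followed from either end, which suits relations whose nodes are of different types.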
Created on June 4, 2001
Updated on September 19, 2005