Kanehisa Laboratory

Laboratory of Genome Database

Research Topics

Since the completion of the Human Genome Project, high-throughput experimental projects have been launched to uncover genomic information in an extended sense, including transcriptomics, proteomics, metabolomics, glycomics, chemical genomics, and metagenomics. We are developing bioinformatics technologies to integrate and interpret such large-scale datasets, especially for medical and pharmaceutical applications.

1. Databases for diseases and drugs

KEGG is a database of biological systems that integrates genomic, chemical, and systemic functional information. It is widely used as a reference knowledge base for understanding higher-order functions and utilities of the cell or the organism from genomic information. Although the basic components of the KEGG resource are developed at Kyoto University, this Laboratory in the Human Genome Center is responsible for the applied areas of KEGG, especially in the medical and pharmaceutical sciences. We develop KEGG DRUG, a chemical structure database for all approved drugs, in which each drug is associated with target information in the context of KEGG pathways and with efficacy information in the context of hierarchical drug classifications. We also develop KEGG DISEASE as a new addition to the KEGG suite of databases. Each disease entry consists of a list of disease genes and other lists of molecules such as environmental factors, markers, and drugs. Both DRUG and DISEASE are highly integrated with other KEGG databases, including PATHWAY, BRITE, GENES, and COMPOUND, and with other Internet resources.

2. SOAP/WSDL interface to the KEGG system

This Laboratory is also responsible for the development of the KEGG API, a web service for using the KEGG system from users' programs via SOAP/WSDL. The KEGG API provides a means of accessing the KEGG system for tasks such as searching and computing biochemical pathways in cellular processes or analyzing the universe of genes in the completely sequenced genomes. Users access the KEGG API server with SOAP over HTTP. The SOAP server also comes with a WSDL description, which makes it easy to build a client library for a specific programming language. This enables users to write their own programs for many different purposes and to automate accessing the KEGG API server and retrieving the results.
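As a rough illustration of what a SOAP call over HTTP involves, the sketch below builds a SOAP 1.1 request envelope and reads back the first element of a SOAP body using only the Python standard library. The service namespace, the operation name, and the parameter name are hypothetical placeholders, not the actual KEGG API definitions; in practice these would come from the WSDL file published by the server.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical service namespace; the real one is defined by the server's WSDL.
SERVICE_NS = "urn:example-service"


def build_soap_request(operation, params):
    """Return a SOAP 1.1 envelope (as text) calling `operation` with `params`."""
    ET.register_namespace("soapenv", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def parse_soap_body(xml_text):
    """Return (element name, {child name: text}) for the first element in the Body."""
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SOAP_ENV}}}Body")
    payload = body[0]
    return payload.tag, {child.tag: child.text for child in payload}


# Hypothetical operation and parameter, for illustration only.
request_xml = build_soap_request("example_operation", {"entry_id": "example:0001"})
tag, fields = parse_soap_body(request_xml)
```

A client library generated from the WSDL hides this envelope handling behind ordinary function calls, which is why the WSDL makes it straightforward to support many programming languages.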

3. Automatic generation of EST consensus contigs

EST sequencing has proven to be an economically feasible alternative for gene discovery in species lacking genome sequences, such as plants. Ongoing large-scale EST sequencing projects need bioinformatics tools that handle ESTs uniformly. This gives renewed importance to a universal tool for processing and functionally annotating large sets of ESTs in order to cover the complete transcriptome of an organism. This Laboratory has developed EGassembler, which provides both automated and user-customized analysis for cleaning, repeat masking, vector trimming, organelle masking, clustering, and assembling of ESTs and genomic fragments. This tool is used to develop the KEGG EGENES database for plants.
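To give a feel for the kind of steps such a pipeline chains together, here is a deliberately simplified toy sketch of cleaning (poly-A tail removal), vector trimming, and clustering. It is not EGassembler's algorithm: the exact-prefix vector match, the shared k-mer clustering criterion, and all function names and thresholds are assumptions chosen for brevity, whereas real pipelines rely on alignment-based tools.

```python
def trim_poly_a(seq, min_run=8):
    """Remove a trailing run of A's if it is at least `min_run` bases long."""
    stripped = seq.rstrip("A")
    return stripped if len(seq) - len(stripped) >= min_run else seq


def trim_vector(seq, vector_prefix):
    """Drop a known vector sequence when the read starts with it exactly."""
    if vector_prefix and seq.startswith(vector_prefix):
        return seq[len(vector_prefix):]
    return seq


def kmers(seq, k):
    """Return the set of all length-k substrings of `seq`."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_by_shared_kmers(seqs, k=8, min_shared=3):
    """Greedy single-linkage clustering: reads sharing enough k-mers are grouped."""
    clusters = []  # list of (k-mer set, member indices)
    for idx, seq in enumerate(seqs):
        merged_kmers, merged_members = kmers(seq, k), [idx]
        read_kmers = set(merged_kmers)
        remaining = []
        for cluster_kmers, members in clusters:
            if len(cluster_kmers & read_kmers) >= min_shared:
                merged_kmers |= cluster_kmers
                merged_members = members + merged_members
            else:
                remaining.append((cluster_kmers, members))
        remaining.append((merged_kmers, merged_members))
        clusters = remaining
    return [sorted(members) for _, members in clusters]


# Made-up reads for illustration only.
VECTOR = "GGCC"
raw_reads = [
    VECTOR + "ACGTTGCATGCAAGTC" + "A" * 10,
    "TTGCATGCAAGTCGGATC",
    "CCCCCCCCGGGGGGGGTTTT",
]
clean_reads = [trim_vector(trim_poly_a(r), VECTOR) for r in raw_reads]
clusters = cluster_by_shared_kmers(clean_reads)
```

Each resulting cluster would then be passed to an assembler to produce a consensus contig.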

4. Ortholog and paralog clusters in complete genomes

We have been developing a computational method for automatically finding appropriate orthologous gene clusters. It is based on a graph analysis of the KEGG SSDB database, which contains sequence similarity and best-hit relations of genes among all the completely sequenced genomes. We introduce a hierarchy (evolutionary relationship) of organisms and treat the SSDB graph as a nested graph in order to reduce the complexity of a huge graph object. The KEGG OC service will be made available in the near future.
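The general idea of clustering genes from best-hit relations can be sketched as follows. This is a simplified stand-in, not the method under development: it keeps only bidirectional best hits and groups them into connected components with union-find, and it ignores the organism hierarchy and nested-graph treatment described above. The `genome:gene` identifier format and the input dictionary layout are assumptions made for the example.

```python
from collections import defaultdict


def genome_of(gene):
    """Extract the genome prefix from a 'genome:gene' identifier (assumed format)."""
    return gene.split(":", 1)[0]


def bidirectional_best_hits(best_hit):
    """best_hit maps (gene, target_genome) -> best-hit gene in that genome."""
    pairs = set()
    for (gene, _target), hit in best_hit.items():
        # Keep the pair only if the hit's best hit in gene's genome is gene itself.
        if best_hit.get((hit, genome_of(gene))) == gene:
            pairs.add(tuple(sorted((gene, hit))))
    return pairs


def connected_components(pairs):
    """Group genes linked by any chain of pairs, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = defaultdict(set)
    for gene in list(parent):
        groups[find(gene)].add(gene)
    return sorted(sorted(group) for group in groups.values())


# Made-up best-hit table for three hypothetical genomes g1, g2, g3.
best_hit = {
    ("g1:a", "g2"): "g2:a", ("g2:a", "g1"): "g1:a",
    ("g2:a", "g3"): "g3:a", ("g3:a", "g2"): "g2:a",
    ("g1:b", "g2"): "g2:a",  # one-directional hit, not reciprocated
    ("g1:c", "g3"): "g3:c", ("g3:c", "g1"): "g1:c",
}
clusters = connected_components(bidirectional_best_hits(best_hit))
```

Single-linkage grouping of this kind tends to merge unrelated families on large graphs, which is one motivation for exploiting the organism hierarchy instead of treating the graph as flat.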

5. Repository for community genome annotation

KEGG DAS is an advanced genome database system providing a DAS (Distributed Annotation System) service of genome map information for all organisms in KEGG. We have been developing the server based on open source software, including BioRuby, BioPerl, BioDAS, and GMOD/GBrowse, to make the system consistent with existing open standards. The contents of the KEGG DAS database can be accessed graphically in a web browser using the GBrowse GUI (graphical user interface) and also programmatically via the DAS protocol.
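For programmatic access, a DAS 1.x client typically requests the features of a genomic segment with a URL of the form `<server>/das/<source>/features?segment=<reference>:<start>,<stop>` and receives a DASGFF XML document in return. The sketch below builds such a URL and parses a small hand-written response. The server address, data source name, and sample feature are hypothetical and do not describe the actual KEGG DAS server.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET


def das_features_url(server, source, reference, start, stop):
    """Build a DAS 1.x 'features' request URL for one segment."""
    query = urlencode({"segment": f"{reference}:{start},{stop}"}, safe=":,")
    return f"{server.rstrip('/')}/das/{source}/features?{query}"


def parse_dasgff(xml_text):
    """Extract basic feature records from a DASGFF response."""
    root = ET.fromstring(xml_text)
    features = []
    for segment in root.iter("SEGMENT"):
        for feature in segment.iter("FEATURE"):
            features.append({
                "segment": segment.get("id"),
                "id": feature.get("id"),
                "type": feature.findtext("TYPE"),
                "start": int(feature.findtext("START")),
                "end": int(feature.findtext("END")),
            })
    return features


# Hypothetical server and data source names, for illustration only.
url = das_features_url("http://das.example.org/", "example_genome", "chr1", 1000, 2000)

# Hand-written sample response in the DASGFF layout.
SAMPLE_RESPONSE = """<?xml version="1.0"?>
<DASGFF>
  <GFF version="1.0" href="http://das.example.org/das/example_genome/features">
    <SEGMENT id="chr1" start="1000" stop="2000">
      <FEATURE id="gene0001" label="example gene">
        <TYPE id="gene">gene</TYPE>
        <START>1200</START>
        <END>1800</END>
      </FEATURE>
    </SEGMENT>
  </GFF>
</DASGFF>"""
features = parse_dasgff(SAMPLE_RESPONSE)
```

In a real client, the response text would be fetched from `url` over HTTP instead of being defined inline.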

List of publications (for both Tokyo and Kyoto)