Post-Genome Informatics

Preface

The Human Genome Project was initiated in the late 1980s as the result of technological developments in molecular biology and with the expectation of biomedical benefits. The project's goal was to determine the entire sequence of 3 billion nucleotides in the human genome and to identify and understand a whole repertoire of human genes. At the same time, genome projects were conceived and undertaken for a number of organisms from bacteria to higher eukaryotes. Both the public and private sectors are spending unprecedented amounts of resources in order to quickly decipher the genomes and to claim discovery of the information. However, the determination of the complete genome sequence is not the end of the story. It is actually the beginning of 'post-genome informatics', especially in view of the fact that the biological function cannot be inferred from the sequence information alone for roughly one half of the genes in every genome that has been sequenced.

Conceptually, whole genome sequencing represents an ultimate form of reductionism in molecular biology. It is hoped that complex processes of life can be explained by simple principles of genes. In experimental reality, DNA sequencing requires drastic reductions from higher to lower dimension: the cell must be destroyed and the DNA molecules extracted. We do not question how much information is lost in these procedures, but simply accept the common wisdom that the genome, or the entire set of DNA molecules, contains all the necessary information to make up the cell. Post-genome informatics is then considered as an attempt at synthesis from lower to higher dimension, whereby a functioning biological system of the cell is reconstructed from the entire complement of genes.

The genome projects have transformed biology in many ways, but the most impressive outcome is the emergence of computational biology, also known as bioinformatics. It is no longer possible to make advances in biology without integration of informatics technologies and experimental technologies. Here we would like to distinguish between genome informatics and post-genome informatics. Genome informatics was born in order to cope with the vast amount of data generated by the genome projects. Its primary role is therefore to support experimental projects. In contrast, post-genome informatics, as we define it here, represents a synthesis of biological knowledge from genomic information toward understanding basic principles of life, as well as for practical purposes in biomedical applications. Post-genome informatics has to be coupled with systematic experiments in functional genomics using DNA chip and other technologies. However, the coupling is the other way around: informatics plays the more dominant role of making predictions and designing experiments.

This book is an introduction to bioinformatics, an interdisciplinary science encompassing biology, computer science, and physics. In fact, the major motivation for writing this book is to provide conceptual links between different disciplines, which often share common ideas and principles. The content is in part a translation of my book in Japanese, Invitation to Genome Informatics (Kyoritsu Shuppan, Tokyo, 1996), which originates from my lecture notes on theoretical molecular biology for undergraduate students in the Faculty of Science, Kyoto University. The first chapter is a concise introduction to molecular biology and the Human Genome Project. The second and third chapters provide an overall picture of both database and computational issues in bioinformatics. They are written for basic understanding of underlying concepts rather than for acquiring the superficial skills of using specific databases or computational tools. Because most algorithmic details are deliberately left out in order to cover a wide range of computational methods, it is recommended that the reader consult the references in the Appendix when necessary.

The last chapter, which is original in the English edition, is the essence of post-genome informatics. It introduces the emerging field of network analysis for uncovering systemic functional information of biological organisms from genomic information. KEGG (Kyoto Encyclopedia of Genes and Genomes) at www.genome.ad.jp/kegg/ is a practical implementation of databases and computational tools for network analysis. It is our attempt to actually perform synthesis of biological systems for all the genomes that have been sequenced. Since the field of network analysis is likely to evolve rapidly in the near future, KEGG should be considered as an updated version of the last chapter.

The very concept of post-genome informatics grew out of my involvement in the Japanese Human Genome Program. I have been supported by the Ministry of Education, Science, Sports and Culture since 1991 as principal investigator of the Genome Informatics Project. This book is the result of active collaborations and stimulating discussions with the many friends and colleagues in this project. I am grateful to Chigusa Ogawa, Hiroko Ishida, Saeko Adachi, and Toshi Nakatani for their work on the drawings and to Stephanie Marton for checking the text of the English edition. The support of the Daido Life Foundation is also appreciated.

With a diverse range of Internet resources publicly available, it is not difficult for anyone interested to start the study of post-genome informatics. I hope this book will help students and researchers in different disciplines to understand the philosophy of synthesis in post-genome informatics, which is actually the antithesis of the extreme specialization found in current scientific disciplines. The study of post-genome informatics may eventually lead to a grand synthesis: a grand unification of the laws in physics and biology.

Minoru Kanehisa

Kyoto, Japan
May 1999
