
Jürgen Sühnel (jsuehnel@imb-jena.de)
Institut für
Molekulare Biotechnologie, Postfach 100813,
D-07708 Jena / Germany
This poster describes the Image Library of Biological Macromolecules at the Institute of Molecular Biotechnology (Jena/Germany) including its Virtual Reality Division. The recommended way of reading this poster is to launch two web browsers, to read the poster with one browser and to check out the features of the Library with a second one.
Introduction
Structural information on biological macromolecules is an essential requirement for our understanding of biological function and for a deliberate variation of this function by rational or evolutionary approaches. Progress in recombinant DNA technology and RNA synthesis, X-ray and NMR instrumentation and computer and software technology has led to an increasing rate of accumulation of new structures.
Structural data of biological macromolecules in terms of coordinate files are available from the Protein Data Bank (PDB) at Brookhaven National Laboratory (Bernstein et al., 1977) and from the Nucleic Acid Database (NDB) at Rutgers University (Berman et al., 1992). Visualization of these structures plays a central role in identifying structural motifs and even in understanding biological function. Often the structure images lead directly to new biological insights or to new hypotheses. Visual information is obviously very appropriate for 'understanding' the complex structures of biological macromolecules (Hall, 1995).
The usual way to visualize the structures of biological macromolecules is to retrieve the coordinate files from PDB or NDB over the Internet and then to use one of the molecular graphics packages to display and possibly manipulate the structure. This approach is the method of choice if one is interested in a particular structure in detail. On the other hand, one would often prefer to have an image of a structure directly available, without having to spend time generating the image and even without access to molecular graphics software. This is especially important for the large and heterogeneous community outside structural biology.
Using the new hypermedia protocols of the World-Wide Web it is now very easy to transfer images or videos over the Internet. Therefore, we have started to set up an Image Library of Biological Macromolecules, which combines the visualization of biological macromolecules with the new network tools (Sühnel, 1996). The images are in the public domain and can freely be retrieved from the IMB Jena WWW server (http://www.imb-jena.de/IMAGE.html).
The Image Library of Biological Macromolecules
The Image Library provides images of biological macromolecules and, in addition, relevant elementary information such as images of amino acids and of standard and modified nucleotides, images illustrating the definition of strand direction and of the torsion angles in nucleic acids, and DNA model conformations. It is subdivided into DNA, RNA, protein and carbohydrate divisions; nucleic acid-protein complexes are found in the nucleic acid divisions and DNA-RNA hybrids in the RNA division. Each entry consists of a text file, a variety of molecular structure images and color-coded distance plots. The text file contains general information on the structure, such as the sequence and the citation of the structure report, and a listing of all image files available.
The images of molecular structures are intended to provide as much information as possible. Therefore, mixed rendering, coloring and labeling techniques are used extensively. All molecular images are available in a mono and in a stereo representation. In addition to the molecular images, color-coded distance plots are included. Distance plots relate the distances between representative atoms of amino acids and/or nucleotides in the 3D structure to the sequence (Godzik et al., 1993). They thus yield very useful structural information in addition to the 3D images.
The file names are created according to the following rules. The first part is the PDB and/or NDB code or, for model structures, some other name. The second part indicates the image type: the name of the graphics software used stands for the usual 3D images of molecular structures, distc and dist3d indicate contour and 3D distance plots, and sec indicates images of secondary structures. An additional s indicates a stereo image. For example, the file 1ecl_midas_3_s.gif is a stereo representation of the structure of the 67 K N-terminal fragment of E. coli DNA topoisomerase with the PDB code 1ecl (Lima et al., 1994), generated using the graphics software MIDASPLUS (Ferrin et al., 1988).
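The distance plots mentioned above are based on a matrix of pairwise distances between one representative atom per residue. The following Python sketch shows how such a matrix might be computed; it is not the code used for the Image Library, and the function name and the coordinates are purely illustrative assumptions.

```python
import math


def distance_matrix(coords):
    """Return the symmetric matrix of pairwise Euclidean distances.

    coords holds one (x, y, z) tuple per residue, in sequence order,
    e.g. the position of one representative atom of each residue.
    """
    n = len(coords)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(coords[i], coords[j])  # requires Python 3.8+
            matrix[i][j] = d
            matrix[j][i] = d
    return matrix


# Hypothetical coordinates in Angstrom, not taken from any real structure.
coords = [(0.0, 0.0, 0.0), (3.0, 4.0, 0.0), (0.0, 0.0, 12.0)]
dmat = distance_matrix(coords)
for row in dmat:
    print(" ".join(f"{d:6.2f}" for d in row))
```

Row i and column j of the matrix correspond to sequence positions i and j, so color coding the matrix entries gives a plot that relates 3D distances to the sequence.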
The following molecular graphics software packages were used:
The distance matrices were calculated using our own code and the color coded distance plots (3D or contour) were generated using
Currently (September 1996), the Image Library has about 3500 image files of about 300 biomolecular structures. The entries include all RNA structures stored in the Protein Data Bank (PDB) and in the Nucleic Acid Database (NDB), about 170 proteins, approximately 70 DNA structures and a few carbohydrates. Due to the file name rules used it is very simple to search for special file types. On the other hand, the search option can also be used to find author names, for example, or even all structures with a particular modified nucleotide.
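Because the file names follow fixed rules, searching for special file types can be reduced to simple pattern matching. The Python sketch below is one possible reading of these rules. It assumes that the parts of a name are separated by underscores, that the optional number (the 3 in 1ecl_midas_3_s.gif) is a running image index, and that codes and type names contain no underscores themselves. None of these assumptions is confirmed by the rules as stated, and all function and class names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ImageName(NamedTuple):
    code: str              # PDB/NDB code or model name
    image_type: str        # graphics software name, distc, dist3d or sec
    index: Optional[int]   # assumed running image number, if present
    stereo: bool           # True if the name carries the additional "s"
    extension: str


# Assumed layout: code_type[_number][_s].extension
_PATTERN = re.compile(r"([^_.]+)_([^_.]+)(?:_(\d+))?(_s)?\.(\w+)")

PLOT_TYPES = {
    "distc": "contour distance plot",
    "dist3d": "3D distance plot",
    "sec": "secondary structure image",
}


def parse_image_name(filename):
    """Split a file name into its parts; return None if it does not fit."""
    match = _PATTERN.fullmatch(filename)
    if match is None:
        return None
    code, image_type, index, stereo, extension = match.groups()
    return ImageName(code, image_type,
                     int(index) if index else None,
                     stereo is not None, extension)


def describe(info):
    """Map the type part of a name to a short description."""
    return PLOT_TYPES.get(info.image_type,
                          f"3D structure image ({info.image_type})")


# The first name is the example from the text; the second is hypothetical.
names = ["1ecl_midas_3_s.gif", "1ecl_distc.gif"]
parsed = [parse_image_name(name) for name in names]
stereo_files = [n for n, p in zip(names, parsed) if p and p.stereo]
print(stereo_files)
```

A filter such as stereo_files above illustrates how all stereo images, or all contour distance plots, could be selected from a file listing.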
Many structures of biological macromolecules are, for various reasons, not available via the Protein Data Bank or the Nucleic Acid Database. Authors are therefore encouraged to deposit their structures in these databases. On the other hand, the Image Library could represent a useful addition to them. We are willing to include, in a relatively informal way, coordinates of published structures which are not intended to be made available via PDB or NDB. This also applies to structures obtained by modeling procedures. Further, we would like to encourage authors to make their own structure images available to the scientific community via the Image Library, whether or not the coordinates have been deposited at PDB/NDB. This could be of interest to authors of structure reports because most journals restrict the number of color plates.
Recently, electron microscopists have made important progress in reconstructing 3D images of such complex biological objects as the ribosome at a resolution of 25 Å (Moore, 1995). We expect in the near future a fruitful interplay between structure images of this type and images of building blocks whose structure is already known at atomic resolution. Therefore, it would be useful if images of biological objects not known at atomic resolution could be included in the Image Library, too.
Anybody interested in contributing to the Image Library of Biological Macromolecules should contact the author (e-mail: jsuehnel@imb-jena.de).
When we started this project in December 1993, we were not aware of similar efforts. We now know that the Protein Data Bank began providing images at almost the same time and that the Swiss-3D-Image collection at the University of Geneva had published biopolymer images on the Internet earlier. Today it is the rule rather than the exception that research reports on the World-Wide Web include images. Nevertheless, there are only four large image archives of biological macromolecules: the Protein Data Bank at Brookhaven National Laboratory, Molecules R US at the National Institutes of Health, Swiss-3D-Image at the University of Geneva (Peitsch et al., 1995) and the Image Library of Biological Macromolecules at IMB Jena (Sühnel, 1996). The Nucleic Acid Database provides only a few images so far and is therefore not classified as a large image archive. There is one basic difference between PDB and Molecules R US on the one hand and Swiss-3D-Image and our Image Library on the other. The first two archives provide automatically generated images of all structures available. The disadvantage of this approach is that the images generated have a relatively low information content. Swiss-3D-Image and the Image Library, in contrast, provide very instructive images, but only for a relatively small number of known structures. There is almost no overlap between Swiss-3D-Image and the Image Library. Unlike the Image Library, Swiss-3D-Image contains almost no nucleic acid structures, and for proteins the two archives are complementary. Together, the two archives thus already provide a substantial number of high-quality images of biological macromolecules via the Internet.
Virtual Reality Modeling
The Virtual Reality Modeling Language (VRML) is a standard for describing interactive three-dimensional scenes delivered across the Internet. It enables one to rotate, translate or zoom three-dimensional (3D) objects such as biological macromolecules. Of course, this can be done much better using molecular graphics packages. However, VRML viewers will be an integral part of the next generation of web browsers. Therefore, this approach is very appropriate for making visual structure information on biological macromolecules available to a broader community within and even outside of science. Moreover, dynamic behaviour will soon be included in the VRML specification, which makes this format appropriate for collaborative work.
In principle, VRML is a 3D image format supplemented by network tools. VRML is a subset of the Open Inventor ASCII file format and describes 3D objects or scenes in an object-oriented manner. The basic elements are various node types: shape nodes (points, lines, spheres, cylinders, ...), property nodes (color, texture maps, geometry transformations, ...), group nodes (for implementing a hierarchical structure), camera nodes, light nodes, WWWInline nodes (for loading other VRML files into the current scene) and WWWAnchor nodes (hyperlinks). The VRML 1.0 specification was finalized in May 1995, and the VRML 2.0 specification is currently under development.
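To illustrate how the node types listed above combine, the following Python sketch writes a small VRML 1.0 scene in which every atom is a sphere. Each atom gets its own Separator group node containing a Material property node, a Translation property node and a Sphere shape node. This is only a minimal sketch: the colors, radii and coordinates are illustrative assumptions rather than values used in the Image Library, and bonds, cameras and lights are omitted.

```python
# Assumed display style per element: (red, green, blue), sphere radius.
ELEMENT_STYLE = {
    "C": ((0.5, 0.5, 0.5), 0.40),
    "N": ((0.2, 0.2, 1.0), 0.40),
    "O": ((1.0, 0.1, 0.1), 0.40),
    "H": ((1.0, 1.0, 1.0), 0.25),
}


def atoms_to_vrml(atoms):
    """Return a VRML 1.0 scene with one sphere per (element, x, y, z) atom."""
    lines = ["#VRML V1.0 ascii", "Separator {"]
    for element, x, y, z in atoms:
        (r, g, b), radius = ELEMENT_STYLE[element]
        lines += [
            "  Separator {",  # group node: keeps color and shift local
            f"    Material {{ diffuseColor {r:.2f} {g:.2f} {b:.2f} }}",
            f"    Translation {{ translation {x:.3f} {y:.3f} {z:.3f} }}",
            f"    Sphere {{ radius {radius:.2f} }}",
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


# Hypothetical three-atom fragment; the coordinates are made up.
atoms = [("N", 0.0, 0.0, 0.0), ("C", 1.46, 0.0, 0.0), ("O", 2.1, 1.2, 0.0)]
scene = atoms_to_vrml(atoms)
print(scene)
```

Wrapping each atom in its own Separator means that the translation and material of one atom do not affect the following ones, which mirrors the hierarchical structure that group nodes are meant to provide.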
In 1995 the first VRML viewers, such as WebSpace and i3D, became available, extending the previously two-dimensional World-Wide Web into the third dimension. One of the first viewers to support a draft version of VRML 2.0 is CosmoPlayer by Silicon Graphics. A complete list of VRML browsers and of other VRML-related material can be found in the VRML repository at the San Diego Supercomputer Center.
It is immediately obvious that VRML can contribute greatly to a better dissemination of visual structural information on biopolymers. Since May 1995 we have therefore extended the Image Library with a VRML division, which currently consists of more than 600 files in VRML format. This was one of the first applications of Virtual Reality Modeling in biology and, to the best of our knowledge, the very first application not devoted to demonstration purposes alone.
However, other groups have published earlier VRML demonstration examples on the web:
Further examples of using VRML for structural biology are the
We have generated the VRML files of biopolymers and of the corresponding building blocks (amino acids, nucleotides) using InsightII from Molecular Simulations, the MIDASPLUS molecular graphics and display system and the SGI Iris Explorer EyeChem modules written by Omer Casher, Imperial College, London. InsightII has a direct VRML interface, whereas in the other two cases files in the Inventor format have to be created first, which can then be converted to the VRML format. To get an impression of what the VRML format looks like, see the ball-and-stick representation of alanine here and the underlying file here.
We would like to encourage developers of other molecular graphics packages to include VRML interfaces into their software tools.
More recent examples are the WebLab viewer, the Tripos Molecular Inventor Netscape Navigator Plugin and the object-oriented three-dimensional visualization development kit Molecular Inventor, all of which can generate either Inventor or VRML files.
VRML browsers still suffer from various problems. For example, complex structures in high-quality representations may yield very large datasets. Fortunately, the compression rate is rather high, in many cases 90%, which reduces the bandwidth required. On the other hand, decompression takes time, so less powerful computers may currently be unable to handle larger VRML files. You will come across such examples if you check out the Virtual Reality Division of the Image Library. One should realize, however, that the performance of viewers such as WebSpace has already increased dramatically since May 1995. To the best of our knowledge, i3D is the first VRML viewer that supports CrystalEyes stereo representations. One interesting application we expect in the near future is that online journals will contain figures in VRML format, which will enable the reader to interact with the three-dimensional image objects. This is important not only for structure representations but for any three-dimensional figure.
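The effect of compression on a text-based scene description can be estimated with a few lines of Python. The sketch below assumes gzip compression, which the text above does not name, and uses a synthetic scene of 500 sphere nodes rather than a real Image Library file, so the figure it prints is only indicative.

```python
import gzip


def compression_saving(text):
    """Return (raw size, compressed size, fraction saved) for gzip."""
    raw = text.encode("ascii")
    packed = gzip.compress(raw)
    return len(raw), len(packed), 1.0 - len(packed) / len(raw)


# Synthetic VRML 1.0 scene: 500 spheres at made-up positions.
nodes = ["#VRML V1.0 ascii", "Separator {"]
for i in range(500):
    x, y, z = i * 1.5, (i % 7) * 0.9, (i % 11) * 0.4
    nodes.append(
        f"  Separator {{ Translation {{ translation {x:.3f} {y:.3f} {z:.3f} }} "
        f"Sphere {{ radius 0.40 }} }}"
    )
nodes.append("}")
scene = "\n".join(nodes) + "\n"

raw_size, packed_size, saving = compression_saving(scene)
print(f"{raw_size} bytes -> {packed_size} bytes ({saving:.0%} saved)")

# Decompression restores the original scene exactly.
restored = gzip.decompress(gzip.compress(scene.encode("ascii"))).decode("ascii")
```

Scene files compress well because the same node names and keywords are repeated for every atom; the price is the decompression step on the reader's machine.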
The VRML format is also suitable for further annotation with any type of notes using, for example, IRIS Annotator or Showcase, for Media Mail, or for real-time collaboration with InPerson. The claim that VRML viewers will become a standard part of future web browsers is supported by various new products, including Live3D from Netscape, the Tripos Molecular Inventor Netscape Navigator Plugin and Microsoft Internet Explorer 3.0. Further, Molecular Inventor is a new object-oriented three-dimensional visualization development kit.
Various improvements and extensions are already under development or can be expected in the near future. One of them is the combination of VRML representations with Java applets. A Java applet is a Java program that can be included in an HTML page. This, of course, substantially extends the possibilities for displaying visual information. One example from the structural biology field is the Java Lattice.
In summary, we would like to point out that Internet-based image archives of biopolymer structures including the new VRML format have a lot to offer for a better dissemination of visual information on biological macromolecules within and outside the scientific community.
Acknowledgements
I am grateful to F. Haubensak for setting up the IMB Jena World-Wide Web server and to K. Mehliß for writing the program DIST, which generates distance matrices and difference distance matrices.
References
Evans, S.V. (1993) SETOR: Hardware lighted three-dimensional solid model representations of macromolecules. J. Mol. Graphics 11, 134-138.
Ferrin, T.E., Huang, C.C., Jarvis, L.E. and Langridge, R. (1988) The MIDAS display system. J. Mol. Graphics 6, 13-27, 36-37.
Hall, S.S. (1995) Protein images update natural history. Science 267, 620-624.
Moore, P.B. (1995) Ribosomes seen through a glass less darkly. Structure 3, 851-852.
Sühnel, J. (1996) Image Library of Biological Macromolecules. Comput. Appl. Biosci. 12, 227-229.