COMPUTER SEQUENCE ANALYSIS OF PHYTOCHROME
J. Sühnel (jsuehnel@imb-jena.de)
Biocomputing, Institute of Molecular Biotechnology, Beutenbergstraße 11, D-07745 Jena

G. Hermann (bgh@pop3.uni-jena.de)
Institute of Biochemistry and Biophysics, Friedrich Schiller University Jena, Philosophenweg 12, D-07743 Jena
Understanding the mechanism through which phytochrome exerts its biological action is still hampered by the fact that its three-dimensional structure is not known. Indirect structural information has been obtained primarily from biochemical and genetic approaches (for more details see Rüdiger and Thümmler, 1991; Furuya and Song, 1994; Quail et al., 1995; Quail, 1997; Song et al., 1997). In the last few years, methods of computer sequence analysis have also been used to gain new insights into the structure of phytochrome. The results obtained so far suggest, e.g., a structural domain model (Romanowski and Song, 1992) and a three-dimensional atomic model of a short sequence region around the chromophore binding site (Parker et al., 1994). Moreover, they indicate that phytochrome might be a light-regulated protein kinase (Schneider-Poetsch et al., 1991; Schneider-Poetsch, 1992). This proposal was further supported by striking sequence homologies between phytochrome and a regulatory protein of chromatic adaptation in cyanobacteria, which itself is structurally related to sensor histidine kinases (Kehoe and Grossman, 1996). Because methods of computer sequence analysis have steadily improved over the past years and the databases have grown considerably, we have recently applied a repertoire of computer methods to the current databases in order to update the information available from sequence analysis (Sühnel et al., 1997). The results are given in full detail on the phytochrome web site of the IMB Jena World-Wide Web server (http://www.imb-jena.de/PHYTO.html).
A few selected findings are presented here.
Twenty-three phytochrome sequences available from protein databases up to January 1997 were used for sequence analysis. The sequences are compiled on the IMB Jena phytochrome web site. Segments of non-random amino acid composition were identified with the program SEG (Wootton, 1994). Similarity searches were performed with BLAST (Altschul et al., 1990) using the BLAST interface of SWISS-PROT, which points to the server at the Ecole Polytechnique Federale de Lausanne (http://expasy.hcuge.ch/cgi-bin/BLASTEPFL.pl). The search for potential motifs was carried out using PROSITE on the ExPASy web server (http://expasy.hcuge.ch/sprot/scnpsite.html). The multiple alignment was calculated by means of the AMPS package and the MULTAL program, which is part of the CAMELEON sequence analysis tool of Oxford Molecular Ltd. (Taylor, 1990; Barton, 1994). For the estimation of the secondary structure, the PHD method (Rost et al., 1994; Rost and Sander, 1994) and the ZPRED approach (Zvelebil et al., 1987) were applied.
Identification of segments with non-random amino acid composition
Segments of non-random amino acid composition, so-called low complexity regions, are suggested to correspond to elongated non-globular protein domains. They are likely to be relatively mobile and especially suitable for conformational adaptability in interactions within molecular complexes (Wootton, 1994).
Table 1. Segments of non-random amino acid composition in phytochromes. The analysis was performed with the program SEG.
| phytochrome | low complexity regions (sequence position) | low complexity regions (amino acid pattern) | sequence length |
| PHY1_TOBAC | 2-21, 946-957 | sssrpsqssttsarskhsar, ilddtdldsiid | 1124 |
| PHY3_AVESA | 2-21, 343-364, 755-766, 1087-1098 | sssrpasssssrnrqssqar, neneeddeaeseqpaqqqkkkk, iihnpnplippi, lsllvsrnllrl | 1129 |
| PHY4_AVESA | 2-21, 343-364, 539-549, 755-766, 1086-1098 | sssrpasssssrnrqssrar, neneeddeaeseqpaqqqqkkk, dssdmddsrrm, iihnpnplippi, glgllvsrkllrl | 1129 |
| PHYA_ARATH | 13-24, 755-766 | srrsrhsariia, iiqnpnplippi | 1122 |
| PHYA_CUCPE | 2-21, 341-352 | stsrpsqsssnsgrsrhstr, vvvnegdeeneg | 1124 |
| PHYA_MAIZE | 2-23, 345-362, 757-768 | sssrpahssssssrtrqssrar, neneeddepepeqppqqq, iihnpnplippi | 1131 |
| PHYA_ORYSA | 2-21, 342-353, 690-700 | sssrptqcsssssrtrqssr, vvvnenedddev, keekevkfevk | 1128 |
| PHYA_PEA | 5-19, 39-58, 479-491, 943-962 | rpsqssnnsgrsrns, sgssfdysssvrvsgsvdgd, tdstglstdslsd, lskilddsdldgiidgyldl | 1124 |
| PHYA_SOLTU | 2-16, 345-359 | sssrpsqssttssrs, dgdeegessdssqsq | 1123 |
| PHYA_SOYBN | 2-21, 35-52, 478-489, 947-964 | stsrpsqsssnsrrsrhsar, feesgssfdysssvrvsg, dstsfstdslfd, qqqlskilddsdldtiid | 1131 |
| PHYA_PETCR | 2-13, 876-887, 951-962 | sssrpansssnp, lqlasqelqqal, ilddtdldsiid | 1129 |
| PHYB_ARATH | 2-18 | vsgvggsgggrgggrgg | 1172 |
| PHYB_ORYSA | 5-34, 38-55, 835-848 | sratptrspssarpaaprhqhhhsqssggs, agggggggggggggaaaa, gevvgkllvgevfg | 1171 |
| PHYB_SOLTU | 9-20 | hshhsssqaqss | 1129 |
| PHYB_TOBAC | 10-26 | shqsgqgqvqaqssgts | 1132 |
| PHYB_SOYBN | 25-34 | shhssnnnnn | 1156 |
| PHY1_PHYPA | 2-20 | stpkktysstssakskahs | 1132 |
| PHY1_SELMA | 9-19, 136-147, 352-358 | ssgssakskhs, aaasalekaaga, ggggggg | 1134 |
| PHYC_ARATH | 2-25 | ssntsrscstrsrqnsrvssqvlv | 1111 |
| PHYD_ARATH | 1158-1161 | mmmm | 1164 |
| PHYE_ARATH | - | - | 1112 |
| PHY_ADICA | 2-19, 36-41 | sstrhsyssggsgkskhg, eesses | 1117 |
| PHY_CERPU | 2-19, 659-675, 716-724 | satkktyssttsakskhs, dlvldesvvvverllsl, fvvgvffvg | 1303 |
| PHYE_PHANI | - | - | 1115 |
All phytochromes other than phyD and phyE of Arabidopsis have a segment of non-random amino acid composition at the N-terminus between residues 2 and 35 (local sequence numbering). Its length varies between 10 and 35 amino acids depending on the phytochrome species. It appears that this segment contains sequences or residues which are of general importance for the phytochrome function.
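The idea behind flagging segments of non-random composition can be illustrated with a sliding-window Shannon entropy scan. The following Python sketch is a simplified stand-in and not the SEG algorithm itself; the window length of 12, the 2.2-bit threshold, the function names, and the flanking residues in the usage lines are all assumptions made for illustration.

```python
import math
from collections import Counter


def window_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_regions(seq: str, window: int = 12, threshold: float = 2.2):
    """Return 1-based inclusive (start, end) ranges covered by low-entropy windows.

    Simplified illustration only: every window whose entropy is at or below
    the threshold is flagged, and overlapping flagged windows are merged.
    """
    flagged = [False] * len(seq)
    for i in range(len(seq) - window + 1):
        if window_entropy(seq[i:i + window]) <= threshold:
            for j in range(i, i + window):
                flagged[j] = True

    regions, start = [], None
    for pos, flag in enumerate(flagged):
        if flag and start is None:
            start = pos
        elif not flag and start is not None:
            regions.append((start + 1, pos))
            start = None
    if start is not None:
        regions.append((start + 1, len(seq)))
    return regions


# Usage: the PHYB_ORYSA segment 38-55 from Table 1, embedded in made-up flanks.
flank = "CDEFHIKLMNPQRSTVWY"             # hypothetical flanking residues
segment = "agggggggggggggaaaa".upper()   # pattern taken from Table 1
print(low_complexity_regions(flank + segment + flank))
```

Because every window that overlaps the biased stretch sufficiently is flagged, the reported range is usually a few residues wider than the biased stretch itself. SEG avoids this by trimming each raw segment to its most compositionally biased core, which the sketch does not attempt.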
Sequence similarities between phytochromes and other proteins
A sequence similarity search of the current databases reveals a strong similarity of all phytochromes to a hypothetical 84.2 kDa protein from the cyanobacterium Synechocystis sp. This protein is the non-phytochrome sequence with the highest score against every phytochrome. The sequence identity varies between 25% and 60%, depending on the aligned region. Further deduced proteins within the genome of Synechocystis sp. show differing degrees of similarity to all phytochromes. Most remarkable are local similarities between the C-terminal domain of the phytochromes and the sensory transduction histidine kinases. In the list of matched sequences they are positioned around the bacterial two-component protein kinases whose homology to the phytochromes was already noted in earlier database searches (Schneider-Poetsch et al., 1991; Schneider-Poetsch, 1992). These data further strengthen the possibility of an evolutionary relationship between phytochromes and sensory histidine kinases, a concept that also gains indirect support from the finding that the histidine kinase mechanism of the bacterial sensor proteins evidently plays a central role even in signal transmission in eukaryotes (Chang, 1996). A complete overview of the similarity search results for all 23 sequences can be found on the IMB Jena phytochrome web site.
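A percent identity figure such as the 25% to 60% range quoted above depends on how the aligned region and the gaps are treated. The Python sketch below uses one common convention, counting identical residues over the columns in which neither sequence has a gap; this convention and the function name are assumptions and not necessarily what the search program reported.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


# Usage: the G_BETA_REPEATS stretches of PHY1_TOBAC and PHYA_ARATH (Table 2).
print(round(percent_identity("IFAVDVDGQLNGWNT", "ILAVDSDGLVNGWNT"), 1))
```

Restricting the comparison to a short, well-conserved stretch gives a much higher value than a full-length comparison would, which is why identity is quoted as a range that depends on the aligned region.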
Table 2. PROSITE signatures identified by a motif search in phytochromes.
| phytochrome | PROSITE signature | sequence range | sequence pattern |
| PHY1_TOBAC | G_BETA_REPEATS | 632-646 | IFAVDVDGQLNGWNT |
| PHY3_AVESA | ZINC_PROTEASE | 873-882 | VASHELQHAL |
| PHY4_AVESA | ZINC_PROTEASE | 873-882 | VASHELQHAL |
| PHYA_ARATH | G_BETA_REPEATS | 633-647 | ILAVDSDGLVNGWNT |
| PHYA_CUCPE | G_BETA_REPEATS | 631-645 | ILAVDLDGLINGWNT |
| PHYA_MAIZE | - | - | - |
| PHYA_ORYSA | ZINC_PROTEASE | 876-885 | VPSHELQHAL |
| PHYA_PEA | G_BETA_REPEATS | 632-646 | ILAVDVDGTVNGWNI |
| PHYA_SOLTU | G_BETA_REPEATS | 632-646 | IFAVDVDGQVNGWNT |
| PHYA_SOYBN | G_BETA_REPEATS | 639-653 | ILAVDVDGLVNGWNI |
| PHYA_PETCR | G_BETA_REPEATS | 637-651 | IFAVDADEIVNGWNT |
| PHYB_ARATH | DNA_LIGASE_A1 | 571-579 | EDKDDGQRM |
| PHYB_ORYSA | DNA_LIGASE_A1, G_BETA_REPEATS | 580-588, 676-690 | EDKDDGQRM, IFAVDTDGCINGWNA |
| PHYB_SOLTU | DNA_LIGASE_A1 | 545-553 | EDKDDGQRM |
| PHYB_TOBAC | DNA_LIGASE_A1 | 547-555 | EDKDDGQRM |
| PHYB_SOYBN | G_BETA_REPEATS, AA_TRNA_LIGASE_I | 661-675, 812-822 | IFAVDVDGHVNGWNA, PSNENVTVGGV |
| PHY1_PHYPA | G_BETA_REPEATS | 625-639 | ILAVDSNGMINGWNA |
| PHY1_SELMA | - | - | - |
| PHYC_ARATH | - | - | - |
| PHYD_ARATH | DNA_LIGASE_A1, G_BETA_REPEATS | 575-583, 671-685 | EDKDDGQRM, IFAVDIDGCINGWNA |
| PHYE_ARATH | ATP_GTP_A | 634-641 | ASEAMGKS |
| PHY_CERPU | AA_TRNA_LIGASE_II_2, PROTEIN_KINASE_ATP, PROTEIN_KINASE_ST | 782-791, 1010-1031, 1123-1135 | TGSVERLDLY, LGSGSSATVEKAVWLGTPVAKK, IIHRDLKSMNILV |
| PHYE_PHANI | DNA_LIGASE_A1 | 521-529 | EDKDDGGRM |
The signature of the WD-repeats of the beta-transducin family is one of the most frequently occurring motifs. It is found within the internal repeat of the hinge region in 11 of the 23 phytochromes. The identification of this motif is remarkable, since WD-repeat proteins are known to have regulatory functions and to play a role in protein-protein recognition (Neer et al., 1994).
The signature of the WD-repeats represents a 15 amino acid stretch:
[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x-x-[DN]-x-x-[LIVMWSTC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]
where x stands for any amino acid and one of the amino acids given in square brackets is required for a particular sequence position.
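A signature written in this notation maps directly onto a regular expression, which makes it easy to check individual sequence stretches by hand. The Python sketch below handles only the elements used in the pattern above (single residues, bracketed alternatives, and x); the function names are hypothetical and the full PROSITE syntax (repeat counts, exclusions, anchors) is deliberately not covered.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE pattern into a regular expression.

    Supports only '-'-separated elements that are a single residue,
    a bracketed set of allowed residues, or 'x' for any amino acid.
    """
    parts = []
    for element in pattern.split("-"):
        element = element.strip()
        if element == "x":
            parts.append(".")      # any amino acid
        else:
            parts.append(element)  # 'W' or '[DN]' are already valid regex
    return "".join(parts)


# WD-repeat signature as given in the text (15 positions).
WD_SIGNATURE = (
    "[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x-x-[DN]-x-x-"
    "[LIVMWSTC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
)
WD_REGEX = re.compile(prosite_to_regex(WD_SIGNATURE))


def find_signature(seq: str):
    """Return (start, end, matched_text) with 1-based inclusive positions."""
    return [(m.start() + 1, m.end(), m.group()) for m in WD_REGEX.finditer(seq)]


# Usage: the PHY1_TOBAC stretch 632-646 from Table 2 matches the signature.
print(find_signature("IFAVDVDGQLNGWNT"))
# An artificial variant with E instead of D at position 7 does not match.
print(find_signature("IFAVDVEGQLNGWNT"))
```

The second call uses a made-up variant to show how a single substitution at a constrained position, here position 7, is enough to lose the match even when all other positions fit.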
As can be concluded from the multiple alignment (see below), the amino acids in positions 7 and 15 of the motif are apparently responsible for the absence of this pattern in a few phytochromes, even though these sequences match the remaining part of the motif. It remains open whether this minor deviation from the WD-repeat signature is a problem of motif definition or whether it is of real biological significance.
Another result worth mentioning is the identification of the sequence motif of the ATP-dependent DNA ligase in phytochromes B, since recent experimental evidence indicates the nuclear localization of these phytochromes (Sakamoto and Nagatani, 1996).
The documentation of the motifs identified can be found on the IMB Jena phytochrome web site.
The 23 phytochrome sequences retrieved from protein databases were subjected to a multiple sequence alignment. Multiple sequence alignments were generated for three different sequence sets.
The results obtained are too extensive to be presented here. The data are available via the IMB Jena phytochrome web site. They can be used as a working tool for future experimental studies on phytochromes. We have used the multiple alignment to deduce the domain structure of phytochromes and to analyze structural differences between phytochromes A and B. Under the assumption that sequence stretches of 10 or more amino acids without conserved patterns indicate domain boundaries, the following sequence regions can be assigned to structural domains:
The sequence regions were derived from the multiple alignment of either all 23 phytochrome sequences or (in parentheses) from the alignment of phytochromes A and B only. This domain structure agrees fairly well with the functionally important domains determined indirectly from proteolytic mapping studies (Lagarias et al., 1985; Grimm et al., 1988). It is also in close correspondence with the structural pattern proposed by Romanowski and Song (1992).
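The boundary criterion stated above, stretches of 10 or more alignment positions without conservation, can be sketched in a few lines of Python. The sketch assumes that "without conserved patterns" means columns that are not strictly identical across all sequences, which is a stricter and simpler reading than a similarity-based conservation measure; the function names and the toy alignment are illustrative only.

```python
def conserved_columns(alignment):
    """True for each column in which all sequences share the same non-gap residue."""
    return [len(set(col)) == 1 and col[0] != "-" for col in zip(*alignment)]


def variable_stretches(alignment, min_len=10):
    """Return 1-based inclusive (start, end) runs of at least min_len
    consecutive non-conserved columns, taken here as candidate domain boundaries."""
    flags = conserved_columns(alignment)
    runs, start = [], None
    # A trailing True acts as a sentinel that closes a run ending at the last column.
    for i, conserved in enumerate(flags + [True]):
        if not conserved and start is None:
            start = i
        elif conserved and start is not None:
            if i - start >= min_len:
                runs.append((start + 1, i))
            start = None
    return runs


# Usage with a toy alignment: columns 6-15 differ between the two rows.
toy = [
    "AAAAA" + "CDEFGHIKLM" + "WWWWW",
    "AAAAA" + "DEFGHIKLMC" + "WWWWW",
]
print(variable_stretches(toy))
```

The segments between consecutive reported stretches would then be the candidate structural domains. Column numbers refer to the alignment and still have to be mapped back to the numbering of an individual sequence.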
The multiple alignment of the total sequence set shows that phytochromes B have an additional stretch of 15-42 amino acids directly at the N-terminus as compared to phytochromes A. In the remaining part of the sequence phytochromes A and B align in a very similar way. There are, however, positions with totally conserved but different amino acids (Fig. 2). The number of these sequence positions is higher in the N-terminal half as compared to the C-terminal half with a significant peak in the sequence region 150-200. From this result it is tempting to speculate that the latter sequence segment and/or the additional sequence stretch directly at the N-terminus might play a role as determinants of the photosensory specificity of phytochromes A and B.
Fig. 2. Distribution of sequence positions with different totally conserved amino acids in phytochromes A and B (PHY3_AVESA numbering).
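Positions of this kind, totally conserved within phytochromes A and within phytochromes B but with a different residue in each group, can be picked out of a joint alignment as sketched below in Python. The function name and the toy sequences are hypothetical, and both groups are assumed to come from the same multiple alignment so that the columns correspond.

```python
def group_specific_positions(group_a, group_b):
    """Return (column, residue_a, residue_b) for alignment columns that are
    fully conserved within each group but differ between the groups.

    Columns are 1-based alignment positions; gap-containing columns are skipped.
    """
    positions = []
    columns = zip(zip(*group_a), zip(*group_b))
    for i, (col_a, col_b) in enumerate(columns, start=1):
        res_a, res_b = set(col_a), set(col_b)
        if (len(res_a) == 1 and len(res_b) == 1
                and res_a != res_b and "-" not in res_a | res_b):
            positions.append((i, col_a[0], col_b[0]))
    return positions


# Usage with made-up aligned sequences for the two groups.
phy_a = ["MKTAL", "MRTAL"]
phy_b = ["MKSAV", "MQSAV"]
print(group_specific_positions(phy_a, phy_b))
```

Counting the reported columns in fixed-width bins along the sequence would give a distribution of the kind shown in Fig. 2.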

Secondary structure prediction
The secondary structure was predicted for phyA from Avena using two alignment-based prediction methods, PHD (Rost et al., 1994; Rost and Sander, 1994) and ZPRED (Zvelebil et al., 1987). The predictive ability of these methods for tetrapyrrole-containing proteins was tested by recalculating the secondary structure of C-phycocyanin from Fremyella and comparing the result with the X-ray structure (Duerring et al., 1991; Protein Data Bank code: 1cpc). The estimated secondary structure content (65.9% alpha-helix, 4% beta-sheet, 30.1% loop structure) was in excellent agreement with the X-ray crystallographic structure, indicating the reliability of the prediction methods applied.
Table 3. Secondary structure content of phyA from Avena estimated by the PHD and ZPRED methods.
| method | alpha-helix | beta-sheet | loop |
| PHD | 44.3 % | 13.9 % | 41.8 % |
| ZPRED | 48.8 % | 17.7 % | 33.9 % |
The analysis of phyA from Avena by the PHD and ZPRED methods resulted in very similar structural assignments. The two methods yield a secondary structure content of about 44-48% alpha-helix, 14-18% beta-sheet and 34-42% loop structure. In addition, they agree fairly well with respect to the locations of the alpha-helix and beta-sheet segments. The results clearly show that the secondary structure of phyA from Avena consists primarily of alpha-helical segments together with a small but significant amount of beta-sheet structure. Even though the dominant structural element is alpha-helix, the secondary structure is not of the all-helix type found in phycocyanin. This finding is confirmed by results obtained from the analysis of the secondary structure of phyA by FTIR spectroscopy (Sühnel et al., 1997). More detailed information on the secondary structure prediction can be obtained from the IMB Jena phytochrome web site.
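Content figures like those in Table 3 are per-residue fractions of a predicted state string. The Python sketch below shows the arithmetic, assuming the prediction is available as one character per residue with H for alpha-helix, E for beta-sheet and L for loop; this encoding and the function name are assumptions, and the actual PHD and ZPRED output formats differ.

```python
from collections import Counter


def secondary_structure_content(states: str) -> dict:
    """Percentage of residues in each state of a per-residue prediction string.

    Assumed encoding: 'H' = alpha-helix, 'E' = beta-sheet, 'L' = loop.
    """
    if not states:
        raise ValueError("empty prediction string")
    counts = Counter(states)
    n = len(states)
    return {state: 100.0 * counts.get(state, 0) / n for state in "HEL"}


# Usage with a made-up 10-residue prediction.
print(secondary_structure_content("HHHHEELLLL"))
```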