COMPUTER SEQUENCE ANALYSIS OF PHYTOCHROME


J. Sühnel (jsuehnel@imb-jena.de)

Biocomputing,
Institute of Molecular Biotechnology,
Beutenbergstraße 11, D-07745 Jena

G. Hermann (bgh@pop3.uni-jena.de)

Institute of Biochemistry and Biophysics,
Friedrich Schiller University Jena,
Philosophenweg 12, D-07743 Jena



Introduction

The understanding of the mechanism by which phytochrome exerts its biological action is still hampered by the fact that its three-dimensional structure is not known. Indirect structural information has been obtained primarily from biochemical and genetic approaches (for more details see Rüdiger and Thümmler, 1991; Furuya and Song, 1994; Quail et al., 1995; Quail, 1997; Song et al., 1997). In the last few years, methods of computer sequence analysis have also been used to gain new insights into the structure of phytochrome. The results obtained so far include a structural domain model (Romanowski and Song, 1992) and a three-dimensional atomic model of a short sequence region around the chromophore binding site (Parker et al., 1994). Moreover, they indicate that phytochrome might be a light-regulated protein kinase (Schneider-Poetsch et al., 1991; Schneider-Poetsch, 1992). This proposal was further supported by striking sequence homologies between phytochrome and a regulatory protein involved in chromatic adaptation in cyanobacteria, which is itself structurally related to sensor histidine kinases (Kehoe and Grossman, 1996). Because methods of computer sequence analysis have steadily improved over the past years and the databases have grown considerably in size, we have recently applied a repertoire of computer methods to the current databases in order to update the information available from sequence analysis (Sühnel et al., 1997). The full results are given on a phytochrome web site of the IMB Jena World-Wide Web server (http://www.imb-jena.de/PHYTO.html).

A few selected findings are presented here.


Materials and Methods

Twenty-three phytochrome sequences available from protein databases up to January 1997 were used for the sequence analysis. The sequences are compiled on the IMB Jena phytochrome web site. Segments of non-random amino acid composition were identified with the program SEG (Wootton, 1994). Similarity searches were performed with BLAST (Altschul et al., 1990) via the SWISS-PROT BLAST interface at the Ecole Polytechnique Federale de Lausanne (http://expasy.hcuge.ch/cgi-bin/BLASTEPFL.pl). Potential sequence motifs were searched for with PROSITE on the ExPASy web server (http://expasy.hcuge.ch/sprot/scnpsite.html). The multiple alignment was calculated with the AMPS package and with the MULTAL program, which is part of the CAMELEON sequence analysis tool of Oxford Molecular Ltd. (Taylor, 1990; Barton, 1994). The secondary structure was estimated with the PHD method (Rost et al., 1994; Rost and Sander, 1994) and the ZPRED approach (Zvelebil et al., 1987).


Results and Discussion

Identification of segments with non-random amino acid composition

Segments of non-random amino acid composition, so-called low complexity regions, are thought to correspond to elongated, non-globular protein domains. They are likely to be relatively mobile and particularly suited to conformational adaptability in interactions within molecular complexes (Wootton, 1994).
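SEG (Wootton, 1994) detects such regions by scoring the compositional complexity of the residues within a sliding window. The Python sketch below illustrates the principle with a simplified Shannon-entropy scan; it is not the SEG program. The window length of 12 and the 2.2-bit trigger value are assumptions modelled on commonly cited SEG defaults, and SEG's subsequent segment extension and optimisation steps are omitted, so its output would not reproduce Table 1 exactly.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (in bits) of the residue composition of a sequence window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_segments(seq, window=12, threshold=2.2):
    """Return merged 1-based (start, end) segments covered by low-entropy windows.

    Simplified illustration only: window=12 and threshold=2.2 bits are assumed
    values, and SEG's extension/optimisation of segments is not modelled.
    """
    seq = seq.lower()
    flagged = [False] * len(seq)
    # Mark every residue that lies in at least one low-complexity window.
    for i in range(len(seq) - window + 1):
        if shannon_entropy(seq[i:i + window]) <= threshold:
            for j in range(i, i + window):
                flagged[j] = True
    # Merge consecutive flagged residues into (start, end) segments.
    segments, start = [], None
    for pos, is_low in enumerate(flagged, start=1):
        if is_low and start is None:
            start = pos
        elif not is_low and start is not None:
            segments.append((start, pos - 1))
            start = None
    if start is not None:
        segments.append((start, len(seq)))
    return segments


# Hypothetical test sequence: a serine run flanked by compositionally diverse residues.
example = "acdefghiklmn" + "s" * 12 + "pqrtvwyacdef"
print(low_complexity_segments(example))  # [(8, 29)]
```

The reported segment (8-29) extends beyond the serine run itself (positions 13-24) into the flanking residues; refining such boundaries is the purpose of the optimisation step in SEG that this sketch leaves out.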

Table 1. Segments of non-random amino acid composition in phytochromes.
         The analysis was performed with the program SEG.

-------------------------------------------------------------------------------
phytochrome   low complexity region   low complexity region            sequence
              (sequence position)     (amino acid pattern)             length
-------------------------------------------------------------------------------
PHY1_TOBAC    2-21                    sssrpsqssttsarskhsar             1124
              946-957                 ilddtdldsiid
PHY3_AVESA    2-21                    sssrpasssssrnrqssqar             1129
              343-364                 neneeddeaeseqpaqqqkkkk
              755-766                 iihnpnplippi
              1087-1098               lsllvsrnllrl
PHY4_AVESA    2-21                    sssrpasssssrnrqssrar             1129
              343-364                 neneeddeaeseqpaqqqqkkk
              539-549                 dssdmddsrrm
              755-766                 iihnpnplippi
              1086-1098               glgllvsrkllrl
PHYA_ARATH    13-24                   srrsrhsariia                     1122
              755-766                 iiqnpnplippi
PHYA_CUCPE    2-21                    stsrpsqsssnsgrsrhstr             1124
              341-352                 vvvnegdeeneg
PHYA_MAIZE    2-23                    sssrpahssssssrtrqssrar           1131
              345-362                 neneeddepepeqppqqq
              757-768                 iihnpnplippi
PHYA_ORYSA    2-21                    sssrptqcsssssrtrqssr             1128
              342-353                 vvvnenedddev
              690-700                 keekevkfevk
PHYA_PEA      5-19                    rpsqssnnsgrsrns                  1124
              39-58                   sgssfdysssvrvsgsvdgd
              479-491                 tdstglstdslsd
              943-962                 lskilddsdldgiidgyldl
PHYA_SOLTU    2-16                    sssrpsqssttssrs                  1123
              345-359                 dgdeegessdssqsq
PHYA_SOYBN    2-21                    stsrpsqsssnsrrsrhsar             1131
              35-52                   feesgssfdysssvrvsg
              478-489                 dstsfstdslfd
              947-964                 qqqlskilddsdldtiid
PHYA_PETCR    2-13                    sssrpansssnp                     1129
              876-887                 lqlasqelqqal
              951-962                 ilddtdldsiid
PHYB_ARATH    2-18                    vsgvggsgggrgggrgg                1172
PHYB_ORYSA    5-34                    sratptrspssarpaaprhqhhhsqssggs   1171
              38-55                   agggggggggggggaaaa
              835-848                 gevvgkllvgevfg
PHYB_SOLTU    9-20                    hshhsssqaqss                     1129
PHYB_TOBAC    10-26                   shqsgqgqvqaqssgts                1132
PHYB_SOYBN    25-34                   shhssnnnnn                       1156
PHY1_PHYPA    2-20                    stpkktysstssakskahs              1132
PHY1_SELMA    9-19                    ssgssakskhs                      1134
              136-147                 aaasalekaaga
              352-358                 ggggggg
PHYC_ARATH    2-25                    ssntsrscstrsrqnsrvssqvlv         1111
PHYD_ARATH    1158-1161               mmmm                             1164
PHYE_ARATH    -                       -                                1112
PHY_ADICA     2-19                    sstrhsyssggsgkskhg               1117
              36-41                   eesses
PHY_CERPU     2-19                    satkktyssttsakskhs               1303
              659-675                 dlvldesvvvverllsl
              716-724                 fvvgvffvg
PHYE_PHANI    -                       -                                1115
-------------------------------------------------------------------------------

All phytochromes other than phyD and phyE of Arabidopsis and PHYE_PHANI have a segment of non-random amino acid composition at the N-terminus, located between residues 2 and 34 (local sequence numbering). Its length varies between 10 and 30 amino acids depending on the phytochrome species (Table 1). This suggests that the segment may contain sequences or residues of general importance for phytochrome function.
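The range of segment lengths can be checked directly against Table 1. Positions are inclusive, so a segment from residue start to residue end covers end - start + 1 residues. The short Python check below uses the start and end positions of the N-terminal segments transcribed from Table 1 and recomputes the shortest and longest segment and the most C-terminal end position.

```python
# N-terminal low complexity segments (start, end) transcribed from Table 1.
# PHYD_ARATH, PHYE_ARATH and PHYE_PHANI have no such segment and are omitted.
N_TERMINAL_SEGMENTS = {
    "PHY1_TOBAC": (2, 21), "PHY3_AVESA": (2, 21), "PHY4_AVESA": (2, 21),
    "PHYA_ARATH": (13, 24), "PHYA_CUCPE": (2, 21), "PHYA_MAIZE": (2, 23),
    "PHYA_ORYSA": (2, 21), "PHYA_PEA": (5, 19), "PHYA_SOLTU": (2, 16),
    "PHYA_SOYBN": (2, 21), "PHYA_PETCR": (2, 13), "PHYB_ARATH": (2, 18),
    "PHYB_ORYSA": (5, 34), "PHYB_SOLTU": (9, 20), "PHYB_TOBAC": (10, 26),
    "PHYB_SOYBN": (25, 34), "PHY1_PHYPA": (2, 20), "PHY1_SELMA": (9, 19),
    "PHYC_ARATH": (2, 25), "PHY_ADICA": (2, 19), "PHY_CERPU": (2, 19),
}


def segment_length(start, end):
    """Length of an inclusive residue range, e.g. 2-21 -> 20 residues."""
    return end - start + 1


lengths = {name: segment_length(*pos) for name, pos in N_TERMINAL_SEGMENTS.items()}
shortest = min(lengths, key=lengths.get)
longest = max(lengths, key=lengths.get)
print(shortest, lengths[shortest])  # PHYB_SOYBN 10
print(longest, lengths[longest])    # PHYB_ORYSA 30
print(max(end for _, end in N_TERMINAL_SEGMENTS.values()))  # 34
```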

Sequence similarities between phytochromes and other proteins


A sequence similarity search of the current databases reveals a strong similarity of all phytochromes to a hypothetical 84.2 kDa protein from the cyanobacterium Synechocystis sp. This protein is the non-phytochrome sequence with the highest score against all phytochromes; the sequence identity varies between 25% and 60%, depending on the aligned region. Further deduced proteins from the genome of Synechocystis sp. show varying degrees of similarity to all phytochromes. Most remarkable are local similarities between the C-terminal domain of the phytochromes and the sensory transduction histidine kinases. In the ranked list of matched sequences, these hits appear close to the bacterial two-component protein kinases whose homology to the phytochromes was already noted in earlier database searches (Schneider-Poetsch et al., 1991; Schneider-Poetsch, 1992). These data further strengthen the possibility of an evolutionary relationship between phytochromes and sensory histidine kinases, a concept that gains indirect support from the finding that the histidine kinase mechanism of bacterial sensor proteins apparently also plays a central role in signal transmission in eukaryotes (Chang, 1996). A complete overview of the similarity search results for all 23 sequences can be found on the IMB Jena phytochrome web site.
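Sequence identities such as those quoted above are computed over the columns of a pairwise alignment. Conventions differ between programs, for example in how columns containing gaps are counted, so the Python function below is only a sketch under one assumed convention: identical residues divided by the number of columns in which neither sequence has a gap. As an illustration it compares two aligned 15-residue stretches taken from Table 2 (PHYA_PEA and PHY1_TOBAC).

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity of two pre-aligned sequences.

    Assumed convention: columns where either sequence has a gap are ignored;
    identity = identical columns / remaining (ungapped) columns * 100.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
             if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


# 11 of the 15 aligned positions are identical.
print(round(percent_identity("ILAVDVDGTVNGWNI", "IFAVDVDGQLNGWNT"), 1))  # 73.3
```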

Motifs


Table 2. PROSITE signatures identified by a motif search in phytochromes.

----------------------------------------------------------------------------
phytochrome   PROSITE signature      sequence range   sequence pattern
----------------------------------------------------------------------------
PHY1_TOBAC    G_BETA_REPEATS         632-646          IFAVDVDGQLNGWNT
PHY3_AVESA    ZINC_PROTEASE          873-882          VASHELQHAL
PHY4_AVESA    ZINC_PROTEASE          873-882          VASHELQHAL
PHYA_ARATH    G_BETA_REPEATS         633-647          ILAVDSDGLVNGWNT
PHYA_CUCPE    G_BETA_REPEATS         631-645          ILAVDLDGLINGWNT
PHYA_MAIZE    -                      -                -
PHYA_ORYSA    ZINC_PROTEASE          876-885          VPSHELQHAL
PHYA_PEA      G_BETA_REPEATS         632-646          ILAVDVDGTVNGWNI
PHYA_SOLTU    G_BETA_REPEATS         632-646          IFAVDVDGQVNGWNT
PHYA_SOYBN    G_BETA_REPEATS         639-653          ILAVDVDGLVNGWNI
PHYA_PETCR    G_BETA_REPEATS         637-651          IFAVDADEIVNGWNT
PHYB_ARATH    DNA_LIGASE_A1          571-579          EDKDDGQRM
PHYB_ORYSA    DNA_LIGASE_A1          580-588          EDKDDGQRM
              G_BETA_REPEATS         676-690          IFAVDTDGCINGWNA
PHYB_SOLTU    DNA_LIGASE_A1          545-553          EDKDDGQRM
PHYB_TOBAC    DNA_LIGASE_A1          547-555          EDKDDGQRM
PHYB_SOYBN    G_BETA_REPEATS         661-675          IFAVDVDGHVNGWNA
              AA_TRNA_LIGASE_I       812-822          PSNENVTVGGV
PHY1_PHYPA    G_BETA_REPEATS         625-639          ILAVDSNGMINGWNA
PHY1_SELMA    -                      -                -
PHYC_ARATH    -                      -                -
PHYD_ARATH    DNA_LIGASE_A1          575-583          EDKDDGQRM
              G_BETA_REPEATS         671-685          IFAVDIDGCINGWNA
PHYE_ARATH    ATP_GTP_A              634-641          ASEAMGKS
PHY_CERPU     AA_TRNA_LIGASE_II_2    782-791          TGSVERLDLY
              PROTEIN_KINASE_ATP     1010-1031        LGSGSSATVEKAVWLGTPVAKK
              PROTEIN_KINASE_ST      1123-1135        IIHRDLKSMNILV
PHYE_PHANI    DNA_LIGASE_A1          521-529          EDKDDGGRM
----------------------------------------------------------------------------

The signature of the WD repeats of the beta-transducin family is one of the most frequently occurring motifs. It is found within the internal repeat of the hinge region in 11 of the 23 phytochromes. The identification of this motif is remarkable because WD-repeat proteins are known to have regulatory functions and to play a role in protein-protein recognition (Neer et al., 1994).

The WD-repeat signature spans a stretch of 15 amino acids:

[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x-x-[DN]-x-x-[LIVMWSTC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]

where x stands for any amino acid and, at each position given in square brackets, one of the listed amino acids is required.

The multiple alignment (see below) suggests that the few phytochromes lacking this signature match the remaining part of the motif and fail to match only because of the amino acids at positions 7 and 15. It remains open whether this minor deviation from the WD-repeat signature is a problem of motif definition or of real biological significance.
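PROSITE patterns of this kind translate directly into regular expressions. The Python sketch below is a simplified converter written for illustration; it is not part of the PROSITE or ExPASy tools. It handles square-bracket alternatives, curly-bracket exclusions, x and repetition counts such as x(2), but not the < and > terminal anchors. Applied to the WD-repeat signature as given in the text, it matches the 15-residue stretches listed in Table 2 and fails once position 7 or 15 carries a residue outside the allowed set, which is the kind of deviation discussed above. The flanking residues in the first example are hypothetical.

```python
import re

# WD-repeat signature as given in the text (15 positions).
WD_REPEAT_SIGNATURE = (
    "[LIVMSTAC]-[LIVMFYWSTAGC]-[LIMSTAG]-[LIVMSTAGC]-x-x-"
    "[DN]-x-x-[LIVMWSTC]-x-[LIVMFSTAG]-W-[DEN]-[LIVMFSTAGCN]"
)

# One pattern element: [..], {..}, a single residue or x, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Zx])(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                        # any amino acid
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # any amino acid except those listed
        if low is not None:                   # repetition, e.g. x(2) or x(2,4)
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


def find_motif(sequence, pattern):
    """Return 1-based (start, end, match) tuples for non-overlapping hits."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start() + 1, m.end(), m.group())
            for m in regex.finditer(sequence.upper())]


print(find_motif("MKIFAVDVDGQLNGWNTRR", WD_REPEAT_SIGNATURE))
# [(3, 17, 'IFAVDVDGQLNGWNT')]
print(find_motif("IFAVDVEGQLNGWNT", WD_REPEAT_SIGNATURE))  # [] (E at position 7)
```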

Another notable finding is the signature of the ATP-dependent DNA ligase in phytochromes B (four of the five phytochrome B sequences in Table 2). This is of interest because recent experimental evidence indicates that these phytochromes are localized in the nucleus (Sakamoto and Nagatani, 1996).

The documentation of the motifs identified can be found on the IMB Jena phytochrome web site.

Multiple sequence alignment

The 23 phytochrome sequences retrieved from protein databases were subjected to multiple sequence alignment. Multiple sequence alignments were generated for three different sequence sets:

The results obtained are too extensive to be presented here. The data are available via the IMB Jena phytochrome web site. They can be used as a working tool for future experimental studies on phytochromes. We have used the multiple alignment to deduce the domain structure of phytochromes and to analyze structural differences between phytochromes A and B. Under the assumption that sequence stretches of 10 or more amino acids without conserved patterns indicate domain boundaries, the following sequence regions can be assigned to structural domains:
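The boundary criterion stated above can be made concrete in a few lines. The Python sketch below scans a multiple alignment for runs of at least 10 consecutive columns that are not conserved. What counts as a conserved column is not specified in the text; purely for illustration it is assumed here that a column is conserved when its most common residue occurs in at least 80% of the sequences, with gaps counting against conservation. The toy alignment is hypothetical.

```python
from collections import Counter


def is_conserved(column, min_fraction=0.8, gap="-"):
    """Assumed criterion: the most common residue fills >= min_fraction of rows."""
    residues = [r for r in column if r != gap]
    if not residues:
        return False
    top_count = Counter(residues).most_common(1)[0][1]
    return top_count / len(column) >= min_fraction


def candidate_boundaries(alignment, min_run=10, min_fraction=0.8):
    """1-based (first, last) column ranges of >= min_run unconserved columns."""
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    runs, start = [], None
    for col in range(n_cols):
        column = [seq[col] for seq in alignment]
        if not is_conserved(column, min_fraction):
            if start is None:
                start = col               # a new unconserved run begins
        else:
            if start is not None and col - start >= min_run:
                runs.append((start + 1, col))
            start = None
    if start is not None and n_cols - start >= min_run:
        runs.append((start + 1, n_cols))
    return runs


# Hypothetical toy alignment: conserved blocks flanking a 12-column variable stretch.
toy = [
    "AAAAA" + "ACDEFGHIKLMN" + "AAAAA",
    "AAAAA" + "CDEFGHIKLMNP" + "AAAAA",
    "AAAAA" + "DEFGHIKLMNPQ" + "AAAAA",
]
print(candidate_boundaries(toy))  # [(6, 17)]
```

Both the 80% threshold and the treatment of gaps strongly influence where boundaries are placed, so they would need to be tuned against the real alignment.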

The sequence regions were derived from the multiple alignment of either all 23 phytochrome sequences or (in parentheses) from the alignment of phytochromes A and B only. This domain structure agrees fairly well with the functionally important domains determined indirectly from proteolytic mapping studies (Lagarias et al., 1985; Grimm et al., 1988). It is also in close correspondence with the structural pattern proposed by Romanowski and Song (1992).

The multiple alignment of the total sequence set shows that phytochromes B have an additional stretch of 15-42 amino acids directly at the N-terminus compared with phytochromes A. In the remaining part of the sequence, phytochromes A and B align in a very similar way. There are, however, positions at which the amino acid is totally conserved within phytochromes A and within phytochromes B but differs between the two groups (Fig. 2). These positions are more frequent in the N-terminal half than in the C-terminal half, with a pronounced peak in the sequence region 150-200. It is therefore tempting to speculate that this segment and/or the additional stretch at the N-terminus of phytochromes B might act as determinants of the photosensory specificity of phytochromes A and B.
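The positions shown in Fig. 2 can be identified from the alignment as columns in which all phytochromes A share one residue, all phytochromes B share another, and neither group has a gap. A minimal Python sketch of this comparison is given below, together with the conversion of alignment columns to the residue numbering of a gapped reference sequence (PHY3_AVESA in Fig. 2). The short aligned fragments in the example are hypothetical.

```python
def differing_conserved_columns(group_a, group_b, gap="-"):
    """1-based alignment columns totally conserved within each group but
    occupied by different residues in the two groups (gaps never count)."""
    n_cols = len(group_a[0])
    if any(len(seq) != n_cols for seq in group_a + group_b):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(n_cols):
        residues_a = {seq[col] for seq in group_a}
        residues_b = {seq[col] for seq in group_b}
        if (len(residues_a) == 1 and len(residues_b) == 1
                and residues_a != residues_b and gap not in residues_a | residues_b):
            hits.append(col + 1)
    return hits


def column_to_residue_number(aligned_reference, column, gap="-"):
    """Residue number in the gapped reference sequence for a 1-based column,
    or None if the reference has a gap in that column."""
    if aligned_reference[column - 1] == gap:
        return None
    return sum(1 for ch in aligned_reference[:column] if ch != gap)


# Hypothetical aligned fragments of the two phytochrome groups.
phy_a = ["MKV-LE", "MKV-LE"]
phy_b = ["MRVALD", "MRVGLD"]
columns = differing_conserved_columns(phy_a, phy_b)
print(columns)                                                   # [2, 6]
print([column_to_residue_number(phy_a[0], c) for c in columns])  # [2, 5]
```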

Fig. 2 
Distribution of sequence positions with different totally conserved 
amino acids in phytochromes A and B (PHY3_AVESA numbering).

Secondary structure prediction

The secondary structure was predicted for phyA from Avena with two alignment-based prediction methods, PHD (Rost et al., 1994; Rost and Sander, 1994) and ZPRED (Zvelebil et al., 1987). The predictive ability of these methods for tetrapyrrole-containing proteins was tested by recalculating the secondary structure of C-phycocyanin from Fremyella and comparing the predicted data with those from the X-ray analysis (Duerring et al., 1991; Protein Data Bank code: 1cpc). The estimated secondary structure content (65.9% alpha-helix, 4% beta-sheet, 30.1% loop structure) was in excellent agreement with the X-ray crystallographic structure, suggesting that the prediction methods applied are reliable for this class of proteins.
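The secondary structure content compared here is the fraction of residues assigned to each state. The Python sketch below assumes a three-state per-residue string (H = helix, E = extended/beta-sheet, L = loop), as produced by PHD-type predictions; an assignment derived from a crystal structure would first have to be reduced to the same three states, a step not shown here. The assignment strings in the example are hypothetical.

```python
from collections import Counter

STATES = "HEL"  # helix, extended (beta-sheet), loop


def secondary_structure_content(assignment):
    """Percentage of residues in each of the states H, E and L."""
    if not assignment:
        raise ValueError("empty assignment")
    unknown = set(assignment) - set(STATES)
    if unknown:
        raise ValueError(f"unexpected state symbols: {sorted(unknown)}")
    counts = Counter(assignment)
    return {state: 100.0 * counts[state] / len(assignment) for state in STATES}


def content_difference(predicted, reference):
    """Absolute difference (percentage points) per state between two assignments."""
    p = secondary_structure_content(predicted)
    r = secondary_structure_content(reference)
    return {state: abs(p[state] - r[state]) for state in STATES}


predicted = "LHHHHHHHLL"   # hypothetical prediction
observed = "LLHHHHHHHL"    # hypothetical reference assignment
print(secondary_structure_content(predicted))   # {'H': 70.0, 'E': 0.0, 'L': 30.0}
print(content_difference(predicted, observed))  # {'H': 0.0, 'E': 0.0, 'L': 0.0}
```

The example also shows that identical content does not guarantee identical positions: the helix is shifted by one residue between the two strings.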

Table 3. Secondary structure content of phyA from Avena estimated by the 
         PHD and ZPRED methods

-------------------------------------------------------
method              secondary structure content
-------------------------------------------------------
                alpha-helix     beta-sheet     loop
-------------------------------------------------------

PHD                44.3 %         13.9 %       41.8 %
ZPRED              48.8 %         17.7 %       33.9 %
-------------------------------------------------------

The analysis of phyA from Avena by the PHD and ZPRED methods resulted in very similar structural assignments. The two methods yield a secondary structure content of about 44-49% alpha-helix, 14-18% beta-sheet and 34-42% loop structure (Table 3). In addition, they agree fairly well with respect to the locations of the alpha-helix and beta-sheet segments. The results indicate that the secondary structure of phyA from Avena consists primarily of alpha-helical segments together with a small but significant amount of beta-sheet structure. Although alpha-helix is the dominant structural element, the secondary structure is not of the all-helix type found in phycocyanin. This finding is supported by the analysis of the secondary structure of phyA by FTIR spectroscopy (Sühnel et al., 1997). More detailed information on the secondary structure prediction is available on the IMB Jena phytochrome web site.
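Agreement in overall content does not by itself imply agreement in where helices and strands are located, which is why the locations were compared as well. One simple way to quantify this, sketched below in Python under the assumption of two three-state per-residue predictions of equal length, is the percentage of residues assigned the same state, together with a list of the helix or strand segments in each prediction. The example strings are hypothetical and are not actual PHD or ZPRED output.

```python
from itertools import groupby


def per_residue_agreement(pred_a, pred_b):
    """Percentage of positions at which two predictions assign the same state."""
    if len(pred_a) != len(pred_b) or not pred_a:
        raise ValueError("predictions must be non-empty and of equal length")
    same = sum(a == b for a, b in zip(pred_a, pred_b))
    return 100.0 * same / len(pred_a)


def segments(assignment, state):
    """1-based (start, end) runs of a given state, e.g. all helix segments."""
    runs, pos = [], 1
    for symbol, group in groupby(assignment):
        length = len(list(group))
        if symbol == state:
            runs.append((pos, pos + length - 1))
        pos += length
    return runs


prediction_1 = "LHHHHLLEEL"   # hypothetical
prediction_2 = "LHHHLLLEEE"   # hypothetical
print(per_residue_agreement(prediction_1, prediction_2))       # 80.0
print(segments(prediction_1, "H"), segments(prediction_2, "H"))  # [(2, 5)] [(2, 4)]
```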

References