Protein secondary structure determination.

The most common task a CD spectroscopist is asked to perform is evaluating the secondary structure content of a protein. Once you have collected the spectrum and normalized the data as explained in the preceding section, you are ready to deconvolute your experimental CD spectrum.
With this aim, you must:
1.  obtain a proper set of basis spectra (Bk);
2.  perform the data fitting procedure.
To understand how to do this, first consider what a basis set is and how software can derive it from the spectra of a set of reference proteins.
As already mentioned, a basis set is a set of components chosen so that your experimental CD spectrum can be viewed as a linear combination of them. If these basis spectra were obtained from a set of reference proteins whose structures and CD spectra are known, you can obtain structural information on your protein as well, as we will see.
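A minimal sketch of the data-fitting step, assuming the basis spectra are already available and sampled at the same wavelengths as the experimental spectrum. It uses NumPy's `np.linalg.lstsq` for an unconstrained least-squares fit. The function name `fit_fractions` and all numbers are hypothetical, and practical deconvolution programs usually add constraints (for example non-negative fractions that sum to 1) that are left out here.

```python
import numpy as np


def fit_fractions(cd_exp, basis):
    """Return weights f_k such that cd_exp is approximately sum_k f_k * B_k.

    cd_exp: shape (n_wavelengths,), normalized CD spectrum of the sample
    basis:  shape (n_k, n_wavelengths), one basis spectrum B_k per row
    """
    # Unconstrained linear least squares: minimize ||basis.T @ f - cd_exp||^2
    f, *_ = np.linalg.lstsq(basis.T, cd_exp, rcond=None)
    return f


# Hypothetical basis spectra (arbitrary units) sampled at 5 wavelengths
basis = np.array([
    [-10.0, -12.0, -5.0, 20.0, 30.0],  # "alpha"
    [-5.0, -2.0, 3.0, 10.0, 5.0],      # "beta"
    [1.0, -3.0, -8.0, -2.0, 0.0],      # "coil"
])

# Synthetic "experimental" spectrum built from known weights
true_f = np.array([0.5, 0.2, 0.3])
cd_exp = true_f @ basis

print(fit_fractions(cd_exp, basis))
```

Because this synthetic spectrum is an exact combination of the basis spectra, the fit returns the weights used to build it; a real spectrum always leaves some residual.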
Note the difference between reference spectra and basis spectra (Bk). A protein reference data set is a set of normalized CD spectra (i.e. expressed as mean residue molar CD or mean residue ellipticity) of proteins of known three-dimensional structure. From this reference data set you (or a software package) can derive a set of k basis spectra Bk to use for the deconvolution.

Historically, the first approach was to evaluate the Bk components as the spectra of pure secondary structure elements, either by measuring the spectra of model peptides or by extracting them from a set of reference proteins. The latter option is possible because the secondary structure content of each protein in a reference set is known. For each protein you therefore have its CD spectrum expressed as a function of its secondary structure content, so that:
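
$$\mathrm{CD}_i(\lambda) = f_{\alpha,i}\,B_{\alpha}(\lambda) + f_{\beta,i}\,B_{\beta}(\lambda) + f_{\mathrm{coil},i}\,B_{\mathrm{coil}}(\lambda), \qquad i = 1, 2, 3$$
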
Here the basis spectra Bk are labelled alpha, beta and coil, while the weights fk are the fractions of each secondary structure, reported as percentages.
By simple substitution, you (or a software package) can then obtain the basis spectra (try it!), provided that you have as many protein spectra (CD1, CD2, ...) as the number of "pure" secondary structure spectra you want to obtain.
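The substitution can also be written in matrix form. If the reference spectra are stacked as the rows of one matrix and their secondary structure fractions as the rows of a square matrix F, each wavelength gives one small linear system, and solving it returns the basis spectra. The sketch below assumes the fractions are given as decimals (percentages divided by 100); the function name `basis_from_references` and all numbers are hypothetical, not data for any real protein.

```python
import numpy as np


def basis_from_references(fractions, spectra):
    """Recover the basis spectra from exactly n_k reference proteins.

    fractions: (n_k, n_k), row i = secondary structure fractions of protein i
    spectra:   (n_k, n_wavelengths), row i = normalized CD spectrum of protein i
    Returns:   (n_k, n_wavelengths), row k = basis spectrum B_k
    """
    # Solves fractions @ B = spectra, i.e. one linear system per wavelength
    return np.linalg.solve(fractions, spectra)


# Hypothetical "pure" spectra (alpha, beta, coil) used only to build test data
true_basis = np.array([
    [-10.0, -12.0, -5.0, 20.0, 30.0],
    [-5.0, -2.0, 3.0, 10.0, 5.0],
    [1.0, -3.0, -8.0, -2.0, 0.0],
])

# Hypothetical content of three reference proteins, in percent (alpha, beta, coil)
fractions_pct = np.array([
    [70.0, 5.0, 25.0],
    [20.0, 35.0, 45.0],
    [40.0, 10.0, 50.0],
])
fractions = fractions_pct / 100.0

# Noise-free reference spectra: CD_i = sum_k f_ik * B_k
spectra = fractions @ true_basis

recovered = basis_from_references(fractions, spectra)
print(np.round(recovered, 6))
```

`np.linalg.solve` raises `numpy.linalg.LinAlgError` when the fractions matrix is singular, i.e. when the compositions of the reference proteins are linearly dependent; in that case no unique basis exists.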
In 1971, Saxena and Wetlaufer used three globular proteins (myoglobin, ribonuclease A and lysozyme) to calculate three basis spectra for alpha, beta and "other" structure. With this basis set, they calculated the secondary structure of carboxypeptidase A and alpha-chymotrypsin, reporting a maximum average error of 6% for each secondary structure fraction.
At about the same time, however, Chen and Yang used five globular proteins (myoglobin, lysozyme, ribonuclease, papain and lactate dehydrogenase) and found that different combinations of three proteins out of the five could give quite different basis spectra. This indicates that basis spectra depend strongly on the choice of reference proteins, in marked contrast with the simple assumption made above. Moreover, there are at least two important complicating factors:
1- some Bk spectra, e.g. the coil spectrum, are hard to determine unequivocally, simply because the corresponding secondary structure is poorly defined;
2- even the spectra you are using as references are affected by errors.
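Chen and Yang's observation, together with the second complicating factor, can be reproduced qualitatively with synthetic data. The sketch below is a hypothetical illustration, not a reanalysis of their data: five invented reference compositions, spectra generated from an invented basis, and random noise standing in for measurement error and for spectral features a three-component model cannot capture. With noise-free spectra every choice of three references out of five returns the same basis; with noise the ten choices disagree.

```python
import itertools

import numpy as np

# Hypothetical basis spectra (rows: alpha, beta, coil) at 5 wavelengths
true_basis = np.array([
    [-10.0, -12.0, -5.0, 20.0, 30.0],
    [-5.0, -2.0, 3.0, 10.0, 5.0],
    [1.0, -3.0, -8.0, -2.0, 0.0],
])

# Hypothetical fractions (alpha, beta, coil) for five reference proteins,
# chosen so that any three of them form an invertible matrix
fractions = np.array([
    [0.70, 0.10, 0.20],
    [0.10, 0.60, 0.30],
    [0.30, 0.30, 0.40],
    [0.50, 0.40, 0.10],
    [0.20, 0.10, 0.70],
])


def bases_from_all_triples(spectra):
    """Solve for the basis with every choice of 3 references out of 5."""
    results = []
    for triple in itertools.combinations(range(len(fractions)), 3):
        rows = list(triple)
        results.append(np.linalg.solve(fractions[rows], spectra[rows]))
    return np.stack(results)  # shape (10, 3, n_wavelengths)


def spread(bases):
    """Largest max-minus-min difference of any basis value across triples."""
    return float((bases.max(axis=0) - bases.min(axis=0)).max())


ideal = fractions @ true_basis
rng = np.random.default_rng(0)
# Synthetic perturbation standing in for errors in the reference spectra
measured = ideal + rng.normal(scale=1.0, size=ideal.shape)

spread_ideal = spread(bases_from_all_triples(ideal))
spread_measured = spread(bases_from_all_triples(measured))
print(spread_ideal, spread_measured)
```

The disagreement grows when the three chosen compositions are nearly linearly dependent, because solving the system then amplifies the errors in the spectra.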
To partially resolve this impasse, up to the 1980s various attempts were made to enlarge the reference protein data set and to derive "average" basis spectra; these eventually gave good results for alpha-helix content, but quite disappointing results for beta sheets and turns.
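With more reference proteins than basis spectra the system is overdetermined, and one simple way to obtain an "average" basis is a linear least-squares fit over all references at once. This is a generic sketch, not a reproduction of any specific historical method; the function name `average_basis`, the compositions and the spectra are all hypothetical.

```python
import numpy as np


def average_basis(fractions, spectra):
    """Least-squares basis spectra from n_ref >= n_k reference proteins.

    fractions: (n_ref, n_k) secondary structure fractions, one row per protein
    spectra:   (n_ref, n_wavelengths) normalized CD spectra, one row per protein
    Minimizes ||fractions @ B - spectra||^2 separately for each wavelength.
    """
    basis, *_ = np.linalg.lstsq(fractions, spectra, rcond=None)
    return basis


# Hypothetical "pure" spectra (alpha, beta, coil) used only to generate data
true_basis = np.array([
    [-10.0, -12.0, -5.0, 20.0, 30.0],
    [-5.0, -2.0, 3.0, 10.0, 5.0],
    [1.0, -3.0, -8.0, -2.0, 0.0],
])

# Hypothetical compositions of six reference proteins (alpha, beta, coil)
fractions = np.array([
    [0.70, 0.10, 0.20],
    [0.10, 0.60, 0.30],
    [0.30, 0.30, 0.40],
    [0.50, 0.40, 0.10],
    [0.20, 0.10, 0.70],
    [0.40, 0.20, 0.40],
])

rng = np.random.default_rng(1)
ideal = fractions @ true_basis
# Random noise standing in for errors in the reference spectra
measured = ideal + rng.normal(scale=1.0, size=ideal.shape)

exact = average_basis(fractions, ideal)
noisy = average_basis(fractions, measured)
print(np.abs(noisy - true_basis).max())
```

With noise-free data the least-squares basis equals the one used to generate the spectra; with noisy data it spreads the errors over all references instead of depending on one particular subset. Least squares only averages out random errors, however: it cannot compensate for a basis that is too simple to describe every protein.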