Defining Average Core Structures

This project, supported by NIH NLM-05652, is part of the Helix Group at Stanford School of Medicine. Please address inquiries to russ.altman@stanford.edu.


I. SUMMARY OF PROJECT GOALS

Proteins have long been clustered into families, such as the globins or the immunoglobulins. Members of a protein family tend to share a similar overall fold but differ in their detailed structure. Several groups have recently attempted to classify the entire Protein Data Bank by protein family. Moreover, the number of protein structures, both in the data bank as a whole and in some families in particular, is now quite large and is increasing rapidly. A recent estimate puts the total number of chains at 3000, increasing by about one a day, and the Protein Data Bank now holds over 15 distinctly different globin structures and over 20 different immunoglobulin structures. Consequently, it has become desirable (even necessary) to summarize the common structural features within a family while separating out the variable ones.

One of the most basic commonalities shared by the members of a family of structures is a set of atoms that occupy the same relative positions in space. Our focus here is on identifying this set of atoms and then characterizing it statistically. We show how to construct an average core structure from a protein family in such a way that the average is unbiased and the resulting structure has acceptable stereochemistry. We then show how this core structure can be used to characterize the structural variability within a family, to define the average relative orientation of domains in multi-domain complexes, and to develop new measures of similarity between members of the same structural family. We illustrate our ideas by applying them to two archetypal protein families: the all α-helical globins and the all β-sheet immunoglobulins. For both families we find an average structural core that is biologically relevant. We then use the globin core to illustrate our calculation of a better RMS and to highlight the differences between family classifications based on structure and those based on sequence. We use the immunoglobulin core to show how our procedure can be adapted to deal with large assemblies.
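The general idea of a low-variance core can be illustrated with a short Python sketch. This is not the project's algorithm: it is a generic illustration that assumes the structures have already been reduced to the same N equivalent atoms in the same order, uses standard least-squares (Kabsch) superposition, and trims the most variable atom one at a time until a chosen core size remains. The function names (`superpose`, `average_structure`, `find_core`, `core_rmsd`) and the `n_core` parameter are hypothetical, and the sketch makes no attempt to restore acceptable stereochemistry in the averaged coordinates.

```python
import numpy as np

def superpose(mobile, target, mask):
    """Least-squares (Kabsch) fit of `mobile` onto `target` using only the
    atoms selected by `mask`; the transform is applied to every atom."""
    mc = mobile[mask].mean(axis=0)
    tc = target[mask].mean(axis=0)
    H = (mobile[mask] - mc).T @ (target[mask] - tc)   # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    return (mobile - mc) @ R.T + tc

def average_structure(coords, mask, n_iter=5):
    """Iteratively superpose all structures onto their running mean, so the
    result depends only weakly on the starting reference (assumption: the
    first structure is used as the initial reference)."""
    mean = coords[0].copy()
    for _ in range(n_iter):
        coords = np.array([superpose(c, mean, mask) for c in coords])
        mean = coords.mean(axis=0)
    # Per-atom mean squared deviation from the average position.
    spread = ((coords - mean) ** 2).sum(axis=2).mean(axis=0)
    return coords, mean, spread

def find_core(structures, n_core):
    """structures: array of shape (M, N, 3). Returns a boolean core mask,
    the average coordinates, and the per-atom spread."""
    coords = np.asarray(structures, dtype=float)
    mask = np.ones(coords.shape[1], dtype=bool)
    coords, mean, spread = average_structure(coords, mask)
    while mask.sum() > n_core:
        # Drop the most variable atom still in the core, then refit.
        worst = np.argmax(np.where(mask, spread, -np.inf))
        mask[worst] = False
        coords, mean, spread = average_structure(coords, mask)
    return mask, mean, spread

def core_rmsd(a, b, mask):
    """RMS deviation between two structures, fitted and measured on the
    core atoms only."""
    fitted = superpose(a, b, mask)
    return float(np.sqrt(((fitted[mask] - b[mask]) ** 2).sum(axis=1).mean()))
```

Calling `find_core(structures, n_core)` on an `(M, N, 3)` coordinate array returns the core mask, which can then be passed to `core_rmsd` to compare two family members on the conserved atoms alone rather than on all atoms. Fixing the core size in advance is a simplification made for this sketch; a statistical cutoff on the per-atom spread would be another reasonable choice.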

Our method for defining regions of low structural variation is also useful for analyzing structures solved by NMR spectroscopy or generated by molecular dynamics, since both techniques produce an ensemble of structures: in a sense, a family of very similar structures. We have applied our algorithm to the NMR structures reported for trypsin inhibitor and examined the distribution of the actual atoms about the core structure. The representations used by our algorithm were initially developed for calculating structure from NMR data.


II. PROJECT PERSONNEL


III. SHARED DATA, IMAGES, AND SOFTWARE

IV. REFERENCES



Helix Research at Stanford School of Medicine
Updated March 10, 2004.