One of the most basic features shared by the members of a family of structures is a set of atoms that occupy the same relative positions in space. Our focus here is on identifying this set of atoms and then characterizing it statistically. We show how to construct an average core structure from a protein family in such a way that the average is unbiased and the resulting structure has acceptable stereochemistry. We then show how this core structure can be used to characterize the structural variability within a family, to define the average relative orientation of domains in multi-domain complexes, and to develop new measures of similarity between members of the same structural family. We illustrate our ideas by applying them to two archetypal protein families: the all α-helical globins and the all β-sheet immunoglobulins. For both families we find an average structural core that is biologically relevant. We then use the globin core to illustrate our calculation of a better RMS measure of structural similarity and to highlight the differences between family classifications based on structure and those based on sequence. We use the immunoglobulin core to show how our procedure can be adapted to deal with large assemblies.
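To make the idea of a low-variation core concrete, here is a minimal Python sketch. It is not the procedure described in this work; it is a simplified illustration under several assumptions: the structures are already superimposed in a common frame, equivalent atoms share the same index in every structure, and the variance cutoff is an arbitrary, hypothetical value. The function names `per_atom_variance`, `select_core`, and `rms_over` are invented for this example.

```python
import math

def per_atom_variance(structures):
    """Mean squared distance of each atom from its mean position.

    `structures` is a list of structures; each structure is a list of
    (x, y, z) tuples, with equivalent atoms at the same index.
    Assumes the structures are already superimposed.
    """
    n_struct = len(structures)
    n_atoms = len(structures[0])
    variances = []
    for i in range(n_atoms):
        # Mean position of atom i across all structures.
        mean = [sum(s[i][k] for s in structures) / n_struct for k in range(3)]
        # Average squared distance of atom i from that mean position.
        msd = sum(
            sum((s[i][k] - mean[k]) ** 2 for k in range(3))
            for s in structures
        ) / n_struct
        variances.append(msd)
    return variances

def select_core(variances, cutoff):
    """Indices of atoms whose positional variance is at or below `cutoff`."""
    return [i for i, v in enumerate(variances) if v <= cutoff]

def rms_over(a, b, indices):
    """RMS distance between structures `a` and `b` over the given atom indices."""
    total = sum(
        sum((a[i][k] - b[i][k]) ** 2 for k in range(3)) for i in indices
    )
    return math.sqrt(total / len(indices))

# Toy family: three structures of four atoms. Atoms 0-2 coincide in every
# structure; atom 3 moves widely and should be excluded from the core.
structures = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 3.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, -3.0, 0.0)],
]

variances = per_atom_variance(structures)
core = select_core(variances, cutoff=1.0)  # cutoff is a hypothetical choice
all_atoms = list(range(len(structures[0])))

rms_all = rms_over(structures[0], structures[1], all_atoms)
rms_core = rms_over(structures[0], structures[1], core)
```

In this toy case the RMS over all atoms is dominated by the single mobile atom, while the RMS restricted to the core reflects only the positions that the family actually shares. A realistic treatment would also need to handle superposition and the choice of cutoff, which this sketch leaves out.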
Our method for defining regions of low structural variation is also useful for analyzing structures solved by NMR spectroscopy or generated by molecular dynamics, since both techniques produce an ensemble of structures: in a sense, a family of very similar structures. We have applied our algorithm to the NMR structures reported for trypsin inhibitor and examined the distribution of the actual atoms about the core structure. The representations used by our algorithm were initially developed for calculating structure from NMR data.
We have published a paper in J. Mol. Graphics describing the program
proteanD, which is designed to display various representations of
structural uncertainty for macromolecules, including overlapping stick
drawings, ellipsoids of uncertainty, and secondary structure accessible
volumes. These representations are closely related to our computational
methodology for computing cores. ProteanD runs on Silicon Graphics (SGI)
machines. Sample input files and binary executable code are available at
ftp://helix-ftp.stanford.edu/people/altman/proteand.tar.Z.