Electronic Supplement to:
 

PRINCIPAL COMPONENTS ANALYSIS TO SUMMARIZE MICROARRAY EXPERIMENTS:
APPLICATION TO SPORULATION TIME SERIES

Soumya Raychaudhuri, Joshua M. Stuart, and Russ B. Altman

Stanford Medical Informatics
Stanford University, 251 Campus Drive, MSOB X-215, Stanford CA 94305-5479
{sxr, stuart, altman}@smi.stanford.edu

The enormous amount of data produced by microarray experiments can be unwieldy. A given series of microarray experiments produces observations of differential expression for thousands of genes across multiple conditions. These large data sets can be summarized with principal components analysis (PCA), a statistical technique that identifies the key variables (or combinations of variables) in a multidimensional data set. PCA determines those key variables in the data that best explain the differences in the observations. Here we show the utility of applying PCA to expression data, where the experimental conditions are the variables and the gene expression measurements are the observations. Thus, each component defines a linear combination of the experimental conditions that can be used to distinguish genes parsimoniously. Examination of the components also provides insight into what underlying factors are actually being measured in the experiment. We applied PCA to the publicly released yeast sporulation data set (Chu et al. 1998). In that work, gene expression was measured at 7 time points. PCA on the time points suggests that much of the observed variability in the experiment can be summarized in just 2 components; that is, 2 variables capture most of the information. These underlying factors appear to represent (1) overall induction level and (2) change in induction level over time. A visualization of our results is available at http://www.smi.stanford.edu/projects/helix/PCArray.
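The setup described above (experimental conditions as variables, genes as observations) can be illustrated with a minimal sketch. It assumes NumPy and runs on synthetic data, not the sporulation data set. The function name `pca`, the column centering, and the eigendecomposition of the covariance matrix are illustrative choices and not necessarily the exact procedure used in the paper.

```python
import numpy as np


def pca(expression, n_components=2):
    """PCA on a (genes x conditions) matrix; hypothetical helper for illustration.

    Rows are observations (genes); columns are variables (conditions).
    """
    X = np.asarray(expression, dtype=float)
    # Center each condition so the covariance matrix describes variation about the mean.
    centered = X - X.mean(axis=0)
    # Covariance between conditions: shape (n_conditions, n_conditions).
    cov = np.cov(centered, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Fraction of the total variance explained by each component.
    explained = eigvals / eigvals.sum()
    # Each column is a linear combination of the conditions.
    components = eigvecs[:, :n_components]
    # Coordinates of each gene along the leading components.
    scores = centered @ components
    return components, scores, explained


# Synthetic stand-in data (an assumption, not real measurements): each gene has
# an overall level and a linear trend over 7 time points, plus small noise.
rng = np.random.default_rng(0)
n_genes, n_times = 500, 7
level = rng.normal(size=(n_genes, 1))
trend = rng.normal(size=(n_genes, 1))
time_axis = np.linspace(-1.0, 1.0, n_times)
data = level + trend * time_axis + 0.1 * rng.normal(size=(n_genes, n_times))

components, scores, explained = pca(data, n_components=2)
print("variance explained by first two components:", explained[:2].sum())
```

Because the synthetic genes are built from only two underlying factors, the first two components account for nearly all of the variance. This mirrors, in a toy setting, the kind of summary reported for the sporulation time series.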
 

The links below open VRML files that display each gene in the data set as a small line segment in a 2D or 3D plot. Each gene is hotlinked to its corresponding open reading frame (ORF) in the Saccharomyces Genome Database.

1. VRML source file with all yeast genes projected onto the first two principal component axes.
2. VRML source file with all yeast genes projected onto the first three principal component axes.
 

VRML files require a browser plug-in, such as those available at http://home.netscape.com/plugins/3d_and_animation.html.