MIS214/CS 274 Project 3
Microarray-based Gene Classification

Due at midnight on June 1, 2000

Objectives:

Don’t forget to submit your answers!

Introduction:

Ribosomal Proteins:

The ribosome is a large complex of many proteins that facilitates the translation of mRNA into protein; it is the cellular machinery responsible for linking together the correct sequence of amino acids from a sequence of codons. The proportion of each protein present in the complex is coordinated in the cell to ensure that the correct number of subunits is available to construct complete ribosomes. The cell often regulates the amount of a protein by controlling the transcription level of the protein's gene. Therefore, we might expect many of the ribosomal genes to be coordinately regulated; that is, they should have similar mRNA expression levels. Also, given a number of known ribosomal proteins and their expression patterns across a wide range of experiments, we might be able to find other ribosomal proteins by comparison: proteins whose expression patterns resemble those of the known ribosomal genes may also be part of the complex.
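
As a rough illustration of the comparison idea above, the following Python sketch labels a gene by majority vote among the known genes whose expression profiles are most similar to it (k-nearest neighbors). The use of Pearson correlation as the similarity measure, the value k=3, the function names, and the toy profiles are all assumptions made for this example; the assignment does not prescribe them, and other choices may work as well or better.

```python
import math
from collections import Counter


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile has no pattern to compare against
    return cov / math.sqrt(var_x * var_y)


def knn_classify(query, labeled_profiles, k=3):
    """Label `query` by majority vote among its k most correlated neighbors.

    `labeled_profiles` is a list of (profile, label) pairs.
    """
    ranked = sorted(labeled_profiles,
                    key=lambda pair: pearson(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy profiles invented for illustration only (not from the real data set).
training = [
    ([1.0, 2.0, 3.0, 4.0], "ribo"),
    ([2.0, 4.1, 5.9, 8.0], "ribo"),
    ([0.5, 1.1, 1.4, 2.1], "ribo"),
    ([4.0, 3.0, 2.0, 1.0], "nonribo"),
    ([3.0, 1.0, 3.0, 1.0], "nonribo"),
]
predicted = knn_classify([1.1, 1.9, 3.2, 3.9], training, k=3)
print(predicted)
```

Correlation compares the shape of two profiles rather than their absolute magnitude, which fits the notion of co-regulation described above; Euclidean distance would be a reasonable alternative to try.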

(Note: In this assignment, when we refer to ribosomal proteins we mean those proteins present in the ribosome located in the cytoplasm, not those proteins found in the mitochondrial ribosome. It is important to make this distinction since we do not expect the levels of mitochondrial and cytoplasmic ribosomes to be co-regulated.)

Microarray Expression Experiments:

The data set consists of 79 microarray experiments in which the expression levels of 2467 genes in S. cerevisiae (baker’s yeast) were measured under different conditions or at different time points. Of the 2467 genes spotted on the microarrays, 121 were previously characterized as ribosomal genes; the remainder are various other genes with known functions. The 79 experiments include measurements of expression taken after environmental changes were imposed on the yeast. For example, some of these conditions were starvation (causing the yeast to form spores), changing the sugar supply (causing the yeast to ferment rather than respire), and synchronizing the cells to force them to pass through the stages of cell division at the same time. The experiments are described in detail in the paper:

Eisen, M.B., Spellman, P.T., Brown, P.O., Botstein, D. (1998) Cluster analysis and display of genome-wide expression patterns. PNAS 95:14863-14868.

which can also be found in the course reader. The current list of known ribosomal genes used in this homework can be found at the MIPS database. The ribosomal proteins are listed in the MIPS classification hierarchy.

Data Files:

The expression patterns for the 121 ribosomal genes can be found in the tab-delimited data file ribo.dat. Each row in the file represents one gene and each column represents one experimental condition (there are 79 columns in all). The expression patterns for the 2346 non-ribosomal genes can be found in the data file nonribo.dat. The columns in this file are ordered the same way, with 79 columns corresponding to the same experiments as in the ribosomal file. The file experiments.txt lists the experimental conditions and time points that the columns represent. The *.dat files can be read directly into Matlab with the load() command (e.g., issuing the command load('ribo.dat') will create a 121x79 matrix called ribo). The common gene name and a short functional annotation for the genes in the ribosomal and non-ribosomal sets can be found in the files ribo-names.txt and nonribo-names.txt, respectively.
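
For those not working in Matlab, the sketch below shows one way the *.dat files might be read in Python using only the standard library. It assumes every field is a plain number with no header row (as implied by the fact that Matlab's load() accepts the files); the function name read_expression_matrix is hypothetical, and the two-row sample is invented so the example runs without the real files.

```python
import io


def read_expression_matrix(handle):
    """Parse whitespace/tab-delimited numbers into a list of rows (one per gene)."""
    matrix = []
    for line in handle:
        fields = line.split()  # splits on tabs and any other whitespace
        if not fields:
            continue  # skip blank lines
        matrix.append([float(field) for field in fields])
    widths = {len(row) for row in matrix}
    if len(widths) > 1:
        raise ValueError(f"rows have differing column counts: {sorted(widths)}")
    return matrix


# Stand-in for a real file: two genes measured in three experiments.
sample = io.StringIO("0.12\t-0.30\t1.05\n-0.47\t0.00\t0.88\n")
demo = read_expression_matrix(sample)
print(len(demo), len(demo[0]))

# With the real files in the working directory, usage would look like:
#     with open("ribo.dat") as handle:
#         ribo = read_expression_matrix(handle)  # should give 121 rows of 79 values
```

Checking the row and column counts against the numbers stated above (121x79 and 2346x79) is a quick way to confirm the files were read correctly.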

Matlab Graphics:

If you are planning to implement KNN in Matlab and want to use its graphical capabilities (not required to do the homework) but do not have direct access to a leland machine, you have two options: (1) you can download a free evaluation copy of Matlab for your own machine from MathWorks, or (2) you can run an X-server on your own machine and telnet to a leland machine. Microimages makes a free X-server for Macs and has a 15-day evaluation copy for PCs.

To get credit:

  1. Turn in your KNN code.
  2. Answer the questions.

Don’t forget to submit your answers for Project 3!



For questions, please write to Josh or Soumya.