BMI 214/CS 274 Programming Project 1

Dynamic Programming Sequence Alignment

Due 11:55PM, Sunday, April 25, 2004

Introduction

The first programming project is to implement the dynamic programming algorithm for sequence alignment, parameterized in such a way that it can be used for either global or local sequence alignment.

You should create a pseudocode description of the algorithm for yourself first, based on the discussion in the Gribskov paper. You do not have to implement the caching strategy (described in the Gotoh article) that takes the complexity from O(N^3) to O(N^2). The bold student may, for extra credit, attempt to write the more efficient version, using the scheme reported by Gotoh (see handout).

Remember, the difference between local alignment and global alignment are simple:

  1. Local requires negative scores in the match matrix
  2. Local never records a score in the score matrix below 0.0
  3. When looking for best match, local looks for best score anywhere in matrix (global looks only in last row and last col for best score).

Program Specifications

The input to the program is a file with lines containing the following information: (See sample files align-example.input and align-example.input-annotated):

  1. A sequence of letters indicating sequence A
  2. A sequence of letters indicating sequence B
  3. An indication of whether local (1) or global (0) alignment is sought.
  4. A set of gap penalties for introducing gaps into A or B.
  5. The symbol alphabets to be used (e.g. ATGC for DNA strands and 21 single-letter abbreviations for the proteins, but there could be other symbols)
  6. A matrix which shows the score for a match between an element in B and one in A.

For simplicity, you could assume that the input sequences will never be longer than 300 letters and the symbol alphabet size is never larger than 26.

The output of the program is a file with lines containing the following information: (See align-example.output)

  1. score for the alignment.
  2. sequence A (with necessary gaps)
  3. sequence B (with necessary gaps) such that aligned characters from A and B are in the same column.

When you need to report more than one equivalent alignment, you can leave a blank line and then print the alignments in a similar manner. (See align-all-example.output)

To Get Credit:

  1. Make sure you submit the following as well as submit the quiz at CWP.
  2. Submit a readme.txt file which includes:
  3. 1). your name, CWP login, the language you used, instructions on how to run your program 2). your pseudocode description of the algorithm that describes what you coded on one page or less. (We will use this as a primary way to debug the conceptual basis for your code, since we allow so many different languages.)
    3). indication of whether or not you attempted the extra credit
  4. Submit a listing of your entire source code. We expect to see comments and explanations of what is going on. Especially if you expect partial credit for "almost working" code.
  5. Submit the output of the code on test cases and answer questions based on your code.
  6. Submission instructions can be found on CWP.
  7. 5% extra for an O(n^2) algorithm.

CWP Quiz


Question? Contact: biomedin214-spr0304-staff@lists.stanford.edu