BMI 214/CS 274 Programming Project 1

Dynamic Programming Sequence Alignment

Due 11:55PM, Wednesday, April 20, 2005

Introduction

The first programming project is to implement the dynamic programming algorithm for sequence alignment, parameterized in such a way that it can be used for either global or local sequence alignment.

You should create a pseudocode description of the algorithm for yourself first, based on the discussion in the Gribskov paper. You do not have to implement the caching strategy (described in the Gotoh article) that takes the complexity from O(N^3) to O(N^2). The bold student may, for extra credit, attempt to write the more efficient version, using the scheme reported by Gotoh (see handout).

Remember, the difference between local alignment and global alignment are simple:

  1. Local requires negative scores in the match matrix
  2. Local never records a score in the score matrix below 0.0
  3. When looking for best match, local looks for best score anywhere in matrix (global looks only in last row and last col for best score).

Program Specifications

The input to the program is a file with lines containing the following information: (See sample files align-example.input and align-example.input-annotated):

  1. A sequence of letters indicating sequence A
  2. A sequence of letters indicating sequence B
  3. An indication of whether local (1) or global (0) alignment is sought.
  4. A set of gap penalties for introducing gaps into A or B.
  5. The symbol alphabets to be used (e.g. ATGC for DNA strands and 21 single-letter abbreviations for the proteins, but there could be other symbols)
  6. A matrix which shows the score for a match between an element in B and one in A.

For simplicity, you could assume that the input sequences will never be longer than 300 letters and the symbol alphabet size is never larger than 26.

The output of the program is a file with lines containing the following information: (See align-example.output)

  1. score for the alignment.
  2. sequence A (with necessary gaps)
  3. sequence B (with necessary gaps) such that aligned characters from A and B are in the same column.

When you need to report more than one equivalent alignment, you can leave a blank line and then print the alignments in a similar manner. (See align-all-example.output)

To Get Credit:

  1. Submit your source code and a readme file to CWP under the "Programming Code Submission" directory.
    1. Source code: We expect to see comments and explanations of what is going on. Especially if you expect partial credit for "almost working" code.
    2. Readme file: Include
      • Your name and CWP login name
      • Language used and instruction on how to run your program
      • Pseudocode description of the algorithm (one page or less). We will use this as a primary way to debug the conceptual basis for your code, especially for partial credit.
      • Whether or not you attempted the extra credit
  2. Take the quiz and provide the test case output at CWP under the "Assignments and Projects" directory.
  3. To get extra credit, implement an O(n^2) algorithm. 5% extra points.

CWP


Question? Contact: bmi214-spr05-staff@lists.stanford.edu