BMI 214/CS 274 Programming Project 1
Dynamic Programming Sequence Alignment
Due 11:55PM, Wednesday, April 20, 2005
Introduction
The first programming project is to implement the dynamic
programming algorithm for sequence alignment, parameterized in such a
way that it can be used for either global or local sequence alignment.
You should create a pseudocode description of the algorithm for
yourself first, based on the discussion in the Gribskov
paper. You do not have to implement the caching strategy (described in
the Gotoh article) that takes the complexity from O(N^3) to
O(N^2). The bold student may, for extra credit, attempt to write the
more efficient version, using the scheme reported by Gotoh (see
handout).
Remember, the difference between local alignment and global
alignment are simple:
- Local requires negative scores in the match matrix
- Local never records a score in the score matrix below 0.0
- When looking for best match, local looks for best score anywhere
in matrix (global looks only in last row and last col for best
score).
Program Specifications
The input to the program is a file with lines containing the
following information: (See sample files
align-example.input and
align-example.input-annotated):
- A sequence of letters indicating sequence A
- A sequence of letters indicating sequence B
- An indication of whether local (1) or global (0) alignment is
sought.
- A set of gap penalties for introducing gaps into A or B.
- The symbol alphabets to be used (e.g. ATGC for DNA strands and 21
single-letter abbreviations for the proteins, but there could be other
symbols)
- A matrix which shows the score for a match between an element in B
and one in A.
For simplicity, you could assume that the input sequences will
never be longer than 300 letters and the symbol alphabet size is never
larger than 26.
The output of the program is a file with lines containing the
following information: (See
align-example.output)
- score for the alignment.
- sequence A (with necessary gaps)
- sequence B (with necessary gaps) such that aligned
characters from A and B are in the same column.
When you need to report more than one equivalent alignment, you can
leave a blank line and then print the alignments in a similar
manner. (See align-all-example.output)
To Get Credit:
- Submit your source code and a readme file to CWP under the "Programming
Code Submission" directory.
- Source code: We expect to see comments and
explanations of what is going on. Especially if you expect partial credit
for "almost working" code.
- Readme file: Include
- Your name and CWP login name
- Language used and instruction on how to run your program
- Pseudocode description of the algorithm (one page or less). We
will use this as a primary way to debug the conceptual basis for
your code, especially for partial credit.
- Whether or not you attempted the extra credit
- Take the quiz and provide the test case output at CWP under the
"Assignments and Projects" directory.
- To get extra credit, implement an O(n^2) algorithm. 5% extra points.
CWP
Question? Contact:
bmi214-spr05-staff@lists.stanford.edu