Medical Information Sciences 214 (also listed as Computer Science 274)
Representations and Algorithms for Computational Molecular Biology
Spring quarter, 2000
Time: Lectures Tues/Thurs 1:30-2:45.
Location: Thornton, Room 102
Televised section: Thornton 102, Channel E3, Friday 10:00-10:50
Videotape: Located in Terman Library
Internet: MIS 214 lectures are streamed as internet video on Stanford Online
General Course Information
Course Reader: Will be available at the Stanford Bookstore during the course; the contents of last year's reader are summarized here.
Schedule of Lectures (HTML table)
Assignments and Projects
Check your grades here
Assignment 0:
Class survey. DUE at NOON on MONDAY, April 3.
Assignment 1:
Surfing the Web for Biological Data. DUE at NOON on WEDNESDAY April 5.
Project 1: Dynamic Programming. DUE at 11:59 on MONDAY April 17.
Assignment 2: Sequence Analysis. DUE at the beginning of class THURSDAY April 13.
Assignment 3: Hidden Markov Models. DUE at 11:59pm on THURSDAY April 27.
3-hour Midterm Exam: DUE at noon on WEDNESDAY, May 3.
Assignment 4: Gene Finding & Functional Annotation. DUE at 11:59pm on THURSDAY May 11.
Project 2: Threading and Distances. DUE at midnight on THURSDAY May 18.
Project 3: Microarray-based Gene Classification. DUE at midnight on THURSDAY June 1. (OK to hand in by 6/6 without using late days.)
Final Exam (same format as Midterm): DUE at midnight on TUESDAY June 6. (Cannot be late!)
Final Course Evaluation: DUE at Midnight on WEDNESDAY June 7.
Lecture Notes and Suggested Readings
March 28 (ALTMAN): Introduction to Bioinformatics & Computational Biology
- Lecture Slides:
- Databases mentioned in lecture: GenBank, Swiss-Prot, PDB, Medline
- Introductory Biology Resources
- The field of bioinformatics
- Course reader (optional): Smith, 1990
March 30 (ALTMAN): An informatics view of biological structure and function
- Lecture Slides:
- Visible Human Project
- Introduction to Membrane Structure
- Some gene/protein function taxonomies:
- Kinemages for biological structure (need MAGE software, here for Mac)
- DNA kinemage
- Protein backbone kinemage
- Beta-sheet kinemage
- Kinemages from Lecture: proteinbbone.kin, 4hhbb.kin, helix.kin, 4hhb.kin, HBlesson.kin, protour1.kin, pdb1rbp2.kin, protour2.kin, protour4.kin, protour5.kin, dna.kin, NATour1.kin
- Beta-hemoglobin record in genome database (graphical view)
- Human myoglobin gene in GenBank
- UCSF Computer Graphics Lab, Scripps Institute Olson Lab
- Amino acid properties web page
March 31 (Section, J. CHANG): Python Tutorial
April 4 (ALTMAN): Pairwise sequence alignment
April 6 (ALTMAN): More on sequence alignment, basic structural computations
- Lecture slides: (color, 2/page), (b&w, 3/page)
- Course reader (optional): Altschul, 1991; Altschul et al, 1990; Dayhoff et al, 1978; Karlin & Altschul, 1990
April 11 (ALTMAN): Protein energetics and protein folding, begin multiple sequence alignment
April 13 (KLEIN): Visualization in Molecular Biology
April 14 (RAYCHAUDHURI): Perl Programming Tutorial
- Link to CS193i Perl tutorial here.
April 18 (ALTMAN): Multiple Sequence Alignments, Markov Models, Hidden Markov Models
- Lecture slides: (color, 2/page), (b&w, 3/page)
- Reference: Biological Sequence Analysis, R. Durbin, S. Eddy, A. Krogh, G. Mitchison, 1998, Cambridge University Press
- HMM software from the Sean Eddy lab
- HMM software from UC Santa Cruz
- MEME tool for creating motifs
- Gibbs Sampling lecture from Washington U. St. Louis
- Web page for Chip Lawrence laboratory (including alignment code)
- Web page for Jun Liu laboratory
- Course reader (optional): Krogh et al, 1994; McClure et al, 1994; Thompson et al, 1994; Lipman et al, 1989
April 20 (ALTMAN): A Bit More on Hidden Markov Models
- Lecture slides: (color, 2/page), (b&w, 3/page)
- Reference: Proceedings of the Pacific Symposium on Biocomputing 2000, R.B. Altman, A.K. Dunker, L. Hunter, K. Lauderdale, and T.E. Klein (eds.), World Scientific Publishing.
- CASP web site for protein structure prediction
- SCOP Server
- EMBL protein structure prediction server
- UCLA protein structure prediction server
- Modeller program for structure prediction (web site)
- Guide to protein structure prediction for biologists
April 25 (ALTMAN): Protein Structure Prediction and Threading
April 27 (ALTMAN): Protein Structure Prediction and Threading
April 28 (RAYCHAUDHURI): Gibbs Sampling for Sequence Alignment
May 2 (KOZA): Genetic Algorithms
- Lecture notes: 1/page
- Course reader (optional): Pedersen & Moult, 1997
May 4 (KOZA): Genetic Programming
- Same notes as previous lecture (continued)
- Course reader (optional): Koza, 1994
- Genetic Programming Conference
- DNA Computing references:
  - Adleman, Leonard M. Molecular computation of solutions to combinatorial problems. Science 266:1021-1023. November 11, 1994.
  - Lipton, Richard J. DNA solution of hard computational problems. Science 268:542-545. April 28, 1995.
  - Paun, Gheorghe, Rozenberg, Grzegorz, and Salomaa, Arto. 1998. DNA Computing: New Computing Paradigms. Berlin: Springer-Verlag.
  - http://dope.caltech.edu/winfree/DNA.html
May 9 (ALTMAN): Structural Alignment
May 11 (ALTMAN): Evolutionary Trees
May 16 (RAYCHAUDHURI):
May 18 (ALTMAN): Genetic Networks (Revisited)
May 19 (STUART): Matlab Tutorial
- Main example file: main.m
- K-means clustering function: kmeans.m
- Hemoglobin molecule (all atoms, xyz coordinates only): 4hhb.dat
- Hemoglobin molecule (alpha carbons in PDB-like format): 4hhb.pdb
- The University of Florida appears to have a good introductory-level Matlab tutorial: univ of florida
- Search google.com for "matlab tutorial"
May 23 (ALTMAN): Non-Atomic Representations of Molecular Structure: 3D Motifs
May 25 (ALTMAN): Visualization in Molecular Biology
- Lecture slides: (color, 2/page), (b&w, 3/page)
- Unified Medical Language System
- Medical Subject Headings (MeSH) Browser
- List of words distinguishing malaria literature from yeast literature
- Documentation for PROSITE database (PRODOC)
- NATURAL LANGUAGE PROCESSING FOR BIOLOGY
  - Session Introduction. T. Tsunoda and L. Wong; Pacific Symposium on Biocomputing 5:488-489 (2000).
  - Knowledge Representation and Indexing Using the Unified Medical Language System. K. Baclawski, J. Cigna, M.M. Kokar, P. Mager, and B. Indurkhya; Pacific Symposium on Biocomputing 5:490-501 (2000).
  - Two Applications of Information Extraction to Biological Science Journal Articles: Enzyme Interactions and Protein Structures. K. Humphreys, G. Demetriou, and R. Gaizauskas; Pacific Symposium on Biocomputing 5:502-513 (2000).
  - EDGAR: Extraction of Drugs, Genes and Relations from the Biomedical Literature. T.C. Rindflesch, L. Tanabe, J.N. Weinstein, and L. Hunter; Pacific Symposium on Biocomputing 5:514-525 (2000).
  - Biobibliometrics: Information Retrieval and Visualization from Co-Occurrences of Gene Names in Medline Abstracts. B.J. Stapley and G. Benoit; Pacific Symposium on Biocomputing 5:526-537 (2000).
  - Automatic Extraction of Protein Interactions from Scientific Abstracts. J. Thomas, D. Milward, C. Ouzounis, S. Pulman, and M. Carroll; Pacific Symposium on Biocomputing 5:538-549 (2000).
May 30 (ALTMAN): Computing with Distances and Final Thoughts
Other Links For Course.
General Course Information
Instructors:
Russ Altman, Associate Professor of Medicine (and Computer Science, by courtesy), Stanford Medical Informatics. MSOB X-215, Stanford, Mail Code 5479. 650-725-3394, altman@smi.stanford.edu
John Koza, Consulting Professor (Medical Informatics), Stanford Medical Informatics, MSOB X-215, Stanford, Mail Code 5479. 650-941-0336, koza@smi.stanford.edu
Teaching Assistants:
Josh Stuart, stuart@smi.stanford.edu, 650-725-3398
Office Hours: Monday and Friday 4-5, at MSOB 215.
Soumya Raychaudhuri, sxr@smi.stanford.edu, 650-725-3398
Office Hours: Monday and Friday 4-5, at MSOB 215.
Course Coordinator: Kevin Lauderdale, MSOB X215, (650) 725-0659, kxl@smi.stanford.edu
Description:
This course will introduce the basic computational issues and methods used in molecular biology, combining core lectures and programming assignments with a midterm and a final exam. The course will introduce and use biological data sources available on the World Wide Web. Topics will include basic algorithms for alignment of biological sequences and structures, as well as more advanced representational and algorithmic issues in structure and sequence computation. These include, for example, dynamic programming algorithms for alignment, structural superposition algorithms, computing with distance information, 3D motif definition and computation, hidden Markov models, phylogenetic trees, statistical feature detection, genetic algorithms, design of data resources, automated analysis of biological literature, database integration, and collaborative environments for supporting biology.
We will assume no previous biology background. We will, however, assume an interest in biology.
Units:
- This course is normally taken for 4 units.
- It can be taken for 3 units by arrangement with the instructors.
- A lecture-only option, with no assignments, may be taken for 1 unit by arrangement with the instructors. Students taking this option must attend all lectures; absences must be approved by the instructors.
Grading: The course will be graded on performance on short homework assignments (approximately 30%), long projects (approximately 50%), and the midterm and final exams (approximately 20%; both are take-home and open-book).
Late policy: All projects, assignments, and exams must be submitted electronically by the specified due time (Pacific Time). Each student is granted 7 "free" late days that can be used as extensions for any project, assignment, or exam (exceptions: the Midterm Exam can be at most 3 days late, and the Final Exam cannot be late at all). Late days are measured in 24-hour calendar days, with no distinction for weekends or holidays, and are rounded UP to the nearest integer (thus, 10 minutes late = 23 hours late = 1 day late). After you have used up all your free days, your grade on a late project, assignment, or exam will be reduced 10% for each late day. Extensions beyond the 7 free days may be granted at the discretion of the instructor (not the TAs), but must be requested before the due date.
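To make the rounding and penalty rules concrete, here is a minimal Python sketch of how they could be applied to a single submission. It is a hypothetical illustration, not the course's grading script: the names `late_days` and `apply_late_policy` are invented for this example, and it assumes (1) the 10% reduction is 10% of the earned score per penalized day, floored at zero, and (2) a midterm more than 3 days late or a late final is simply not accepted (the sketch raises an error in that case).

```python
import math
from datetime import datetime, timedelta

FREE_LATE_DAYS = 7            # per-student budget stated in the policy
PENALTY_PERCENT_PER_DAY = 10  # assumption: 10% of the earned score per penalized day
MAX_LATE_DAYS = {"midterm": 3, "final": 0}  # per-exam caps stated in the policy


def late_days(due: datetime, submitted: datetime) -> int:
    """Count late days as 24-hour periods, rounding any partial day UP."""
    seconds_late = (submitted - due).total_seconds()
    if seconds_late <= 0:
        return 0
    return math.ceil(seconds_late / (24 * 60 * 60))


def apply_late_policy(score: float, due: datetime, submitted: datetime,
                      free_days_left: int = FREE_LATE_DAYS,
                      kind: str = "assignment"):
    """Return (adjusted_score, free_days_left_afterwards) for one submission."""
    days = late_days(due, submitted)
    cap = MAX_LATE_DAYS.get(kind)
    if cap is not None and days > cap:
        # Assumption: submissions past the cap are not accepted at all.
        raise ValueError(f"{kind} may be at most {cap} day(s) late")
    free_used = min(days, free_days_left)   # spend free days first
    penalized = days - free_used            # each remaining day costs 10%
    percent_kept = max(0, 100 - PENALTY_PERCENT_PER_DAY * penalized)
    return score * percent_kept / 100, free_days_left - free_used


# Usage with an illustrative deadline of 23:59 on April 17, 2000.
due = datetime(2000, 4, 17, 23, 59)
print(late_days(due, due + timedelta(minutes=10)))  # 1
print(late_days(due, due + timedelta(hours=23)))    # 1
print(apply_late_policy(100, due, due + timedelta(days=3), free_days_left=1))  # (80.0, 0)
```

Working in integer percentages rather than multiplying by 0.9 repeatedly keeps the arithmetic exact, and free days are always spent before any penalty applies, which matches the order the policy describes.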
Auditors: Must be approved by Dr. Altman.
Prerequisites: Prior exposure to matrix mathematics and programming skills are required. Familiarity with biology is helpful, but not required. The CS requirement is meant to ensure that students can write computer programs and understand the basics of data structures and algorithms. The math requirement is meant to ensure that students feel comfortable with matrix algebra. Students may choose any programming language, but we strongly recommend a high-level prototyping language, such as Python or Perl, for speed of implementation.
Computer resources: You will need access to email and the web to obtain assignments. All of these resources are available to Stanford students at Sweet Hall and elsewhere. Most course material will be placed on the WWW in *.pdf (Adobe Acrobat) format, which allows the documents to be read on multiple platforms. Free readers for Windows, Macintosh, and many Unix platforms are available at the Adobe website.
Course readings: Will be distributed as needed in class or through the course coordinator. A course reader is being prepared (ready by the 2nd or 3rd week of the course).
Updated virtually continuously by Russ Altman... Thanks to Lee Kozar for the background graphic.