About the Final Exam of CS498CXZ (Fall 2005)
1. Time and place
The final will be on Dec 13, Tuesday, 7-10pm,
in our regular classroom -- Siebel Center 1302.
2. Format
The final will be a 3-hour closed-book examination.
You are not allowed to bring any written material.
It may be slightly advantageous to bring a calculator
as there will be some calculations.
3. Coverage
The test will cover all the lectures but with a clear emphasis
on the topics taught after the midterm. A majority of points
will be on the following key algorithms.
Algorithms with a star
are those for which you must know precisely all the details.
For other algorithms, you must know the main ideas.
The format of the questions will be similar to the midterm questions
and homework questions.
Before-midterm topics
- big O notation: Read Section 2.8 of the textbook if you don't understand it.
- Probability distributions: Binomial, Gaussian, and Multinomial. Bayes rule. If you don't know these concepts, read any textbook on probability and statistics.
- Entropy, Kullback-Leibler divergence, and mutual information. Given discrete random variables X and Y, you need to know how to compute H(X), D(X||Y) and
I(X;Y). Refer to the lecture slides for the formulas.
- Definition of the shortest superstring problem. If you don't know, read
section 8.4.
- Relation of the shortest superstring problem with the Hamiltonian Path problem and the Eulerian Path problem in graph algorithms. (Read 8.7 and 8.8 if you don't know.) Know how to find an Eulerian path from a graph.
- Basic idea of whole genome sequencing (i.e., how reading a limited number of base pairs (500-700) can still allow us to infer the whole genome). Know why repeats cause problems.
Know the Lander-Waterman Model. Know the 3 steps involved in the Overlap-Layout-Consensus approach
(i.e., know what each step does conceptually). Refer to the lecture slides on these topics.
- The Exon Chaining problem and how it helps solve the problem of gene prediction. Know the dynamic programming algorithm for solving the Excon Chaining problem . (Read section 6.13 if you don't know.)
- Start codons, stop codons, and open reading frames. Splicing signal. Know how the likelihood ratio
method can be used to perform a binary classification (e.g., deciding whether a candidate region
is a true exon). Refer to the lecture slides and read section 6.12.
- How alignments correspond to paths in an edit graph for pairwise sequence alignment. Know how to
identify a path given an alignment and vice versa. Know how BLOSUM matrix is constructed. Know affine gap penalty scoring. Know the dynamic programming equations for both global alignment and local alignment. Know how to use the dynamic programming equations to fill in the values in an edit graph for both global alignment and local alignment.
Read sections 6.4-6.8.
After-midterm topics
- Know what is microarray data and why clustering of microarray data is interesting from biology perspective. Know how k-means and agglomerative hierarchical clustering algorithms work. Know the differences between single-link, average-link, and complete-link. Read sections 10.1-10.3.
- Know what is a regulatory motif and why it is important to study from biology perspective. Know what is a position weight matrix (PWM) and how to compute a PWM from a set of aligned sequences. Know how the CONSENSUS motif finding algorithm works and how to compute information content. Read sections 4.4-4.9 and 5.5.
- Know the definition of a hidden Markov model (HMM) and the meanings of its parameters. Know how the Viterbi decoding algorithm works and its applications. Know how the Forward and Backward algorithms work. Know how to estimate the parameters of an HMM using annotated data (i.e., in a supervised way). Read sections 11.1-11.4.
- Know the high-level ideas of applying HMMs to protein classification,
gene finding, multiple sequence alignment, and motif finding. Know what is the problem of local maxima. Know what is a Profile HMM and its typical architecture. Know how HMMs can be constructed appropriately to model matching, insertion, and deletion of amino acids. Know how to estimate a profile HMM based on multiple sequence alignment. Read section 11.5.
- Know what is Mass Spectrum data and the meaning of a peak. Know
how the De Novo Sequencing algorithm works. Read sections 8.10-8.12.
- Know what is a phylogenetic tree (rooted and unrooted). Know how
the neighbor-joining algorithm works. Know the basic idea of maximum parsimony. Read sections 10.5-10.10.
- Know what is genome rearrangement. Know how can the genome rearrangement problem can be formulated as a problem of sorting by reversal.
Know how the breakpoint-based greedy algorithm works.