- big O notation: Read Section 2.8 of the textbook if you don't understand it.
- Probability distributions: Binomial, Gaussian, and Multinomial. Bayes rule. If you don't know these concepts, read any textbook on probability and statistics.
- Entropy, Kullback-Leibler divergence, and mutual information. Given discrete random variables X and Y, you need to know how to compute H(X), D(X||Y) and
I(X;Y). Refer to the lecture slides for the formulas.
- Definition of the shortest superstring problem. If you don't know, read
section 8.4.
- Relation of the shortest superstring problem with the Hamiltonian Path problem and the Eulerian Path problem in graph algorithms. (Read 8.7 and 8.8 if you don't know.) Know how to find an Eulerian path from a graph.
- Basic idea of whole genome sequencing (i.e., how reading a limited number of base pairs (500-700) can still allow us to infer the whole genome). Know why repeats cause problems.
Know the Lander-Waterman Model. Know the 3 steps involved in the Overlap-Layout-Consensus approach
(i.e., know what each step does conceptually). Refer to the lecture slides on these topics.
- The Exon Chaining problem and how it helps solve the problem of gene prediction. Know the dynamic programming algorithm for solving the Excon Chaining problem . (Read section 6.13 if you don't know.)
- Start codons, stop codons, and open reading frames. Splicing signal. Know how the likelihood ratio
method can be used to perform a binary classification (e.g., deciding whether a candidate region
is a true exon). Refer to the lecture slides and read section 6.12.
- How alignments correspond to paths in an edit graph for pairwise sequence alignment. Know how to
identify a path given an alignment and vice versa. Know how BLOSUM matrix is constructed. Know affine gap penalty scoring. Know the dynamic programming equations for both global alignment and local alignment. Know how to use the dynamic programming equations to fill in the values in an edit graph for both global alignment and local alignment.
Read sections 6.4-6.8.