CS498CXZ Assignment #3: Gene Prediction
(due Sept. 29, 2005, Thursday, 12:30pm)

  1. [5 points] At which position in a codon (first, second, or third) would a single-nucleotide substitution most likely have the greatest impact on the function of the encoded protein? Why?
  2. [5 points] Which of the following point mutations would most likely have the greater impact on the function of the encoded protein: a single-nucleotide substitution (e.g., an A mutates to a G) or a single-nucleotide deletion (e.g., an A is deleted from the sequence)? Why?
  3. [30 points] Although the genetic code is universal, organisms usually have their own codon usage preferences. For example, the web site http://www.molbiol.ox.ac.uk/~cocallag/refdata_html/codonusagetable.shtml gives codon usage statistics for Escherichia coli. Your colleague has an EST fragment from E. coli with the following sequence: AAGUCAUUAUUUUCG. Assuming this is the coding strand, can you help her identify the most likely reading frame, i.e., how the sequence should be segmented into codons for translation into a protein sequence? Show your calculations.
  4. [60 points] Use your favorite programming language to implement the exon chaining algorithm given in Section 6.13 of the textbook. The input to the algorithm should be a list of weighted intervals (see an example of input), and the output should be the path of intervals with the maximum score (see an example of output). Test your program on the exon chaining problem in Figure 6.26 and print out its output as the written answer to this question. Please ignore the zero-weight interval [13,14], so that you have 8 intervals to work with.
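For question 1, one way to explore how codon position matters is to count how often a single-base change at each position leaves the encoded amino acid unchanged. The sketch below is an illustration only. It assumes the standard genetic code (NCBI translation table 1), uses only the 61 sense codons as starting points, and counts any change to a stop codon as non-synonymous.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons are enumerated
# in U, C, A, G order at each position, and '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_fraction(position: int) -> float:
    """Fraction of single-base substitutions at `position` (0, 1 or 2)
    that leave the encoded amino acid unchanged, over all sense codons."""
    synonymous = total = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # start only from sense codons
        for base in BASES:
            if base == codon[position]:
                continue  # not a substitution
            mutant = codon[:position] + base + codon[position + 1:]
            total += 1
            if CODE[mutant] == aa:  # a change to a stop codon never matches
                synonymous += 1
    return synonymous / total


if __name__ == "__main__":
    for pos in range(3):
        print(f"codon position {pos + 1}: {synonymous_fraction(pos):.3f} synonymous")
```

Under these assumptions, the third position comes out far more often synonymous than the first two. This is the degeneracy that the written answer should explain.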
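For question 2, the contrast between a substitution and a deletion can be seen by translating a short sequence before and after each mutation. The 18-nt sequence and the mutated position below are made-up examples chosen for illustration. Translation here reads successive codons from the first base and does not stop at a stop codon ('*').

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons in U, C, A, G order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(rna: str) -> str:
    """Translate complete codons from the first base; trailing bases are ignored."""
    return "".join(CODE[rna[i:i + 3]] for i in range(0, len(rna) - 2, 3))


def substitute(rna: str, pos: int, base: str) -> str:
    """Replace the base at 0-based position `pos` with `base`."""
    return rna[:pos] + base + rna[pos + 1:]


def delete(rna: str, pos: int) -> str:
    """Remove the base at 0-based position `pos`."""
    return rna[:pos] + rna[pos + 1:]


if __name__ == "__main__":
    seq = "AUGGCUAAAGGUCUGAGC"  # hypothetical example sequence
    print("original:    ", translate(seq))
    print("substitution:", translate(substitute(seq, 4, "G")))
    print("deletion:    ", translate(delete(seq, 4)))
```

With this input, the substitution changes a single residue (MAKGLS becomes MGKGLS). The deletion shifts the reading frame, so every downstream codon changes and a premature stop appears (MVKV*).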
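For question 3, here is a minimal sketch of one possible scoring scheme. It splits the sequence into codons in each of the three forward frames, then scores each frame by the mean natural log of its codon frequencies. The `usage` dictionary is an assumption: you must fill it in from the E. coli table (for example, frequencies per thousand), using keys in the same alphabet as the sequence. If the table lists DNA codons, convert U to T. The code averages per codon because the frame starting at the first base has five complete codons, while the other two frames have four. Raw products of frequencies are therefore not directly comparable across frames. Comparing products over only the first four codons of each frame is an alternative.

```python
import math
from typing import Dict, List, Tuple


def frame_codons(seq: str, offset: int) -> List[str]:
    """Complete codons of `seq` read from 0-based `offset` (0, 1 or 2);
    leftover bases at the end are dropped."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


def score_frames(seq: str, usage: Dict[str, float]) -> List[Tuple[int, List[str], float]]:
    """(offset, codons, mean log frequency) for each forward frame.

    Assumes every frame has at least one complete codon. A codon missing
    from `usage` raises KeyError, and a zero frequency raises ValueError."""
    results = []
    for offset in range(3):
        codons = frame_codons(seq, offset)
        mean_log = sum(math.log(usage[c]) for c in codons) / len(codons)
        results.append((offset, codons, mean_log))
    return results


def best_frame(seq: str, usage: Dict[str, float]) -> Tuple[int, List[str], float]:
    """Frame with the highest mean log codon frequency (ties go to the lowest offset)."""
    return max(score_frames(seq, usage), key=lambda r: r[2])
```

Once the real frequencies are filled in, `best_frame("AAGUCAUUAUUUUCG", usage)` returns the offset and codon split of the highest-scoring frame. Printing `score_frames(...)` shows all three scores for the written calculation.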
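For question 4, here is a minimal Python sketch of a dynamic program over the 2n sorted interval endpoints, in the spirit of the exon chaining recurrence. Check it against the pseudocode in Section 6.13 before relying on it. Intervals are assumed to be (left, right, weight) tuples. Reading the example input file is omitted because its format is not reproduced here. At equal coordinates, left ends are sorted before right ends, so intervals that share an endpoint are treated as overlapping. A traceback over the stored choices recovers the chain.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int, float]  # (left end, right end, weight)


def exon_chaining(intervals: List[Interval]) -> Tuple[float, List[Interval]]:
    """Return (maximum total weight, chain of non-overlapping intervals)."""
    # One vertex per endpoint: (coordinate, 0 = left / 1 = right, interval index).
    points = []
    for k, (left, right, _) in enumerate(intervals):
        points.append((left, 0, k))
        points.append((right, 1, k))
    points.sort()  # left ends precede right ends at equal coordinates

    m = len(points)
    s = [0] * (m + 1)  # s[i]: best score using the first i vertices
    choice: List[Optional[int]] = [None] * (m + 1)  # interval taken at vertex i
    left_vertex = {}  # interval index -> position of its left-end vertex

    for i, (_, is_right, k) in enumerate(points, start=1):
        s[i] = s[i - 1]  # default: skip this vertex
        if not is_right:
            left_vertex[k] = i
            continue
        j = left_vertex[k]
        candidate = s[j] + intervals[k][2]  # best chain before I, plus I
        if candidate > s[i]:
            s[i] = candidate
            choice[i] = k

    # Traceback from the last vertex, collecting the intervals that were taken.
    path = []
    i = m
    while i > 0:
        k = choice[i]
        if k is None:
            i -= 1
        else:
            path.append(intervals[k])
            i = left_vertex[k]  # resume at the interval's left end
    path.reverse()
    return s[m], path


if __name__ == "__main__":
    # Hypothetical intervals for illustration; not those of Figure 6.26.
    example = [(1, 4, 3), (2, 6, 5), (5, 9, 4), (7, 10, 3), (11, 12, 1)]
    score, chain = exon_chaining(example)
    print("max score:", score)
    for left, right, weight in chain:
        print(f"[{left},{right}] weight {weight}")
```

For the hypothetical example above, the sketch reports a maximum score of 9 with the chain [2,6], [7,10], [11,12]. Sorting the endpoints costs O(n log n), and the scan and traceback are linear in the number of endpoints.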

What to turn in

Please turn in a hard copy of your written answers in class, and email your code to our grader/TA, Xuehua Shen (xshen AT cs.uiuc.edu).