CS397CXZ Assignment #7:
Protein Sequencing and Phylogenetic Tree Construction
(due Dec. 8, 2005, Thursday, 12:30pm)
- [20 points]
Go to http://prospector.ucsf.edu/,
the UCSF Protein Prospector website, where you can run a suite of software
to do protein/peptide sequencing given a set of mass-to-charge ratios.
Choose the MS-Tag Simple tool, use the default settings, and click the
"start search" button. Read the results and answer the
following questions.
- a [5/20 points] What are the ion source and mass analyzer of the mass spectrometry instrument
assumed in the current settings?
- b [3/20 points] How many ion-types are considered in the identification process?
List at least 2 ion-types considered by the software.
- c [4/20 points] What are the sequence AND name of the identified protein?
- d [8/20 points] Repeat the search, but this time use the dbEST.human.08.26.2003 database instead of the SwissProt.2005.01.06 database. This search takes much longer, so you will need to
WAIT patiently. Read the results. What are the sequence AND name of the identified protein?
Are the proteins identified using the two databases the same? Why does searching dbEST.human.08.26.2003
take much longer than searching SwissProt.2005.01.06? (Hint: What is the time complexity of the
database search algorithm?)
- [30 points]
On the planet Final Destiny, the amino acids that occur naturally in proteins are A (mass = 31), B (mass = 40), C (mass = 50), D (mass = 57), and E (mass = 73). An analytical chemist is trying to sequence an unknown peptide brought back from Final Destiny. After running MS/MS mass spectrometry, she finds that the mass of the whole peptide is 203 and that the mass spectrum has 4 peaks (50, 57, 90, and 130). Suppose you are to help the chemist find the sequence using the de novo sequencing method taught in the lecture.
Assume that all ions have a charge state of +1, that there is only 1 ion type, and that the mass shift of this ion type is 0.
- a [20/30 points] Draw the spectrum graph according to the mass spectrum and label all the edges.
- b [10/30 points] Find the most likely sequence of the unknown peptide.
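For orientation, here is a minimal Python sketch of one common way to build a spectrum graph. It is not necessarily the exact method from the lecture. The sketch assumes that every peak is a prefix mass (one ion type, mass shift 0, charge +1), that vertices are 0, each peak, and the total peptide mass, and that an edge joins two vertices whenever their mass difference equals exactly one residue mass. The residue alphabet (X, Y, Z), the peaks, and the function names are hypothetical and deliberately different from the Final Destiny data.

```python
from itertools import combinations


def spectrum_graph(peaks, total_mass, residue_masses):
    """Return (vertices, edges) for a toy spectrum graph.

    Assumption: every peak is a prefix mass with mass shift 0.
    edges maps a vertex u to a list of (v, residue) pairs with v > u.
    """
    vertices = sorted({0, total_mass, *peaks})
    # Assumes residue masses are distinct, so a mass identifies one residue.
    mass_to_residue = {mass: res for res, mass in residue_masses.items()}
    edges = {}
    for u, v in combinations(vertices, 2):  # pairs with u < v
        residue = mass_to_residue.get(v - u)
        if residue is not None:
            edges.setdefault(u, []).append((v, residue))
    return vertices, edges


def all_paths(edges, source, sink):
    """Enumerate the residue strings spelled by every path source -> sink."""
    if source == sink:
        return [""]
    paths = []
    for nxt, residue in edges.get(source, []):
        for tail in all_paths(edges, nxt, sink):
            paths.append(residue + tail)
    return paths


# Hypothetical toy data: three residues, peptide mass 47, three peaks.
toy_masses = {"X": 10, "Y": 15, "Z": 22}
vertices, edges = spectrum_graph([15, 25, 30], 47, toy_masses)
candidates = all_paths(edges, 0, 47)
print(vertices)    # vertex masses in increasing order
print(edges)       # labeled edges of the spectrum graph
print(candidates)  # sequences spelled by complete paths
```

In this toy input the peak at 30 gets an incoming edge but cannot reach the total mass, so it behaves like a noise peak. When several complete paths exist, a simple heuristic (an assumption here, not a rule from the lecture) is to prefer the path that passes through the most peaks.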
- [50 points] The following is an unrooted tree
that satisfies additivity. That is, the distance between any
two sequences is equal to the sum of the edge lengths of the shortest path
between them.
    1     1    1
A ----+------+---C
     /        \
    /3         \4
   /            \
  B              \
                  D
- a [10/50 points] Compute and show the pairwise distance matrix based on the additivity property.
- b [20/50 points] Compute and show the normalized distance matrix. From this normalized matrix,
which pair(s) of nodes do we know are neighbors?
- c [20/50 points] Choose a pair of nodes (e.g., X and Y) that have the minimum distance according to the normalized matrix you obtained in b, and assume their parent node is Z. Compute the lengths
of edges XZ and ZY. Show your calculations.
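To make the terms above concrete, here is a small Python sketch run on a hypothetical tree that is different from the one in this problem. It assumes that "normalized distance" means the neighbor-joining correction D_ij = d_ij - (r_i + r_j), with r_i = (1/(n-2)) * sum_k d_ik, and that the edge from X to the new parent Z has length (d_XY + r_X - r_Y)/2. If the lecture used a different normalization, follow the lecture. The leaf names P, Q, R, S, the internal nodes U, V, and all function names are made up for illustration.

```python
from itertools import combinations


def leaf_distances(edges, leaves):
    """Additive distances: sum of edge lengths on the path between leaves."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def from_node(start):
        # Depth-first walk; in a tree each node is reached by a unique path.
        dist = {start: 0}
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + w
                    stack.append(nxt)
        return dist

    d = {}
    for a in leaves:
        dist = from_node(a)
        for b in leaves:
            d[a, b] = dist[b]
    return d


def normalized_distances(d, leaves):
    """Assumed normalization: D_ij = d_ij - (r_i + r_j)."""
    n = len(leaves)
    r = {i: sum(d[i, k] for k in leaves) / (n - 2) for i in leaves}
    norm = {(i, j): d[i, j] - (r[i] + r[j])
            for i, j in combinations(leaves, 2)}
    return r, norm


def branch_lengths(d, r, x, y):
    """Lengths of edges XZ and ZY when x and y are joined under parent Z."""
    xz = 0.5 * (d[x, y] + r[x] - r[y])
    return xz, d[x, y] - xz


# Hypothetical tree: P and Q hang off U, R and S hang off V, U-V has length 4.
toy_edges = [("P", "U", 2), ("Q", "U", 3), ("U", "V", 4),
             ("R", "V", 1), ("S", "V", 5)]
toy_leaves = ["P", "Q", "R", "S"]
d = leaf_distances(toy_edges, toy_leaves)
r, norm = normalized_distances(d, toy_leaves)
print(norm)
print(branch_lengths(d, r, "P", "Q"))
```

On this toy tree the smallest normalized values fall on the pairs (P, Q) and (R, S), which are the true neighbor pairs, and the branch lengths computed for P and Q recover the original edge lengths 2 and 3.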
What to turn in
Please turn in a hard copy of your written answers in class.