CS498CXZ Assignment #4: Pairwise Sequence Alignment
(due Oct. 11, 2005, Tuesday, 12:30pm)

  1. [60 points] Consider the sequences v=CGATAAC and w=ACGTTAC. Assume that we score an alignment by awarding a match with "+1" and penalizing both a mismatch and an indel with "-1".
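
As an illustration of the scoring scheme only, the sketch below scores an alignment that has already been written out as two equal-length rows. The function name `alignment_score`, the use of "-" as the indel symbol, and the parameter names are assumptions made for this example and are not part of the assignment.

```python
def alignment_score(row_v, row_w, match=1, mismatch=-1, indel=-1, gap="-"):
    """Score a given alignment column by column.

    row_v and row_w are the two rows of the alignment, with the
    (assumed) symbol "-" marking an indel. Default scores follow the
    problem statement: +1 match, -1 mismatch, -1 indel.
    """
    if len(row_v) != len(row_w):
        raise ValueError("both rows of an alignment must have the same length")
    score = 0
    for a, b in zip(row_v, row_w):
        if a == gap and b == gap:
            raise ValueError("a column cannot contain two gap symbols")
        if a == gap or b == gap:
            score += indel      # one symbol aligned against a gap
        elif a == b:
            score += match      # identical symbols
        else:
            score += mismatch   # different symbols
    return score


# The ungapped alignment of v and w has 3 matches and 4 mismatches.
print(alignment_score("CGATAAC", "ACGTTAC"))  # -1
# A toy gapped alignment (not taken from the assignment): 3 matches, 1 indel.
print(alignment_score("AC-T", "ACGT"))        # 2
```

The function only evaluates an alignment it is handed; it does not search for the best one.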



  2. [30 points] The BLOSUM scoring matrix assigns a log-odds score to each aligned pair of amino acids in an alignment, based on the frequency with which that substitution is known to occur among consensus blocks within related proteins. BLOSUM62 is among the best of the available matrices for detecting weak protein similarities, and is the default matrix used in BLAST 2.0, one of the most popular local sequence alignment tools. The value in a BLOSUM matrix corresponding to amino acids X and Y is computed as
                 p(X,Y)
    val(X,Y)=log ----------
                 p(X) p(Y)
    
    where p(X,Y) is estimated as
              number of times X aligned with Y
    p(X,Y)= -----------------------------------
              total number of aligned pairs
    
    and p(X) and p(Y) can be estimated as
           number of times seeing X in any position of a sequence
    p(X) = -------------------------------------------------------
           total number of positions in all the sequences
    
           number of times seeing Y in any position of a sequence
    p(Y) = -------------------------------------------------------
           total number of positions in all the sequences
    
    A brief note about how to compute a BLOSUM matrix is available here.
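
The three estimates above combine into val(X,Y) in a few lines. The following is a minimal sketch, assuming the counts have already been tallied; the function name `blosum_value`, its parameter names, and the choice of base 2 for the logarithm are assumptions for illustration, since the formula above does not fix a base.

```python
import math


def blosum_value(n_xy, n_pairs, n_x, n_y, n_positions, base=2.0):
    """Compute val(X,Y) = log( p(X,Y) / (p(X) p(Y)) ) from raw counts.

    n_xy        -- number of times X is aligned with Y
    n_pairs     -- total number of aligned pairs
    n_x, n_y    -- number of times X (resp. Y) is seen in any position
    n_positions -- total number of positions in all the sequences
    base        -- logarithm base (assumed; not specified in the text)
    """
    p_xy = n_xy / n_pairs
    p_x = n_x / n_positions
    p_y = n_y / n_positions
    # math.log raises ValueError when n_xy is 0, because the log of 0
    # is undefined; a real matrix needs a convention for that case.
    return math.log(p_xy / (p_x * p_y), base)


# Made-up counts: X and Y are aligned twice as often as expected by chance.
print(blosum_value(n_xy=10, n_pairs=100, n_x=20, n_y=25, n_positions=100))
```

A positive value means the pair is aligned more often than its background frequencies predict, a negative value less often, and zero exactly as often.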

  3. [10 points] BLAST is one of the most popular tools for sequence alignment. Biologists often use it to find proteins or genes that are similar to the one they are working on. The purpose of this exercise is to gain some experience with BLAST and to understand the meaning of the reported results by searching for protein sequences that match the sickle cell hemoglobin protein sequence. Sickle cell hemoglobin is a malfunctioning protein: a change in ONE nucleotide of the DNA sequence leads to a change in ONE amino acid, which changes how the hemoglobin protein folds. This change in the structure of the hemoglobin protein changes the shape of the red blood cell to a sickle shape, causing disease as explained in this website.

What to turn in

Please turn in a hardcopy of your written answers at the class