Solution to Assignment 1  (CS410)

 

1. Probabilistic Reasoning and Bayes Rule [30 points]

A)

P(V=1|A=1,L=0,K=0) = P(A=1,L=0,K=0|V=1)P(V=1) / P(A=1,L=0,K=0) = 0.9882

Similarly, we can get P(V=0|A=1,L=0,K=0)=0.0118.

Since P(V=1|A=1,L=0,K=0)> P(V=0|A=1,L=0,K=0), we conclude that message M carries a virus.
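The comparison above can be sketched in code. This is a minimal illustration, assuming that A, L, and K are conditionally independent given V; every probability below is a hypothetical placeholder and not a value from the assignment, so the resulting posterior will not match 0.9882.

```python
# All numbers here are hypothetical placeholders, NOT the assignment's values.
prior = {1: 0.1, 0: 0.9}   # P(V=v)
p_a1 = {1: 0.8, 0: 0.1}    # P(A=1 | V=v)
p_l1 = {1: 0.3, 0: 0.6}    # P(L=1 | V=v)
p_k1 = {1: 0.2, 0: 0.5}    # P(K=1 | V=v)


def bernoulli(p_one, value):
    """Probability of a binary value given the probability that it equals 1."""
    return p_one if value == 1 else 1.0 - p_one


def posterior(a, l, k):
    """P(V=v | A=a, L=l, K=k) via Bayes rule, assuming conditional independence."""
    joint = {
        v: prior[v]
        * bernoulli(p_a1[v], a)
        * bernoulli(p_l1[v], l)
        * bernoulli(p_k1[v], k)
        for v in (0, 1)
    }
    evidence = sum(joint.values())  # P(A=a, L=l, K=k)
    return {v: joint[v] / evidence for v in joint}


post = posterior(a=1, l=0, k=0)
decision = max(post, key=post.get)  # the value of V with the larger posterior
```

Because the evidence term P(A=1,L=0,K=0) is the same for V=1 and V=0, the decision depends only on which joint term is larger; normalizing is needed only to report the posterior itself.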

 

Grading criterion:

 

B) P(V=1)+P(V=0)=1.

 

Grading criterion: 5 points for the correct constraint

 

C) There are many ways to do this; for example, set P(A=1|V=1) to 0.01.

 

Grading Criterion: 4 points for the correct answer, 1 point for details

 

D) When V=1, there are 8 combinations of values of A, K, and L, because A, K, and L are all binary random variables. Moreover, we know that these 8 conditional probabilities sum to 1, i.e., sum_{a,k,l} P(A=a,K=k,L=l|V=1) = 1. So we only need to specify 7 values. Similarly, we need to specify 7 values when V=0. In total, we need to specify 14 probability values.
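The counting argument generalizes to any number of binary features. The sketch below uses hypothetical function names; the second function is included only for comparison and counts the parameters needed if the features were assumed conditionally independent given V.

```python
def free_params_full_joint(num_features, num_classes=2):
    """Free parameters for P(features | class) with no independence assumption.

    Each class has 2**num_features outcomes, and the sum-to-one constraint
    fixes one of them.
    """
    return num_classes * (2 ** num_features - 1)


def free_params_cond_indep(num_features, num_classes=2):
    """Free parameters if the binary features are conditionally independent.

    Each feature then needs only P(feature=1 | class) for each class.
    """
    return num_classes * num_features


full = free_params_full_joint(3)         # A, K, L with V in {0, 1}
independent = free_params_cond_indep(3)  # comparison case
```

For three features the full conditional joint needs 14 values, while the conditionally independent model would need 6.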

 

Grading Criterion: 5 points for correct answer; 3 points for partially correct answers

 

E) For example, given a message with a virus, the length of the message might be strongly related to whether it contains an attachment. In that case, A and L are not conditionally independent given V.

 

Grading Criterion: 3 points for mentioning conditional dependency; 2 points for a correct example. If you only explain that they are not independent, rather than not conditionally independent, 3 points will be deducted.

 

2. Maximum Likelihood Estimation [15 points]

 

 

Grading Criterion: 15 points for the correct derivation. 1 point is deducted for any minor error. 5 points are deducted if you did not derive the log-likelihood.

 

3. Getting familiar with text [20 points]

 

Grading Criterion: 20 points for any reasonable explanation

 

4. Unigram language model and smoothing [35 points]

 

Entropy:

 

 

 

 

Cross Entropy:

 

 

 

 


KL:

 

 


  

 

Log-likelihood:
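For reference, the four quantities in this problem are closely tied together. The sketch below uses their standard definitions with base-2 logarithms, which is an assumption: the assignment may use a different log base or normalization. The function names, the toy token list, and the uniform model q are all hypothetical, and q is assumed to give nonzero probability to every observed word (as a smoothed model does).

```python
import math
from collections import Counter


def empirical_distribution(tokens):
    """Maximum likelihood unigram model: relative frequency of each word."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def entropy(p):
    """H(p) = -sum_w p(w) log2 p(w)."""
    return -sum(pw * math.log2(pw) for pw in p.values() if pw > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_w p(w) log2 q(w); requires q(w) > 0 wherever p(w) > 0."""
    return -sum(pw * math.log2(q[w]) for w, pw in p.items() if pw > 0)


def kl_divergence(p, q):
    """D(p || q) = H(p, q) - H(p)."""
    return cross_entropy(p, q) - entropy(p)


def log_likelihood(tokens, q):
    """log2 probability of the token sequence under unigram model q."""
    return sum(math.log2(q[w]) for w in tokens)


# Hypothetical toy data, not taken from the assignment.
tokens = ["a", "a", "b", "c"]
p = empirical_distribution(tokens)
q = {"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}

h = entropy(p)
ce = cross_entropy(p, q)
kl = kl_divergence(p, q)
ll = log_likelihood(tokens, q)
```

With these definitions, the log-likelihood of a text under q equals minus the text length times the cross entropy between the text's empirical distribution and q, so maximizing likelihood, minimizing cross entropy, and minimizing KL divergence all select the same model.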

         

Grading Criterion: