Language Technology, Spring 2013
Exercise 1 (Due Friday 3/22 11:59pm on Blackboard)
--------------------------------------------------

1. Play the Shannon Game at least once (it's fun!), and report your entropy.

   a) Take a screenshot of your result.
   b) Work out the formula that was used to calculate your entropy in this
      game, and verify it with your result.
   c) Write a short paragraph of observations from your game and your result.

2. Derive this central equation of the Noisy-Channel model rigorously (see
   slide 4):

   argmax_{t_1..t_n} P(t_1..t_n | w_1..w_n)
     ~ argmax_{t_1..t_n} P(t_1) P(t_2|t_1) P(t_3|t_2 t_1) ... P(t_n|t_{n-1} t_{n-2})
                         P(w_1|t_1) P(w_2|t_2) P(w_3|t_3) ... P(w_n|t_n)

   Here "~" means "approximately equal". Each step of your derivation is
   either an equality or an approximation:

   * in case of equality, annotate the law/rule used (e.g., Bayes' rule);
   * in case of approximation, explain the reason/assumption behind it.

   Also explain why this model is intuitively called an "HMM".

3. Run the Python implementation of the Viterbi algorithm for bigram tagging
   from the slides. Work out *two* simple example sentences with many
   ambiguous words (e.g., "a can can can a can", or "buffalo buffalo ...").
   Show the output for the two examples. (A minimal toy sketch of such a
   tagger is appended at the end of this sheet for reference.)

4. Modify it to do trigram tagging, and modify your examples accordingly.
   Show the results.

5. (optional) Redo 3 as top-down recursive dynamic programming with
   memoization. Is it faster or slower than 3? Why? If you redo 4
   recursively, will it be faster or slower than 4? (A memoized variant of
   the appended sketch is also included.)
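--------------------------------------------------
Appendix (for reference only, referred to from question 3): a minimal sketch
of bigram Viterbi tagging. This is NOT the implementation from the slides;
the tag set (DT/MD/VB/NN), the transition table `trans`, and the emission
table `emit` are hand-made toy numbers chosen only so that "a can can can a
can" gets a sensible tagging. Swap in the tables from class when you do the
exercise.

import math

# Toy model (hand-made assumptions, not from the slides).
# DT = determiner, MD = modal, VB = verb, NN = noun; '<s>' is the start symbol.
tags = ['DT', 'MD', 'VB', 'NN']

trans = {  # P(tag_i | tag_{i-1})
    ('<s>', 'DT'): 0.8, ('<s>', 'NN'): 0.2,
    ('DT', 'NN'): 0.9, ('DT', 'MD'): 0.05, ('DT', 'VB'): 0.05,
    ('NN', 'MD'): 0.4, ('NN', 'VB'): 0.3, ('NN', 'NN'): 0.2, ('NN', 'DT'): 0.1,
    ('MD', 'VB'): 0.8, ('MD', 'DT'): 0.1, ('MD', 'NN'): 0.1,
    ('VB', 'DT'): 0.6, ('VB', 'NN'): 0.4,
}

emit = {  # P(word | tag)
    ('DT', 'a'): 1.0,
    ('NN', 'can'): 1.0,
    ('MD', 'can'): 1.0,
    ('VB', 'can'): 1.0,
}

def viterbi(words):
    """Bottom-up Viterbi over a bigram HMM; returns the best tag sequence."""
    # best[i][t] = (log-prob of the best path tagging words[0..i] with tag t at
    #               position i, backpointer to the tag chosen at position i-1)
    best = [dict() for _ in words]
    for t in tags:
        p = trans.get(('<s>', t), 0.0) * emit.get((t, words[0]), 0.0)
        if p > 0.0:
            best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        for t in tags:
            e = emit.get((t, words[i]), 0.0)
            if e == 0.0:
                continue
            cands = []
            for prev, (score, _) in best[i - 1].items():
                tr = trans.get((prev, t), 0.0)
                if tr > 0.0:
                    cands.append((score + math.log(tr) + math.log(e), prev))
            if cands:
                best[i][t] = max(cands)
    # Pick the best final tag and follow the backpointers.
    last = max(best[-1], key=lambda t: best[-1][t][0])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(best[i][seq[-1]][1])
    return list(reversed(seq))

if __name__ == '__main__':
    sentence = 'a can can can a can'.split()
    print(list(zip(sentence, viterbi(sentence))))

With these toy tables the sketch tags the example as
a/DT can/NN can/MD can/VB a/DT can/NN.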
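For the optional question 5, below is one possible memoized top-down rewrite
of the same sketch. It is only an illustration of the recursion, not a claim
about the version the course expects or about which version is faster; it
reuses the toy `trans`, `emit`, and `tags` tables defined above, so those
assumptions carry over.

import functools
import math

def viterbi_topdown(words):
    """Top-down Viterbi with memoization: best(i, t) = best log-prob of
    tagging words[0..i] with tag t at position i, plus a backpointer.
    Assumes the toy `trans`, `emit`, and `tags` tables from the sketch above."""
    words = tuple(words)

    @functools.lru_cache(maxsize=None)
    def best(i, t):
        e = emit.get((t, words[i]), 0.0)
        if e == 0.0:
            return float('-inf'), None
        if i == 0:
            tr = trans.get(('<s>', t), 0.0)
            return (math.log(tr * e), None) if tr > 0.0 else (float('-inf'), None)
        cands = []
        for prev in tags:
            score, _ = best(i - 1, prev)
            tr = trans.get((prev, t), 0.0)
            if score > float('-inf') and tr > 0.0:
                cands.append((score + math.log(tr) + math.log(e), prev))
        return max(cands) if cands else (float('-inf'), None)

    # Choose the best final tag, then follow the backpointers.
    last = max(tags, key=lambda t: best(len(words) - 1, t)[0])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(best(i, seq[-1])[1])
    return list(reversed(seq))

Called on 'a can can can a can'.split(), it returns the same tag sequence as
the bottom-up version above.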