Language Technology, Spring 2013
Exercise 1 (Due Friday 3/22 11:59pm on Blackboard)
--------------------------------------------------

1. Play the Shannon Game at least once (it's fun!), and report your entropy.

   a) Take a screenshot of your result.
   b) Work out the formula that was used to calculate your entropy in this
      game, and verify it with your result.
   c) Write a short paragraph of observations from your game and your result.

2. Derive this central equation of the Noisy-Channel model rigorously (see
   slide 4):

   argmax_{t_1..t_n} P(t_1..t_n | w_1..w_n)
     ~ argmax_{t_1..t_n} P(t_1) P(t_2|t_1) P(t_3|t_2 t_1) ... P(t_n|t_{n-1} t_{n-2})
                         P(w_1|t_1) P(w_2|t_2) P(w_3|t_3) ... P(w_n|t_n)

   Here "~" means "approximately equal". Each step of your derivation is
   either an equality or an approximation:

   * in case of equality, annotate the law/rule used (e.g., Bayes' rule);
   * in case of approximation, explain the reason/assumption behind it.

   Also explain why this model is intuitively called an "HMM".

3. Run the Python implementation of the Viterbi algorithm for bigram tagging
   from the slides. Work out *two* simple example sentences with many
   ambiguous words (e.g., "a can can can a can", or "buffalo buffalo ...").
   Show the output for the two examples. (A minimal toy sketch of such a
   tagger is appended at the end of this sheet for reference.)

4. Modify it to do trigram tagging, and modify your examples accordingly.
   Show the results.

5. (optional) Redo 3 as top-down recursive dynamic programming with
   memoization. Is it faster or slower than 3? Why? If you redo 4
   recursively, will it be faster or slower than 4? (A memoized variant of
   the appended sketch is also included.)
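--------------------------------------------------
Appendix (for reference only, referred to from question 3): a minimal sketch
of bigram Viterbi tagging. This is NOT the implementation from the slides;
the tag set (DT/MD/VB/NN), the transition table `trans`, and the emission
table `emit` are hand-made toy numbers chosen only so that "a can can can a
can" gets a sensible tagging. Swap in the tables from class when you do the
exercise.

import math

# Toy model (hand-made assumptions, not from the slides).
# DT = determiner, MD = modal, VB = verb, NN = noun; '<s>' is the start symbol.
tags = ['DT', 'MD', 'VB', 'NN']

trans = {  # P(tag_i | tag_{i-1})
    ('<s>', 'DT'): 0.8, ('<s>', 'NN'): 0.2,
    ('DT', 'NN'): 0.9, ('DT', 'MD'): 0.05, ('DT', 'VB'): 0.05,
    ('NN', 'MD'): 0.4, ('NN', 'VB'): 0.3, ('NN', 'NN'): 0.2, ('NN', 'DT'): 0.1,
    ('MD', 'VB'): 0.8, ('MD', 'DT'): 0.1, ('MD', 'NN'): 0.1,
    ('VB', 'DT'): 0.6, ('VB', 'NN'): 0.4,
}

emit = {  # P(word | tag)
    ('DT', 'a'): 1.0,
    ('NN', 'can'): 1.0,
    ('MD', 'can'): 1.0,
    ('VB', 'can'): 1.0,
}

def viterbi(words):
    """Bottom-up Viterbi over a bigram HMM; returns the best tag sequence."""
    # best[i][t] = (log-prob of the best path tagging words[0..i] with tag t at
    #               position i, backpointer to the tag chosen at position i-1)
    best = [dict() for _ in words]
    for t in tags:
        p = trans.get(('<s>', t), 0.0) * emit.get((t, words[0]), 0.0)
        if p > 0.0:
            best[0][t] = (math.log(p), None)
    for i in range(1, len(words)):
        for t in tags:
            e = emit.get((t, words[i]), 0.0)
            if e == 0.0:
                continue
            cands = []
            for prev, (score, _) in best[i - 1].items():
                tr = trans.get((prev, t), 0.0)
                if tr > 0.0:
                    cands.append((score + math.log(tr) + math.log(e), prev))
            if cands:
                best[i][t] = max(cands)
    # Pick the best final tag and follow the backpointers.
    last = max(best[-1], key=lambda t: best[-1][t][0])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(best[i][seq[-1]][1])
    return list(reversed(seq))

if __name__ == '__main__':
    sentence = 'a can can can a can'.split()
    print(list(zip(sentence, viterbi(sentence))))

With these toy tables the sketch tags the example as
a/DT can/NN can/MD can/VB a/DT can/NN.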
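For the optional question 5, below is one possible memoized top-down rewrite
of the same sketch. It is only an illustration of the recursion, not a claim
about the version the course expects or about which version is faster; it
reuses the toy `trans`, `emit`, and `tags` tables defined above, so those
assumptions carry over.

import functools
import math

def viterbi_topdown(words):
    """Top-down Viterbi with memoization: best(i, t) = best log-prob of
    tagging words[0..i] with tag t at position i, plus a backpointer.
    Assumes the toy `trans`, `emit`, and `tags` tables from the sketch above."""
    words = tuple(words)

    @functools.lru_cache(maxsize=None)
    def best(i, t):
        e = emit.get((t, words[i]), 0.0)
        if e == 0.0:
            return float('-inf'), None
        if i == 0:
            tr = trans.get(('<s>', t), 0.0)
            return (math.log(tr * e), None) if tr > 0.0 else (float('-inf'), None)
        cands = []
        for prev in tags:
            score, _ = best(i - 1, prev)
            tr = trans.get((prev, t), 0.0)
            if score > float('-inf') and tr > 0.0:
                cands.append((score + math.log(tr) + math.log(e), prev))
        return max(cands) if cands else (float('-inf'), None)

    # Choose the best final tag, then follow the backpointers.
    last = max(tags, key=lambda t: best(len(words) - 1, t)[0])
    seq = [last]
    for i in range(len(words) - 1, 0, -1):
        seq.append(best(i, seq[-1])[1])
    return list(reversed(seq))

Called on 'a can can can a can'.split(), it returns the same tag sequence as
the bottom-up version above.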