Language Technology, Fall 2014
Exercise 1 - FUN with morphology and entropy
(Due Monday 10/6 11:59pm on Blackboard)
--------------------------------------------------
1. Redo the pluralization transducer from the quiz:
Besides the default +s rule, you should also consider the following rules:
{-s, -x, -sh, -ch} + es (e.g. buses, boxes, bushes, matches),
-f + -ves (e.g. leaves), and
-y + -ies (e.g. flies).
You don't need to consider any other rules or irregularities (e.g. tooth => teeth).
(a) Include a photo of your transducer drawn on paper.
(b) Include a Carmel text file for your transducer, and test it with
at least five examples.
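This is not a substitute for the transducer itself, but a short Python sketch of the same four rules may help you generate and check test cases for part (b); the word list below is illustrative, not required:

```python
def pluralize(noun: str) -> str:
    """Apply the exercise's four pluralization rules (no irregulars like tooth => teeth)."""
    if noun.endswith(("s", "x", "sh", "ch")):
        return noun + "es"            # bus -> buses, box -> boxes, match -> matches
    if noun.endswith("f"):
        return noun[:-1] + "ves"      # leaf -> leaves
    if noun.endswith("y"):
        return noun[:-1] + "ies"      # fly -> flies
    return noun + "s"                 # default rule: cat -> cats

for w in ["bus", "box", "bush", "match", "leaf", "fly", "cat"]:
    print(w, "->", pluralize(w))
```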
2. Rigorously derive this central equation of the Noisy-Channel model (see slide 4):
argmax_{t_1..t_n} P(t_1..t_n|w_1..w_n)
~ argmax_{t_1..t_n} P(t_1) P(t_2|t_1) P(t_3|t_2 t_1) ... P(t_n|t_{n-1} t_{n-2})
P(w_1|t_1) P(w_2|t_2) P(w_3|t_3) ... P(w_n|t_n).
Here "~" means "approximately equal".
Each step of your derivation is either an equality or an approximation:
* for an equality, annotate the law/rule used (e.g., Bayes' rule);
* for an approximation, explain the reason/assumption behind it.
Also explain why this model is intuitively called an "HMM" (Hidden Markov Model).
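As a hint (not the derivation itself), the two exact identities you will likely need before making any approximation can be stated in LaTeX as:

```latex
% Bayes' rule -- note the denominator P(w_1..w_n) does not depend on
% the tags, so it can be dropped inside the argmax:
\[
P(t_1 \dots t_n \mid w_1 \dots w_n)
  = \frac{P(t_1 \dots t_n)\,
          P(w_1 \dots w_n \mid t_1 \dots t_n)}{P(w_1 \dots w_n)}
\]
% Chain rule -- an equality, applicable to both factors above:
\[
P(t_1 \dots t_n) = \prod_{i=1}^{n} P(t_i \mid t_1 \dots t_{i-1})
\]
% The approximation steps then shrink each conditioning context
% (a Markov assumption on the tags; per-tag independence of the words).
```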
3. Play the Shannon Game at least once (it's fun!), and report your entropy.
http://www.math.ucsd.edu/~crypto/java/ENTROPY/
(if it doesn't run, go to System Preferences -> Java -> Security -> Edit Site List,
and add http://www.math.ucsd.edu to that list.)
a) Take a screenshot of your result.
b) work out the formula that was used to calculate your entropy in this game,
and verify it with your result.
c) write a short paragraph of observations from your game and your result.