Do's and Don'ts of the Pumping Lemma

The Pumping Lemma for regular languages is used to prove that certain languages are NOT regular. So the first Don't is the following.

* DON'T use the pumping lemma to prove that a language is regular.
* DO use it to prove that a language is NOT regular.

The Pumping Lemma is a bit complicated for the uninitiated. The statement is long, but it is not too difficult to understand once you get the hang of it. In this note, we focus on the meaning of the theorem and how it can be used. The proof is simple, and is in the book.

The pumping lemma says that for any regular language L, there is a "magic number" m such that any string w in L of length >= m can be written as w = xyz in such a way that
(i) |xy| <= m and |y| >= 1, and
(ii) x(y^i)z is in L for every i >= 0.
The y is the substring that can be "pumped".

To apply the pumping lemma to show that a language is NOT regular, we begin by assuming that the language is regular and derive a contradiction. Let us use the example L = {v a v^R : v is in (0+1)* and a is 0 or 1}, where v^R denotes the reverse of v.

Step 1. Assume that L is regular. This means that there exists a magic number m for this language, which we will call the "pumping constant".

* DON'T assume a specific m. Your argument must hold for any m.

So the strategy is to assume that there is an adversary (opponent) who chooses an m that he claims satisfies the properties of the pumping lemma (i.e., conditions (i) and (ii) hold for every string in L of length at least m). Your argument must hold for any m the adversary chooses.

Step 2. Consider a specific string w in L of length at least m. Since m itself is not known, and you need a string w of length at least m, the string you propose should have m as a parameter. In our case, we could pick 0^m00^m, or 1^m11^m, or 0^m10^m, or u0u^R, where u is in (0+1)*. But only some choices are good, and some are not. So:

* DO choose the string w VERY carefully. In fact, this is the crux of the problem.

The simplest correct choice for our example is 0^m10^m. Many other choices don't work, including 0^m00^m. We will see what happens in each of these two cases as we proceed.

Step 3. We know that every string in L of length at least m, including ours, can be split as xyz such that conditions (i) and (ii) above hold. But we do not know anything beyond that. In particular, we have no control over how long x and y are! We know only that they obey constraints (i) and (ii). So we must let the adversary split w into xyz so that conditions (i) and (ii) hold.

* DON'T pick a particular x,y,z. Just as with m, you shouldn't give an argument for a particular x,y,z. The argument should be valid for any choice of x,y,z that obeys the two constraints (i) and (ii).

The following applies to the correct string 0^m10^m, as well as to the wrong string 0^m00^m. ANY choice of x,y,z such that |y| >= 1 and |xy| <= m must make x and y consist entirely of 0's. Moreover, y must have at least one 0 in it. Let X be the length of x and Y be the length of y.

The right string is w = 0^X 0^Y 0^(m-(X+Y)) 1 0^m.
The wrong string is w = 0^X 0^Y 0^(m-(X+Y)) 0 0^m.

Step 4.

* DO pick a specific i other than 1 which gives a counter-example to condition (ii) above, and pump the string.

Let i = 2. For the right string, (ii) says that 0^X 0^(2Y) 0^(m-(X+Y)) 1 0^m must be in L, i.e., 0^(m+Y) 1 0^m must be in L. This cannot be: since m+Y > m, the midpoint of the string is to the left of the 1. The left half of the string has only 0s, and the right half has a 1. So they can't be reverses of each other. So,

* DO show that x(y^i)z is not in L for the specific i you picked.

What happens if we pick the wrong string? For i = 2, condition (ii) says that 0^X 0^(2Y) 0^(m-(X+Y)) 0 0^m must be in L. So 0^(2m+Y+1) must be in L. Since m+Y > m, the midpoint has shifted left. But a string of all 0s reads the same in both directions, so it is in L exactly when its length is odd. The above string is therefore in L whenever 2m+1+Y is odd, i.e., whenever Y is even. Since the choice of Y is the opponent's, he can avoid the contradiction by picking an even Y.
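
The two cases above can be checked by brute force for a small pumping constant. The following is only a sanity-check sketch, not a proof: the helper names in_L, splits and surviving_splits are made up for this illustration, m = 4 is an arbitrary small value, and the membership test relies on the observation that L is exactly the set of odd-length palindromes over {0,1}.

```python
def in_L(s: str) -> bool:
    """Membership in L = {v a v^R}: the odd-length palindromes over {0,1}."""
    return len(s) % 2 == 1 and s == s[::-1]


def splits(w: str, m: int):
    """Yield every split w = xyz the adversary may pick: |xy| <= m, |y| >= 1."""
    for x_len in range(m):
        for y_len in range(1, m - x_len + 1):
            yield w[:x_len], w[x_len:x_len + y_len], w[x_len + y_len:]


def surviving_splits(w: str, m: int, i: int):
    """Splits whose pumped string x(y^i)z is still in L (no contradiction)."""
    return [(x, y, z) for x, y, z in splits(w, m) if in_L(x + y * i + z)]


m = 4  # arbitrary small value, chosen only for this illustration
right = "0" * m + "1" + "0" * m
wrong = "0" * m + "0" + "0" * m

# Right string: no split survives pumping with i = 2.
print(surviving_splits(right, m, 2))
# Wrong string: the lengths Y of y for which the adversary escapes.
print(sorted({len(y) for _, y, _ in surviving_splits(wrong, m, 2)}))
```

For the right string the list is empty, so every split leads to a contradiction. For the wrong string the surviving splits are exactly those with even Y.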

This brings us to a point worth reiterating.

* DON'T assume any particular properties for X and Y other than (i) and (ii).

It is the adversary who decides how to split w into x, y, and z. So your argument cannot rely on any specific y or on any additional property of y, such as having odd length.

But can't we pick a different i that still makes the wrong string work? You can TRY this. But in this case, as long as Y is even, no i would work, because the resulting string is an all-0 string of length 2m+1+(i-1)Y, which is always odd if Y is even, and so the string is always in L. Because the adversary chooses Y, we can't fix the wrong choice of the string by changing i.

What happens if the contradiction does not work for our string? Does it mean the language is regular? No, of course not.

* DON'T conclude that the language is regular if you don't get a contradiction. You can only conclude this if you can prove that there is a regular expression or DFA or NFA that recognizes L.

Another bad choice of w is to simply say w = uau^R, where u is any string over {0,1} and a is either 0 or 1. The result of pumping would be some string u'au^R, where u' is longer than u. One might argue that since the left-hand side of "a" has more symbols than the right-hand side of "a", the string is not of the right form to be in L, and hence we have a contradiction. This would be correct if "a" were a terminal symbol that is not allowed to occur in u or u^R, in other words, if there were a unique "a" in the string. But since "a" is either 0 or 1, a could occur both in u and in u^R. So after pumping, the center of the string shifts left from the original position of "a". To show a contradiction, we have to show that the right-hand side and the left-hand side of the NEW center are not reverses of each other. This is not possible without making more restrictive assumptions about u. That is the point of choosing 0^m10^m: there is a unique 1 in the string, and it is in the right half of the pumped string.

If you can't get a contradiction with your choice of w and i,

* DO consider other choices for i, but it is perhaps more useful to try to repair the choice of w.
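
To judge whether changing i can help, or whether w itself needs repair, one can list the splits for which every tried i leaves the pumped string in L. The sketch below is a hedged experiment, not a proof: it only tries finitely many values of i for one small m, and the helper names (in_L, splits, escaping_splits) and the values m = 5 and i in 0..6 are assumptions made for this illustration.

```python
def in_L(s: str) -> bool:
    """Membership in L = {v a v^R}: the odd-length palindromes over {0,1}."""
    return len(s) % 2 == 1 and s == s[::-1]


def splits(w: str, m: int):
    """Yield every split w = xyz with |xy| <= m and |y| >= 1."""
    for x_len in range(m):
        for y_len in range(1, m - x_len + 1):
            yield w[:x_len], w[x_len:x_len + y_len], w[x_len + y_len:]


def escaping_splits(w: str, m: int, i_values):
    """Splits for which x(y^i)z stays in L for EVERY tried i.

    If this list is non-empty, no tried i contradicts those splits,
    which suggests repairing w rather than searching for another i.
    """
    return [(x, y, z) for x, y, z in splits(w, m)
            if all(in_L(x + y * i + z) for i in i_values)]


m = 5                 # arbitrary small value for the experiment
i_values = range(7)   # tries i = 0, 1, ..., 6
wrong = "0" * (2 * m + 1)
right = "0" * m + "1" + "0" * m

# Wrong string: lengths Y of y that escape every tried i.
print(sorted({len(y) for _, y, _ in escaping_splits(wrong, m, i_values)}))
# Right string: no split escapes.
print(escaping_splits(right, m, i_values))
```

With the wrong string, the even values of Y escape every tried i, which matches the parity argument above. With the right string, no split escapes, so for each split some i gives the contradiction.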