CFL (Context-Free Languages)

 

Context-Free Grammars:

  A production of the form:  N ->  w

                   where N must be exactly one non-terminal symbol (and nothing

                   else) and  w  consists of zero or more grammar symbols (possibly

                   zero symbol)

 

Example:    CFG1

 

          ‘S’ is the Start Symbol.

          S -> S R

          S -> R

          R -> ( )

          R -> ( S )

 

          We call a production-rule “Empty” if its RHS is an empty-string.

          Above CFG has no such production rule, namely, there is no empty

          production rule.

 

          Consider the following CFG:

 

          S -> S R                                   S ->  R P

          S -> R                                      S ->  R

          R -> ( )                                     P ->  S

          R -> ( P )                                 P ->  l

          P ->  S                                     R -> ( )

          P ->  l                                      R -> ( S )

 

          Each of above two CF grammars generates the same language as the

          above grammar, CFG1, does but they each has an empty production rule.

 

Some Useful Features of CFG.

          Ex:    E ->  E + T  |  T

                   T ->  T  * F |   F

                   F  ->  id   |   ( E )

          Above grammar generates numerical expressions such as:

                             Id

                             Id + id

                             Id + id * id

                             (id + id) *  id

                             (((id)))

                             ((id+id)*id)*(id+id) and etc.

          The grammar specifies, among others:

1.     An expression can have any number of + and *

2.     * has higher precedence over +

3.     Parentheses have higher precedence over * and +

4.     * and + each is a left-associated binary operator.

In addition, CFG can specify

a.     A certain language feature must appear before some other features like declaration section appearing before execution section.

b.     A certain specification can repeat any number of times.

 

Elimination of Empty Production rules from a CFG.

 

Unless the Start Symbol can derive the empty-string, each empty production can be eliminated from a CFG without altering the language defined by the grammar. And, if the Start Symbol can derive the empty string then the only empty production we are unable to eliminate from a CFG must be the one for that Start Symbol which we can consider to be a special case of the empty rule.

 

          Example: CFG2

          S -> S P

          S -> Q P Q

          P -> ( )

          P -> ( S )

          Q -> R

          R -> S R

          R - > l

 

          In the above grammar, we say that ‘R’ and ‘Q’ are each “Nullable

          as each of them can derive the empty string as the former does it directly by

          itself while the latter does it indirectly by way of ‘R’.   

          But, other non-terminals are not nullable.

          After eliminating the only one empty rule of the grammar, we may

          obtain the following equivalent grammar:

 

          S -> S P

          S -> Q P Q

          S -> Q P

          S -> P Q

          S -> P

          P -> ( )

          P -> ( S )

          Q -> R

          R -> S R

          R -> S

 

          Note that ‘S’ is not nullable in the above CFG2, so the resulting CFG

          has no empty rule at all,  not even ‘S ->  l  

          Also, note that we can systematically eliminate all “Unit” production

          rules of the form: N ->  RHS where RHS string consists only of one

          non-terminal and no terminal such as “R -> S” and “Q -> R”

 

 

Definition: CF Languages.

 

A language generated by any CFG is Context-Free.

 

Two normal forms:

 

1. Chomsky Normal Form

          Each rule is of the form except possibly for “S -> l “:

                   N -> P Q               // RHS has exactly two non-terminals

                   N -> t                             // RHS has just one terminal symbol

 

2. Greibach Normal Form

          The RHS of each rule starts with a terminal followed by zero or

          more nonterminals except possibly for “S -> l

 

THM:

Any CFL without the empty string can be generated by a CFG either in the CNF or in the GNF.

That is, any CFG whose defined CFL does not include the empty string can be converted into an equivalent CFG in either the CNF or

the GNF.

 

Converting into equivalent GNF is involved though. It requires eliminating all left-recursive rules such as:

          N -> N Alpha    or                   N -> M Beta

                                                M -> N Gamma.

 

Pumping Lemma (CFL): Necessary condition for CFL.

 

Let L be an infinite CFL. Then there is a positive integer m such that for all (sufficiently long) strings z in L with |z| >=m, z can be written in the form z =uvwxy, where the following properties hold:

                   |vx| >= 1,

                   |vwx| <= m, and

                   uvk w xky is in L for all k>=0.

 

This lemma can be used to show that a certain language in question is not a CFL instead of showing that it is as the Lemma is only a necessary condition for CFL’s and satisfying the lemma by a language does not assure that the language is indeed a CFL.

 

Example:

          Consider the CFL defined the CFG1 above.

          This language has strings such as: (), ()(), (()), ()()(), ()(()), (())(),

          (()()), ((())), and so forth.

 

          We can take ((())) for a sufficiently long string, z, and decompose

          this string into five parts such that:

                   u = “(“

                   v = “(“

                   w = “()”

                   x = “)”

                   y = “)”

          Then we see that following strings we can pump each is in this language:

                   (())              // when k=0

                   ((()))            // when k=1

                   (((())))          // when k=2

                   ((((()))))       // when k=3

                   (((((())))))     // when k=4  and so forth.

         

 

Acceptors: NPDA(Non-deterministic Pushdown Automata)

  

An NPDA is a 7-tuple: (Q,  S, S,  d, Q1, S0, F)

where Q is a finite set of states,  

            S is a finite set of Input symbols,

            S is a finite set of Stack symbols,

            d is a finite set of transitions (or moves),

            Q1 is the Initial state,

            S0 is the Start stack symbol, and

            F is a finite state of zero or more Final (or Accepting) states.

 

Acceptance conditions:

1.              The entire input string must be scanned in only one direction.

The NPDA must scan each input symbol just once while moving

the input string from left to right.

That is, each input symbol is said to be “Consumed” when it is used in deciding the next move by the NPDA. Once consumed, an input symbol is never to be used again in deciding the next move. 

2.              At the time of scanning the input string in its entirety, either

a.       the stack must be empty (Acceptance by Empty-Stack) or

b.      the NPDA must enter one of the accepting states so designated

(Acceptance by Final-States) 

3.              The NPDA stops when there is no move defined for the combination of

a.       the current state

b.      the current stack symbol and

c.       the current input symbol.

4.               An input string is rejected

a.       when the NPDA does not stop at all or

b.      when the NPDA stops in one of non-accepting states in case of Acceptance by Final-States or without the stack being completely empty in case of Acceptance by Empty-Stack, or

c.       when the NPDA stops before scanning the entire input string.

 

EX:      An NPDA accepting L = {wcwR  | w in (0+1)*} by Empty-Stack.

            R is the Start stack symbol.

            (Q1, 0, R)  ->  (Q1, XR)         (Q1, 1, R)  ->  (Q1, YR)

            (Q1, 0, X)  ->  (Q1, XX)         (Q1, 1, X)  ->  (Q1, YX)

            (Q1, 0, Y)  ->  (Q1, XY)         (Q1, 1, Y)  ->  (Q1, YY)

            (Q1, c, R)  ->   (Q2,  R)

            (Q1, c, X)  -> (Q2,  X)

            (Q1, c, Y)  -> (Q2,  Y)

            (Q2, 0, X)  -> (Q2,  l)           (Q2, 1, Y)  -> (Q2,  l)                      

            (Q2, l, R)  -> (Q2,  l)           // NPDA can make this move for any input

symbol without consuming it or when no input symbol is left.

 

            Suppose that input string is: 011c110

            The NPDA will have the following sequence of moves:

                        (Q1, 011c110, R)                    ->

                        (Q1,   11c110, XR)                 ->

                        (Q1,     1c110, YXR)              ->

                        (Q1,       c110, YYXR)           ->

                        (Q2,         110, YYXR)           ->

                        (Q2,           10, YXR)              ->

                        (Q2,             0, XR)                 ->

                        (Q2,              l, R)                   ->

                        (Q2,              l, l)                   // Terminal I.D. shows an empty stack and

                                                                        // an empty input string.

 

            Note that “011c1100” will not be accepted as the stack will become empty but

            the input is not to be entirely scanned. And, each time the stack becomes empty                 the NPDA has to stop as there is no move defined for an empty stack.

 

            Also note the above NPDA is deterministic because

(1)   there is at most one move defined for any combination of a state, an input

      (non-empty) symbols and a stack symbol and

(2)   for any move, m, defined using an empty input symbol, there is no other

move defined for the state and the stack symbol used in that move m.

      For example, an NPDA with the following two moves defined will be

      non-deterministic:

            either   (Qi,  Sj,  Pk)  -> (Ql, XXX)

                        (Qi,  Sj,  Pk)  -> (Qu, ???)

            or         (Qi,  Sj,  Pk)  -> (Qv, ???)

                        (Qi,   l,  Pk)  -> (Qw, ???)

 

            Another example:

An NPDA accepting L = {wwR  | w in (0+1)+} by Empty-Stack

 

            R is the Start stack symbol.

            (Q1, 0, R)  ->  (Q1, XR)         (Q1, 1, R)  ->  (Q1, YR)

            (Q1, 0, X)  ->  (Q1, XX)         (Q1, 1, X)  ->  (Q1, YX)

            (Q1, 0, Y)  ->  (Q1, XY)         (Q1, 1, Y)  ->  (Q1, YY)

            (Q1, 0, X)  -> (Q2,  l)           (Q1, 1, Y)  -> (Q2,  l) 

// For above two moves, current input symbol is assumed to be beginning

// of the second half.

            (Q2, 0, X)  -> (Q2,  l)           (Q2, 1, Y)  -> (Q2,  l)                      

            (Q2, l, R)  -> (Q2,  l)          

 

            Note that above NPDA is really non-deterministic.

            Let us consider possible sequences of moves when input string is:

                        011110

 

            a.  A successful sequence

                        (Q1, 011110, R)          ->

                        (Q1, 11110, XR)         ->

                        (Q1, 1110, YXR)        ->

                        (Q1, 110, YYXR)       ->

                        (Q2, 10, YXR)                        ->

                        (Q2, 0, XR)                 ->

                        (Q2, l, R)                    ->

                        (Q2, l, l)

 

            b. An unsuccessful sequence

                        (Q1, 011110, R)          ->

                        (Q1, 11110, XR)         ->

                        (Q1, 1110, YXR)        ->

                        (Q1, 110, YYXR)       ->

                        (Q1, 10, YYYXR)      ->

                        (Q2, 0, YYXR)           No next move defined, so stopping in failure.

 

c. Another unsuccessful sequence

(Q1, 011110, R)          ->

                        (Q1, 11110, XR)         ->

                        (Q1, 1110, YXR)        ->

(Q2, 110, XR)             No next move defined, so stopping in failure.

 

 

Equivalence of two Acceptance Conditions:

            Acceptance by Empty-Stack and

            Acceptance by Final-States.

 

THM:

If L is L(M2) for some NPDA M2, then L is N(M1) for some NPDA M1.

   Note that L(M) denotes the language accepted by M by Final-States and

   N(M) denotes the language accepted by M by Empty-Stack.

 

Proof:

We need to construct M1 from M2 so that it will accept the same language

accepted by M2 but by Empty-stack.

Some key ideas for doing that:

1.              M1 will erase all remaining stack symbols as soon as M2 enters a final state

after doing the same things as M2 does.

2.              We need to prevent M1 from accidentally emptying the stack just because M2 does without entering a final state. For this, M1 will use a special bottom-of-stack symbol that M2 does not have.

This stack symbol will become its own Start stack symbol, Z0 .

3.              M1 will need two extra states:

Q’  : Its own Initial state

Qe  : A state in which M1 will just empty the stack.

 

Now, we are ready to assemble all needed transitions for M1.

          Assume that Q1 is the initial state of M2 and  S0  is its Start stack symbol.

1.         (Q’,  l, Z0)      -> (Q1, S0 Z0)

2.                  All existing transitions of M2.

3.                  (q,  l, Z)          -> (Qe  , l)       for each accepting state q of M2 and each

stack symbol Z of M2 including Z0       

4.                  (Qe , l, Z)        -> (Qe  , l)       for all stack symbols of M2 including Z0  

 

From the construction, M1 should accept an input string by Empty-stack if and only if M2 accepts it by Final-States.

 

THM:

If L is N(M1) for some NPDA M1, then L is L(M2) for some NPDA M2.

 

Proof: Amounts to showing how we can construct M2 from a given M1.

Some key ideas for this construction:

1.                  M2 will need its own bottom-of-stack symbol that M1 does not have so that while simulating M1 encountering this bottom-of-stack symbol will initiating the process of entering the final state of M2 that M1 does not have.

2.                  M2 will use 2 extra states and one extra stack symbol:

Qf  :  The only final state

Q0  :  Its own initial state

Z0  :   Its own bottom-of-stack symbol

 

The moves of M2 include:

       Assume that Q1  and  S0   are the initial state and the Start stack symbol of

       M1, respectively.

       1.              (Q0 ,  l,  Z0)     ->         (Q1 , S0 Z0 )

       2.              All existing transitions of M1

       3.              (q,     l,  Z0)     ->         (Qf  , l)   for every state q of M1

 

From the construction, M2 will accept an input string by entering its only final state Qf    if and only if M1 accepts by Empty-stack.

QED.

 

THM: (Equivalence of NPDA’s and CFL’s)

If L is a CFL, then there exists an NPDA M such that L = N(M)

Proof:

Use GNF CFG assuming that the CFL under consideration does not have the empty-string.

 

 

DCFL

            CFL’s that are accepted by a deterministic PDA, respectively.

            These are models for programming languages in actual use.

 

            Closure properties.

 

            Closed under

1.      Complementation

2.      MAX

3.      MIN

4.      inverse Homomorphism

5.      intersection with a regular set.

 

Not closed under

1.      Union

2.      Intersection

3.      Concatenation

4.      Reflexive closure

 

For Union, consider:

L1 = {0 j  1j  2k  |  j,k>=0}

L2 = {0 j  1k  2k  |  j,k>=0}

Each is a DCFL.

But, their union is a CFL but not DCFL.

L = Complement(L1+L2)

L3 = L ^ 0* 1* 2*

                 = {0 i  1j  2k  | i<>j and j<>k}, which is not CFL

            Hence, L is not CFL. And, so L is not a DCFL.

            So, their union is not a DCFL.