CFL
(Context-Free Languages)
Context-Free Grammars:
A production of the form: N -> w
where N must be exactly one non-terminal symbol (and nothing
else) and
w
consists of zero or more grammar symbols
(possibly
zero symbol)
Example: CFG1
‘S’ is the Start Symbol.
S -> S R
S -> R
R -> ( )
R -> ( S )
We call a production-rule “Empty” if its RHS is an
empty-string.
Above CFG has no such production rule, namely, there is no
empty
production rule.
Consider the following CFG:
S -> S R
S ->
R P
S -> R
S ->
R
R -> ( )
P ->
S
R -> ( P )
P -> l
P -> S
R -> ( )
P ->
l
R -> ( S )
Each of above two CF grammars generates the same language as
the
above grammar, CFG1, does but they each has an
empty production rule.
Some Useful Features of
CFG.
Ex: E -> E + T |
T
T ->
T * F | F
F
-> id | ( E )
Above grammar generates numerical expressions such as:
Id
Id + id
Id + id * id
(id + id) * id
(((id)))
((id+id)*id)*(id+id) and etc.
The grammar specifies, among others:
1.
An expression can have any
number of + and *
2.
* has higher precedence over
+
3.
Parentheses have higher
precedence over * and +
4.
* and + each is a left-associated binary
operator.
In addition, CFG can
specify
a.
A certain language feature
must appear before some other features like declaration section appearing before
execution section.
b.
A certain specification can
repeat any number of times.
Elimination of Empty
Production rules from a CFG.
Unless the Start Symbol can
derive the empty-string, each empty production can be eliminated from a CFG
without altering the language defined by the grammar. And, if the Start Symbol
can derive the empty string then the only empty production we are unable to
eliminate from a CFG must be the one for that Start Symbol which we can consider
to be a special case of the empty rule.
Example: CFG2
S -> S P
S -> Q P Q
P -> ( )
P -> ( S )
Q -> R
R -> S R
R - > l
In the above grammar, we say that ‘R’ and ‘Q’ are each “Nullable”
as each of them can derive the empty string as the former does it directly by
itself while the latter does it indirectly by way of ‘R’.
But, other non-terminals are not nullable.
After eliminating the only one empty rule of the grammar, we
may
obtain the following equivalent
grammar:
S -> S P
S -> Q P Q
S -> Q P
S -> P Q
S -> P
P -> ( )
P -> ( S )
Q -> R
R -> S R
R -> S
Note that ‘S’ is not nullable in the above
CFG2, so the resulting CFG
has no empty rule at all, not even ‘S -> l‘
Also, note that we can systematically eliminate all “Unit”
production
rules of the form: N -> RHS where RHS string consists only of
one
non-terminal and no terminal such as “R -> S”
and “Q -> R”
Definition: CF
Languages.
A language generated by any
CFG is Context-Free.
Two normal
forms:
1. Chomsky Normal
Form
Each rule is of the form except possibly for “S -> l “:
N -> P Q
// RHS has exactly two non-terminals
N -> t
// RHS has just one terminal symbol
2. Greibach Normal Form
The RHS of each rule starts with a terminal followed by zero
or
more nonterminals
except possibly for “S -> l “
THM:
Any CFL without the empty
string can be generated by a CFG either in the CNF or in the
GNF.
That is, any CFG whose
defined CFL does not include the empty string can be converted into an
equivalent CFG in either the CNF or
the GNF.
Converting into equivalent
GNF is involved though. It requires eliminating all left-recursive rules such
as:
N -> N Alpha
or
N -> M Beta
M -> N Gamma.
Pumping Lemma (CFL):
Necessary condition for CFL.
Let L be an infinite CFL.
Then there is a positive integer m such that for all (sufficiently long) strings
z in L with |z| >=m, z can be written in the form z =uvwxy, where the following properties
hold:
|vx| >= 1,
|vwx| <= m, and
uvk
w xky is in L for all
k>=0.
This lemma can be used to
show that a certain language in question is not a CFL instead of showing that it
is as the Lemma is only a necessary condition for CFL’s and satisfying the lemma
by a language does not assure that the language is indeed a
CFL.
Example:
Consider the CFL defined the CFG1 above.
This language has strings such as: (), ()(),
(()), ()()(), ()(()), (())(),
(()()), ((())), and so
forth.
We can take ((())) for a sufficiently long string, z, and decompose
this string into five parts such
that:
u = “(“
v = “(“
w = “()”
x = “)”
y = “)”
Then we see that following strings we can pump each is in this
language:
(())
// when k=0
((()))
// when k=1
(((())))
// when k=2
((((())))) // when
k=3
(((((())))))
// when k=4
and so forth.
Acceptors: NPDA(Non-deterministic Pushdown
Automata)
An NPDA is a 7-tuple: (Q, S, S, d, Q1, S0, F)
where Q is a finite set of states,
S is a finite set of Input symbols,
S is a finite set of Stack symbols,
d is a finite set of transitions (or moves),
Q1 is the Initial state,
S0 is the Start stack symbol, and
F is a finite state of zero or more Final (or Accepting) states.
Acceptance conditions:
1. The entire input string must be scanned in only one direction.
The NPDA must scan each input symbol just once while moving
the input string from left to right.
That is, each input symbol is said to be “Consumed” when it is used in deciding the next move by the NPDA. Once consumed, an input symbol is never to be used again in deciding the next move.
2. At the time of scanning the input string in its entirety, either
a. the stack must be empty (Acceptance by Empty-Stack) or
b. the NPDA must enter one of the accepting states so designated
(Acceptance by Final-States)
3. The NPDA stops when there is no move defined for the combination of
a. the current state
b. the current stack symbol and
c. the current input symbol.
4. An input string is rejected
a. when the NPDA does not stop at all or
b. when the NPDA stops in one of non-accepting states in case of Acceptance by Final-States or without the stack being completely empty in case of Acceptance by Empty-Stack, or
c. when the NPDA stops before scanning the entire input string.
EX: An NPDA accepting L = {wcwR | w in (0+1)*} by Empty-Stack.
R is the Start stack symbol.
(Q1, 0, R) -> (Q1, XR) (Q1, 1, R) -> (Q1, YR)
(Q1, 0, X) -> (Q1, XX) (Q1, 1, X) -> (Q1, YX)
(Q1, 0, Y) -> (Q1, XY) (Q1, 1, Y) -> (Q1, YY)
(Q1, c, R) -> (Q2, R)
(Q1, c, X) -> (Q2, X)
(Q1, c, Y) -> (Q2, Y)
(Q2, 0, X) -> (Q2, l) (Q2, 1, Y) -> (Q2, l)
(Q2, l, R) -> (Q2, l) // NPDA can make this move for any input
symbol without consuming it or when no input symbol is left.
Suppose that input string is: 011c110
The NPDA will have the following sequence of moves:
(Q1, 011c110, R) ->
(Q1, 11c110, XR) ->
(Q1, 1c110, YXR) ->
(Q1, c110, YYXR) ->
(Q2, 110, YYXR) ->
(Q2, 10, YXR) ->
(Q2, 0, XR) ->
(Q2, l, R) ->
(Q2, l, l) // Terminal I.D. shows an empty stack and
// an empty input string.
Note that “011c1100” will not be accepted as the stack will become empty but
the input is not to be entirely scanned. And, each time the stack becomes empty the NPDA has to stop as there is no move defined for an empty stack.
Also note the above NPDA is deterministic because
(1) there is at most one move defined for any combination of a state, an input
(non-empty) symbols and a stack symbol and
(2) for any move, m, defined using an empty input symbol, there is no other
move defined for the state and the stack symbol used in that move m.
For example, an NPDA with the following two moves defined will be
non-deterministic:
either (Qi, Sj, Pk) -> (Ql, XXX)
(Qi, Sj, Pk) -> (Qu, ???)
or (Qi, Sj, Pk) -> (Qv, ???)
(Qi, l, Pk) -> (Qw, ???)
Another example:
An NPDA accepting L = {wwR | w in (0+1)+} by Empty-Stack
R is the Start stack symbol.
(Q1, 0, R) -> (Q1, XR) (Q1, 1, R) -> (Q1, YR)
(Q1, 0, X) -> (Q1, XX) (Q1, 1, X) -> (Q1, YX)
(Q1, 0, Y) -> (Q1, XY) (Q1, 1, Y) -> (Q1, YY)
(Q1, 0, X) -> (Q2, l) (Q1, 1, Y) -> (Q2, l)
// For above two moves, current input symbol is assumed to be beginning
// of the second half.
(Q2, 0, X) -> (Q2, l) (Q2, 1, Y) -> (Q2, l)
(Q2, l, R) -> (Q2, l)
Note that above NPDA is really non-deterministic.
Let us consider possible sequences of moves when input string is:
011110
a. A successful sequence
(Q1, 011110, R) ->
(Q1, 11110, XR) ->
(Q1, 1110, YXR) ->
(Q1, 110, YYXR) ->
(Q2, 10, YXR) ->
(Q2, 0, XR) ->
(Q2, l, R) ->
(Q2, l, l)
b. An unsuccessful sequence
(Q1, 011110, R) ->
(Q1, 11110, XR) ->
(Q1, 1110, YXR) ->
(Q1, 110, YYXR) ->
(Q1, 10, YYYXR) ->
(Q2, 0, YYXR) No next move defined, so stopping in failure.
c. Another unsuccessful sequence
(Q1, 011110, R) ->
(Q1, 11110, XR) ->
(Q1, 1110, YXR) ->
(Q2, 110, XR) No next move defined, so stopping in failure.
Equivalence of two Acceptance Conditions:
Acceptance by Empty-Stack and
Acceptance by Final-States.
THM:
If L is L(M2) for some NPDA M2, then L is N(M1) for some NPDA M1.
Note that L(M) denotes the language accepted by M by Final-States and
N(M) denotes the language accepted by M by Empty-Stack.
Proof:
We need to construct M1 from M2 so that it will accept the same language
accepted by M2 but by Empty-stack.
Some key ideas for doing that:
1. M1 will erase all remaining stack symbols as soon as M2 enters a final state
after doing the same things as M2 does.
2. We need to prevent M1 from accidentally emptying the stack just because M2 does without entering a final state. For this, M1 will use a special bottom-of-stack symbol that M2 does not have.
This stack symbol will become its own Start stack symbol, Z0 .
3. M1 will need two extra states:
Q’ : Its own Initial state
Qe : A state in which M1 will just empty the stack.
Now, we are ready to assemble all needed transitions for M1.
Assume that Q1 is the initial state of M2 and S0 is its Start stack symbol.
1. (Q’, l, Z0) -> (Q1, S0 Z0)
2. All existing transitions of M2.
3. (q, l, Z) -> (Qe , l) for each accepting state q of M2 and each
stack symbol Z of M2 including Z0
4. (Qe , l, Z) -> (Qe , l) for all stack symbols of M2 including Z0
From the construction, M1 should accept an input string by Empty-stack if and only if M2 accepts it by Final-States.
THM:
If L is N(M1) for some NPDA M1, then L is L(M2) for some NPDA M2.
Proof: Amounts to showing how we can construct M2 from a given M1.
Some key ideas for this construction:
1. M2 will need its own bottom-of-stack symbol that M1 does not have so that while simulating M1 encountering this bottom-of-stack symbol will initiating the process of entering the final state of M2 that M1 does not have.
2. M2 will use 2 extra states and one extra stack symbol:
Qf : The only final state
Q0 : Its own initial state
Z0 : Its own bottom-of-stack symbol
The moves of M2 include:
Assume that Q1 and S0 are the initial state and the Start stack symbol of
M1, respectively.
1. (Q0 , l, Z0) -> (Q1 , S0 Z0 )
2. All existing transitions of M1
3. (q, l, Z0) -> (Qf , l) for every state q of M1
From the construction, M2 will accept an input string by entering its only final state Qf if and only if M1 accepts by Empty-stack.
QED.
THM: (Equivalence of NPDA’s and CFL’s)
If L is a CFL, then there exists an NPDA M such that L = N(M)
Proof:
Use GNF CFG assuming that the CFL under consideration does not have the empty-string.
CFL’s that are accepted by a deterministic PDA, respectively.
These are models for programming languages in actual use.
Closure properties.
Closed under
1. Complementation
2. MAX
3. MIN
4. inverse Homomorphism
5. intersection with a regular set.
Not closed under
1. Union
2. Intersection
3. Concatenation
4. Reflexive closure
For Union, consider:
L1 = {0 j 1j 2k | j,k>=0}
L2 = {0 j 1k 2k | j,k>=0}
Each is a DCFL.
But, their union is a CFL but not DCFL.
L = Complement(L1+L2)
L3 = L ^ 0* 1* 2*
= {0 i 1j 2k | i<>j and j<>k}, which is not CFL
Hence, L is not CFL. And, so L is not a DCFL.
So, their union is not a DCFL.