Semi-Thue Processes
A less restricted version of Grammars
Definition: Semi-Thue Production (or simply Production)
Given an alphabet A, the expression u -> v is a Semi-Thue production,
where u and v are each a string on A.
A Semi-Thue process is a finite set of Semi-Thue productions.
EX: A = {a,b,c}
Semi-Thue Process P = {ab -> ba, bc -> cb, ca -> ac}
In this case, we can have the following:
abbc -> babc -> bacb
abbc -> abcb -> bacb
abbc -> abcb -> acbb
So, we say that there is a derivation from “abbc” to “babc”, or from “abbc”
to “bacb”, or from “abbc” to “acbb”, and so on.
However, there is no derivation from “abbc” to “abb”, to “bbca”, or to
“accbb”, and so on.
Of course, these derivations are all for the given Semi-Thue Process.
To specify the particular S.T. process under which a derivation is obtained,
we attach the process name to the arrow (written here in brackets):
1. When exactly one step is used:
   abbc ->[P] babc
2. When zero or more steps are used (marked with a *):
   abbc ->*[P] abbc   // zero steps
   abbc ->*[P] babc   // one step
   abbc ->*[P] bbac   // two steps
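The one-step and multi-step derivation relations above are easy to make concrete. The following is an illustrative Python sketch (the helper names are mine, not from the notes) for the process P = {ab -> ba, bc -> cb, ca -> ac}; since every production here preserves word length, the breadth-first search always terminates on this P.

```python
from collections import deque

P = [("ab", "ba"), ("bc", "cb"), ("ca", "ac")]

def one_step(w, prods=P):
    """Yield every word obtainable from w by applying one production u -> v."""
    for u, v in prods:
        start = w.find(u)
        while start != -1:
            yield w[:start] + v + w[start + len(u):]
            start = w.find(u, start + 1)

def derives(src, goal, prods=P, limit=10000):
    """True if src ->* goal; the search is cut off after `limit` words."""
    seen, queue = {src}, deque([src])
    while queue and len(seen) < limit:
        w = queue.popleft()
        if w == goal:
            return True
        for nxt in one_step(w, prods):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return goal in seen
```

For example, derives("abbc", "bacb") holds, while derives("abbc", "abb") does not, matching the derivations listed above.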
Our immediate goal: to show the equivalence between DTM’s and NDTM’s.
All we need to show is that any given NDTM can be converted to an equivalent
DTM, since obtaining an equivalent NDTM (equivalent to a given DTM) is
trivial.
Our strategy: to find an equivalent S.T.P. so that
1. from that S.T.P., we will get an equivalent grammar, and
2. we show that each grammar defines an RE language, which a DTM must
accept, since RE languages and DTM’s are equivalent.
An Equivalent S.T.P. to an NDTM
1. Let A = {S1, S2, ..., SK} be the alphabet of an NDTM M, and let its set
of states be {Q1, Q2, ..., Qn}.
2. The wanted S.T.P., called S(M), will have the following alphabet:
   S0, S1, S2, ..., SK, Q0, Q1, ..., Qn, Qn+1, h
Our goal right now in defining this S.T.P. is finding all its productions so
that it will derive a certain fixed string from any input string if and only
if this input string is accepted by this TM.
So, in that case, this TM and the resulting S.T.P. are equivalent in the
sense that the S.T.P. can derive a certain goal string from any input string
if and only if this TM accepts the input string (by stopping, of course).
We need to obtain S.T. productions for this wanted S(M) such that
   h Q1 B X h ->*[S(M)] h Q0 h
every time and only when this TM has a computation starting from the initial
ID of: ...BBB Q1 B X BBB...
The productions will be obtained from each quadruple of this TM, M, as follows:
a. Qi Sj Sk Ql
   For each such quadruple, we need just one production: Qi Sj -> Ql Sk
b. Qi Sj R Ql
   For each such quadruple, we need many productions to account for the many
   different tape symbols that happen to be next (to the Right, in this
   case) to the current tape symbol, including the case of the endmarker, h.
   So, Qi Sj Sm -> Sj Ql Sm for each m (from 0 to K)
   and Qi Sj h -> Sj Ql B h   // extending the Right endmarker
c. Qi Sj L Ql
   For each such quadruple, likewise, we need many productions for the same
   reasons as above (b):
   Sm Qi Sj -> Ql Sm Sj for each m (from 0 to K)
   and h Qi Sj -> h Ql B Sj   // extending the Left endmarker
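As a sanity check on cases (a)–(c), here is a hedged Python sketch (the representation and function name are my own) that expands one quadruple into its S(M) productions, with "S0" standing for the blank B:

```python
def productions_for(quad, K):
    """Expand one TM quadruple (Qi, Sj, act, Ql) into S(M) productions.
    act is a written symbol (case a), "R" (case b), or "L" (case c).
    Tape symbols are "S0".."SK", with "S0" playing the role of the blank B."""
    Qi, Sj, act, Ql = quad
    tape = [f"S{m}" for m in range(K + 1)]
    prods = []
    if act == "R":                                 # case (b): move right
        for Sm in tape:
            prods.append(((Qi, Sj, Sm), (Sj, Ql, Sm)))
        prods.append(((Qi, Sj, "h"), (Sj, Ql, "S0", "h")))  # extend right end
    elif act == "L":                               # case (c): move left
        for Sm in tape:
            prods.append(((Sm, Qi, Sj), (Ql, Sm, Sj)))
        prods.append((("h", Qi, Sj), ("h", Ql, "S0", Sj)))  # extend left end
    else:                                          # case (a): overwrite Sj
        prods.append(((Qi, Sj), (Ql, act)))
    return prods
```

A right-move quadruple with K = 2 yields K+2 = 4 productions (one per tape symbol S0..S2 plus the endmarker case), while a print quadruple yields exactly one.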
Additional productions
These are needed to erase all tape symbols except those we want at the time
this TM halts.
a. For each halting combination Qi Sj of this TM, where i = 1, 2, ..., n and
   j = 0, 1, 2, ..., K, we add
   Qi Sj -> Qn+1 Sj   // Qn+1 is a new state and a brand-new symbol of the
                      // S.T.P. being constructed
b. Qn+1 Sj -> Qn+1   // for all j = 0, 1, 2, ..., K
   Qn+1 h -> Q0 h    // Right endmarker
   Sj Q0 -> Q0       // for all j = 0, 1, 2, ..., K
EX: Suppose that a TM starts with the initial I.D. of h Q1 B X h, where X is
an input string, and suppose that this TM stops in the halting combination
Qi Sj.
In this case, for the resulting S.T.P., we will have the following derivation:
   h u Qi Sj v h -> h u Qn+1 Sj v h ->+ h u Qn+1 h -> h u Q0 h ->* h Q0 h
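The erasing phase of this derivation can be replayed mechanically. The following is an illustrative sketch (machine size and halting combination are assumed for the example only): a 2-state, K = 2 machine halting in combination Q1 S2, so that "Q3" plays the role of Qn+1.

```python
def erase_phase(word, n=2, K=2):
    """Apply the additional (erasing) productions to a word given as a list
    of symbols, until no rule applies; returns the final word."""
    rules = [(["Q1", "S2"], [f"Q{n+1}", "S2"])]                 # halting combination
    rules += [([f"Q{n+1}", f"S{j}"], [f"Q{n+1}"]) for j in range(K + 1)]
    rules += [([f"Q{n+1}", "h"], ["Q0", "h"])]                  # right endmarker
    rules += [([f"S{j}", "Q0"], ["Q0"]) for j in range(K + 1)]
    word = list(word)
    changed = True
    while changed:                       # rewrite until a fixed point
        changed = False
        for lhs, rhs in rules:
            for i in range(len(word) - len(lhs) + 1):
                if word[i:i + len(lhs)] == lhs:
                    word[i:i + len(lhs)] = rhs
                    changed = True
                    break
            if changed:
                break
    return word
```

Starting from h S1 Q1 S2 S1 h, the rules erase everything down to the goal word h Q0 h, as in the derivation above.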
THM2.2, page 175
(Correspondence between an accepted string and a derivation sequence by the
resulting S.T.P. leading to our goal string h Q0 h; a necessary and
sufficient condition for an accepted string)
An NDTM accepts an input string U (on its tape alphabet, of course) if and
only if there is a derivation sequence by the resulting S(M) from
“h Q1 B U h” to “h Q0 h”.
Proof:
1. The necessary condition
The example above shows the necessary condition for an accepted string.
That is, if a string U is accepted by the TM, there is going to be a
derivation sequence from “h Q1 B U h” to “h Q0 h”.
2. The sufficient condition
On the other hand, if a string is not accepted, then, starting from the
initial Post word “h Q1 B U h”, any derivable Post word will contain exactly
one Qi (for i = 1, 2, ..., n) and will contain neither the state Qn+1 nor
Q0, because no derivable Post word will contain a halting combination of the
(current) state and the (current) tape symbol.
Hence, the Post word “h Q0 h” will not be derived.
THM2.3, page 176 (THM2.2 for the Inverse Process, called W(M))
An NDTM M accepts a string U if and only if there is a derivation sequence
from the Post word “h Q0 h” to the Post word “h Q1 B U h”.
Proof: This is an immediate consequence of THM2.2 above, as for any given
S.T.P. P, there is a derivation from a string U to a string V by P if and
only if there is a derivation from the string V back to the string U by its
Inverse Process.
THM3.1, page 176 (Existence of a TM with an unsolvable Word Problem for both
its S(M) and its W(M))
Proof: Note that the Word Problem for the S(M) of any given TM M is solvable
if and only if it is solvable for its W(M).
Let us pick any TM that accepts a non-recursive RE set, say S.
This RE set S has an undecidable membership problem.
Now, a string U is in the set S <-> h Q1 B U h ->*[S(M)] h Q0 h
THM2.1, page 174 (Unique derivation sequence in the S(M) of any DTM M)
For any DTM M, we have at most one possible derivation sequence of Post
words from the initial Post word “h Q1 B U h”, for a given arbitrary input
string U, to any given Post word.
THM3.4, page 178 (Existence of a S.T.P. on a two-symbol alphabet with an
unsolvable Word Problem)
There is a S.T.P. on the alphabet {a,b} whose Word Problem is unsolvable.
Moreover, for each production g -> h, both g and h are non-empty.
Key to the proof:
1. Consider a TM M whose S(M) has an unsolvable Word Problem.
2. The RHS and LHS of each production of this S.T.P. are each non-empty.
3. Replace every symbol of this S.T.P. with a string of the form abbb...bbba
   such that the number of b’s uniquely identifies the symbol of the S.T.P.
4. Construct a new S.T.P. from S(M) by replacing each current production
   with the one obtained after replacing each current symbol by the string
   obtained in step (3) above.
5. The resulting new S.T.P. will have an unsolvable Word Problem, just as
   the current S.T.P. has one.
QED
EX:
S1 S2 -> S2 S1         // abaabba -> abbaaba
S1 S4 S5 -> S5 S4 S1   // abaabbbbaabbbbba -> abbbbbaabbbbaaba
S2 S3 -> S1 S2 S3      // abbaabbba -> abaabbaabbba
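Step (3) of the proof is easy to make concrete. The following is a minimal sketch (function names are mine) that encodes symbol Si as an "a", then i b's, then another "a", and re-encodes whole words:

```python
def enc(i):
    """Encode symbol S_i on the two-symbol alphabet: a, then i b's, then a."""
    return "a" + "b" * i + "a"

def encode_word(indices):
    """Encode a word given as a sequence of symbol indices."""
    return "".join(enc(i) for i in indices)

# The production S1 S2 -> S2 S1 from the EX above becomes:
lhs, rhs = encode_word([1, 2]), encode_word([2, 1])   # "abaabba" -> "abbaaba"
```

Since the b-count inside each a...a block uniquely identifies the original symbol, the encoding is injective on words, which is what makes the Word Problem carry over unchanged.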
Definition: Thue Process
A S.T.P. is called a Thue process if it contains the inverse of every
production in it.
EX: Thue Process
   aba -> abba     abba -> aba
   bbaa -> abb     abb -> bbaa
   bab -> baab     baab -> bab
   aa -> aa
The process above is a T.P. as well as a S.T.P.
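The definition amounts to a one-line check in code. A small sketch (names are mine) over productions represented as (u, v) pairs:

```python
def is_thue(prods):
    """A S.T.P. is a Thue process iff every production's inverse is present."""
    s = set(prods)
    return all((v, u) in s for (u, v) in s)

TP = {("aba", "abba"), ("abba", "aba"),
      ("bbaa", "abb"), ("abb", "bbaa"),
      ("bab", "baab"), ("baab", "bab"),
      ("aa", "aa")}            # aa -> aa is its own inverse
```

is_thue(TP) holds for the example above, while a process like {ab -> ba} alone fails the check.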
Definition: For any given TM, M, we define a T.P.
   q(M) = S(M) + W(M)
THM3.2, page 177 (Post’s Lemma)
(Equivalence between S(M) and q(M), as far as the existence of a derivation
starting from an arbitrary initial Post word is concerned, for any
deterministic M)
Let M be a DTM and let U be a word (string) on its tape alphabet.
Then h Q1 B U h ->*[S(M)] h Q0 h if and only if h Q1 B U h ->*[q(M)] h Q0 h.
Note that the above equivalence holds when M is deterministic.
That is, it may not hold when M is not deterministic.
When M is deterministic, the THM implies that, as far as such derivations
are concerned, q(M) is no more powerful than S(M).
THM4.1, page 182 (Unsolvability of the Post Correspondence Problem)
Given an arbitrary Post correspondence system, it is unsolvable to determine
whether or not it has a solution.
That is, there is no algorithm for determining whether the given system has
a solution.
EX: Post correspondence system
Assume that A (the alphabet) = {a,b,c}
Set-A      Set-B
-------    -------
aba        ab
abbb       bbaa
baba       abab
b          ab
The above system has a solution, “abab”, as it can be constructed by the
same sequence of substrings from both sets of the system, as follows:
From Set-A, abab = aba + b
From Set-B, abab = ab + ab
Note that if we remove the last pair of substrings, the resulting system
will not have a solution.
This is because in no remaining pair do the two substrings end with the same
symbol, so no candidate solution can ever match at its last position.
While the above system has been found to have a solution, the THM implies
that, in general, we cannot decide whether an arbitrary P.C. system has a
solution.
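The theorem rules out an algorithm for arbitrary systems, but a bounded brute-force search can still find short solutions when they exist. A hedged sketch (the function name and the length bound are my own) over the example system above:

```python
from itertools import product

def pcp_solution(A, B, max_len=6):
    """Return the first index sequence (up to max_len pairs) whose
    A-concatenation equals its B-concatenation, or None if none is found."""
    for k in range(1, max_len + 1):
        for seq in product(range(len(A)), repeat=k):
            if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                return seq
    return None

A = ["aba", "abbb", "baba", "b"]
B = ["ab",  "bbaa", "abab", "ab"]
```

Here pcp_solution(A, B) finds the sequence of pairs 1 and 4 (indices 0 and 3), i.e. “abab”, while dropping the last pair leaves no solution within the bound; the point of the THM is that no bound works for every system.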
A grammar has four components:
1. The set, T, of terminal symbols
2. The set, N, of non-terminal symbols
3. The set, P, of production rules
4. A unique non-terminal symbol designated as the Start symbol
EX: A grammar
T = {a,b,c}
N = {S,B,C}
P = {
   S -> a S B C
   S -> a b C
   C B -> B C
   b B -> b b
   b C -> b c
   c C -> c c }
S = the Start symbol
Strings (sentences) consisting only of terminals that can be derived by this
grammar constitute the language defined by this grammar, L(G).
Formally, L(G) = { u | S ->+ u and u is in T* }
L(G) = { a^m b^m c^m | m >= 1 } = {abc, aabbcc, aaabbbccc, aaaabbbbcccc, ...}
abc:    S -> a b C -> a b c
aabbcc: S -> a S B C
          -> a a b C B C
          -> a a b B C C
          -> a a b b C C
          -> a a b b c C
          -> a a b b c c
Strings not in this language:
a a b c b c   (no derivation at all)
a a b c B C   (derivable, but contains non-terminals)
a a b b c C   (derivable, but contains non-terminals)
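Membership for this particular grammar can be checked by exhaustive search: since no rule shrinks its word (it is Type-1, as noted below), any sentential form longer than the target can be pruned, so the search terminates. An illustrative sketch (names are mine):

```python
from collections import deque

RULES = [("S", "aSBC"), ("S", "abC"), ("CB", "BC"),
         ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def generates(target):
    """True iff the grammar above derives `target` from S.  Sentential forms
    longer than the target are pruned (safe: no rule shortens a word)."""
    seen, queue = {"S"}, deque(["S"])
    while queue:
        w = queue.popleft()
        if w == target:
            return True
        for lhs, rhs in RULES:
            i = w.find(lhs)
            while i != -1:
                nxt = w[:i] + rhs + w[i + len(lhs):]
                if len(nxt) <= len(target) and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                i = w.find(lhs, i + 1)
    return False
```

This confirms the examples above: "abc" and "aabbcc" are generated, "aabcbc" is not. Such pruning is exactly what fails for a general Type-0 grammar, where derivations may shrink words.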
Types of Grammar
We classify grammars according to the restrictions imposed on the production
rules:
   Unrestricted (Phrase-Structured) Grammars: Type-0
   Context-Sensitive Grammars (CSG): Type-1
   Context-Free Grammars (CFG): Type-2
   Regular Grammars: Type-3
1. Type-0: No restriction, except that the LHS of each rule must have at
   least one non-terminal.
2. Type-1: The length of the RHS is at least as large as that of the LHS.
3. Type-2: The LHS is just one non-terminal and no terminal at all. The RHS
   can be empty.
4. Type-3: In addition to being a CFG, there is the additional restriction
   that the RHS is just one terminal and no non-terminal, or one terminal
   followed by one non-terminal.
The grammar above is not Regular and not a CFG, but a CSG, which is also
Unrestricted.
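The four restrictions can be phrased as direct checks. An illustrative sketch (representation assumed: rules as (LHS, RHS) strings, non-terminals as single uppercase characters) that returns the most restrictive type a rule set satisfies:

```python
def grammar_type(rules, nonterminals):
    """Return 3, 2, 1, or 0: the most restrictive type the rules satisfy
    (None if some LHS contains no non-terminal at all)."""
    def is_nt(c):
        return c in nonterminals
    type0 = all(any(is_nt(c) for c in lhs) for lhs, _ in rules)
    type1 = type0 and all(len(rhs) >= len(lhs) for lhs, rhs in rules)
    type2 = all(len(lhs) == 1 and is_nt(lhs) for lhs, _ in rules)
    type3 = type2 and all(
        (len(rhs) == 1 and not is_nt(rhs[0])) or
        (len(rhs) == 2 and not is_nt(rhs[0]) and is_nt(rhs[1]))
        for _, rhs in rules)
    if type3: return 3
    if type2: return 2
    if type1: return 1
    if type0: return 0
    return None
```

On the a^m b^m c^m grammar above this returns 1 (a CSG: rule "CB -> BC" rules out CFG), while {S -> aS, S -> a} returns 3 (Regular).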
THM: (Existence of a grammar equivalent to an arbitrary NTM, M)
Given an arbitrary NTM, M, we have a grammar G such that L(M) = L(G); that
is, the language accepted by this M is the same as the language generated by
this grammar.
Proof:
This amounts to identifying the grammar.
The production rules of this grammar include:
1. All productions of W(M).
2. In addition, the following rules:
   a. S -> h Q0 h
   b. h Q1 B -> Q
   c. Q Si -> Si Q, for all i = 1, 2, ..., K
   d. Q h -> l (the empty string)
The terminal set, T, of this grammar is T = {S1, S2, ..., SK}.
The non-terminal set, N, of this grammar includes all other symbols,
including S, Q1, Q2, ..., Qn, Q0, Qn+1, Q, h.
And S is the Start symbol.
It is the case that M accepts an input string U if and only if there is a
derivation from “h Q1 B U h” to “h Q0 h” by S(M), which is the case if and
only if there is a derivation from “h Q0 h” to “h Q1 B U h” by W(M), which
is the case if and only if there is a derivation
   S -> h Q0 h ->+ h Q1 B U h -> Q U h ->* U Q h -> U
by the grammar. QED
THM7.5.2, page 188 (Equivalence between Grammars and RE sets)
Proof:
1. An RE set -> a DTM -> an NDTM -> a grammar that generates the RE set.
2. A grammar G -> the language generated by the grammar is an RE set:
   L(G) = {U | S ->+[G] U and U is in T*}
        = {U | S ->+[G] U} ^ T*,
   which is the intersection of an RE set and a Recursive set, which is RE.
Note that (S ->+[G] U) is the same as (there exists y such that DERIV(U,y)),
which is the same as the question of whether or not the following function
is defined:
   g(U) = min_y DERIV(U,y),
which is an unbounded minimalization on a PR predicate, hence a PC function
g(U), as DERIV(x,y) is a PR predicate.
That is, whether or not U is derivable from S is the same as whether or not
a certain unary PC function g(U) is defined.
Note also that if the grammar is CS, the above unbounded minimalization
becomes a bounded quantification on a PR predicate instead, as the maximum
length of possible derivation sequences is bounded and the values of
possible strings in any derivation sequence are also bounded, which results
in a PR predicate.
So in this case, the question of whether or not an arbitrary string is
derivable from the start symbol S is computable, making the set of all
derivable strings Recursive.
So, the language defined by any CS grammar is Recursive: THM5.4, page 190.
THM: (Unsolvability of the CFG (Context-Free Grammar) Ambiguity Problem)
Given an arbitrary CFG, it is unsolvable to decide whether or not it is
ambiguous.
Proof:
We can reduce an arbitrary PCP system to an equivalent CFG such that the PCP
system is solvable if and only if the CFG is ambiguous.
Let an arbitrary PCP system have two lists of corresponding substring pairs:
   w1, w2, w3, ..., wn
   x1, x2, x3, ..., xn
Now, consider the following n symbols not in the alphabet:
   a1, a2, a3, ..., an
We construct a CFG from the above pair of lists of substrings as follows:
   S -> SA | SB
   SA -> wi SA ai, for i = 1, 2, ..., n
   SA -> wi ai, for i = 1, 2, ..., n
   SB -> xi SB ai, for i = 1, 2, ..., n
   SB -> xi ai, for i = 1, 2, ..., n
For this grammar G,
   L(G) = { xi1 xi2 ... xik aik ... ai2 ai1 } U { wj1 wj2 ... wjk ajk ... aj2 aj1 }
The first kind of strings are derived from the non-terminal SB, and the
second kind from the non-terminal SA.
A string of the first kind and a string of the second kind can be the same
only when the two non-terminals SA and SB derive the same string, which
makes the grammar ambiguous. And the grammar is ambiguous exactly when the
given PCP system has a solution.
EX:
PCP System:
   ab    abb
   aa    baa
   bb    b
“bbaa” is a solution.
Corresponding CFG:
   S -> SA
   S -> SB       // These two production rules can be written: S -> SA | SB
   SA -> ab SA c1 | aa SA c2 | bb SA c3 | ab c1 | aa c2 | bb c3
   SB -> abb SB c1 | baa SB c2 | b SB c3 | abb c1 | baa c2 | b c3
A string made up in part of this solution can be derived in two different
derivation sequences, making the grammar ambiguous, as follows:
   S -> SA -> bb SA c3 -> bb aa c2 c3
   S -> SB -> b SB c3 -> b baa c2 c3
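To see the reduction operate on this example, here is a small sketch (names are mine) computing the terminal string that SA (or SB) yields for a given index sequence; ambiguity shows up as the two sides producing the same string for some sequence:

```python
W = ["ab", "aa", "bb"]      # w1, w2, w3 (the SA side)
X = ["abb", "baa", "b"]     # x1, x2, x3 (the SB side)

def derived(words, seq):
    """Terminal string derived from SA (resp. SB) for index sequence seq;
    marker ci is written here as the two characters 'c' and str(i)."""
    body = "".join(words[i - 1] for i in seq)
    markers = "".join("c" + str(i) for i in reversed(seq))
    return body + markers
```

For the solution sequence [3, 2], both sides derive "bbaac2c3", exhibiting the two derivation sequences shown above; for a non-solution sequence like [1] the two sides differ.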
Some Unsolvable Problems on Grammars
1. Consider a TM M that accepts an RE set that is not Recursive, and an
arbitrary input U.
2. Consider a grammar GU:
   The non-terminals: the entire alphabet of S(M), plus S and V
   The terminals: a (only)
   Production rules: all of S(M), and
      S -> h Q1 B U h
      h Q0 h -> V
      V -> a V
      V -> a
Note that this grammar depends not only on a TM but also on a possible input
string.
3. M accepts U <-> L(GU) = { a^m | m >= 1 }
               <-> L(GU) <> the empty set
               <-> L(GU) is infinite
               <-> “a” is in L(GU)
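Assembling GU from S(M) and an input U is purely mechanical. A hedged sketch (representation assumed: productions as pairs of symbol tuples, with each character of U a tape symbol):

```python
def build_GU(SM_productions, U):
    """Rule set of GU: all of S(M) plus the four extra rules above."""
    rules = list(SM_productions)
    rules.append((("S",), ("h", "Q1", "B") + tuple(U) + ("h",)))  # seed rule
    rules.append((("h", "Q0", "h"), ("V",)))   # fires only if M accepts U
    rules.append((("V",), ("a", "V")))         # then V pumps out a's
    rules.append((("V",), ("a",)))
    return rules
```

The point of the construction is visible in the comments: the only path from S to a terminal string goes through the goal word h Q0 h, so L(GU) is {a^m | m >= 1} or empty according as M accepts U or not.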
The above construction of GU leads to:
THM 6.1, page 192
Given a grammar G, there is no algorithm for deciding whether or not
1. L(G) is empty (Grammar Emptiness Problem)
2. L(G) is infinite (Grammar Infinity Problem)
3. an arbitrary string is in L(G) (Membership Problem)
THM 6.2, page 192 (Grammar Equivalence Problem and Grammar Subset Problem)
Given two grammars, G1 and G2, there is no algorithm for deciding whether
1. L(G1) = L(G2)
2. L(G1) <= L(G2)
Proof:
1. Let L(G1) = { a^m | m >= 1 }.
   That is, G1 has the following two production rules:
   S -> a S
   S -> a
2. Let G2 be the grammar GU above, constructed from an arbitrary TM and an
   arbitrary input string.
3. M accepts an input string U <-> L(G2) = L(G1)
                               <-> L(G1) <= L(G2)
   (L(G2) <= L(G1) always holds here, so the undecidable direction is
   whether L(G1) <= L(G2))
Note that the Grammar Proper Subset Problem is also unsolvable:
   L(G2) < L(G1)
when G1 has the following production rules:
   S -> a S
   S -> l (the empty string)