Semi-Thue Processes

 

          Less restricted version of Grammars

          Definition: Semi-Thue Production (or simply Production)

          Given an alphabet A, the expression, u -> v, is a Semi-Thue production,

          where u and v are each a string on A.

 

          A Semi-Thue process is a finite set of Semi-Thue productions.

 

          EX:   A = {a,b,c}

                   Semi-Thue Process:   P = {ab->ba, bc->cb, ca->ac}

 

          In this case, we can have the following:

 

          abbc -> babc -> bacb

          abbc -> abcb -> bacb

          abbc -> abcb -> acbb

 

          So, we say that there is a derivation from “abbc” to “babc”, or

          from “abbc” to “bacb”, or from “abbc” to “acbb”, and so on.

          However, there is no derivation from “abbc” to “abb”, or to “bbca”, or

          to “accbb”, and so on.

          Of course, these derivations are all for the given Semi-Thue Process.

          To specify a particular S.T. Process under which a derivation is obtained,

          we use the following notation:

 

          1.       when exactly one step is used:

                   abbc -> babc         (by P)

          2.       when zero or more steps are used:

                   abbc ->* abbc        (by P)      // zero steps

                   abbc ->* babc        (by P)      // one step

                   abbc ->* bbac        (by P)      // two steps
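The one-step relation above is simple enough to sketch in code. Below is a minimal Python sketch (the function name `one_step` is ours, not from the text) that lists every string reachable in exactly one step under a given Semi-Thue process:

```python
def one_step(word, productions):
    """All strings reachable from `word` in exactly one step:
    replace one occurrence of some LHS u by the corresponding RHS v."""
    results = []
    for u, v in productions:
        i = word.find(u)
        while i != -1:
            results.append(word[:i] + v + word[i + len(u):])
            i = word.find(u, i + 1)
    return results

# The example process P = {ab->ba, bc->cb, ca->ac}:
P = [("ab", "ba"), ("bc", "cb"), ("ca", "ac")]
print(one_step("abbc", P))   # ['babc', 'abcb'] -- the two one-step successors
```

Iterating `one_step` produces exactly the derivations listed above, e.g. abbc -> babc -> bbac in two steps.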

 

Our immediate goal:

   To show the equivalence between DTM’s and NDTM’s.

   All we need to show is that any given NDTM can be converted to an equivalent

   DTM since obtaining an equivalent NDTM (equivalent to a given DTM) is trivial.

 

Our strategy:

   Given a NDTM, find an equivalent S.T.P so that

   1. from that S.T.P, we get an equivalent grammar, and

   2. each grammar defines an RE language, which some DTM must accept,

       since RE languages and DTM’s are equivalent.

 

An Equivalent S.T.P to a NDTM

 

   1.    Let  A be the alphabet of a NDTM, M,  and  A = {S1,S2, ..., SK}

          and let its set of states be {Q1,Q2, ..., Qn}

 

   2.    The wanted S.T.P, called S(M), will have the following alphabet:

          S0, S1, S2, ..., SK,  Q0, Q1, ..., Qn, Qn+1,  h

 

   Our goal right now in defining this S.T.P is finding all its productions that will

   derive a certain fixed string from any input string if and only if this input string

   is accepted by this TM.

   So, in that case, this TM and the resulting S.T.P are equivalent in the sense that

   the S.T.P can derive a certain goal string from any input string if and only if this TM

   accepts the input string (by stopping, of course).

 

   We need to obtain S.T. productions for this wanted S(M) such that

 

          h Q1 B X h  ->*  h Q0 h      (by S(M))

          exactly when this TM has a halting computation starting from the initial ID

          of:   ...BBB Q1 B X BBB...

 

    The productions will be obtained from each quadruple of this TM, M, as follows:

 

     a.           Qi      Sj      Sk     Ql     

 

          For each such quadruple, we need just one production:  Qi  Sj  ->  Ql  Sk

 

     b.  Qi      Sj      R       Ql

 

          For each such quadruple, we need many productions to account for the many different

          tape symbols that may happen to be next (to the Right in this case) to the current

          tape symbol, including the case of the endmarker, h.

          So,    Qi  Sj  Sm    ->  Sj  Ql  Sm    for each m (from 0 to K)

          and    Qi  Sj  h     ->  Sj  Ql  B  h    // extending the tape at the Right endmarker.

 

     c.  Qi      Sj      L       Ql

 

          For each such quadruple, likewise, we need many productions for the same

          reasons as above in (b):

                   Sm  Qi  Sj    ->  Ql  Sm  Sj    for each m (from 0 to K)

          and    h   Qi  Sj      ->  h  Ql  B  Sj    // extending the tape at the Left endmarker.
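Cases (a)-(c) above can be sketched as a small routine that maps one TM quadruple to its productions. This is a sketch under a representation of our own (a quadruple is a 4-tuple of strings; a production is a pair of symbol tuples):

```python
def productions_from_quadruple(quad, tape_symbols):
    """Productions of S(M) contributed by one quadruple (Qi, Sj, act, Ql),
    where act is "R", "L", or a tape symbol Sk (cases b, c, a above)."""
    qi, sj, act, ql = quad
    prods = []
    if act == "R":                                  # case (b): move Right
        for sm in tape_symbols:                     # Sm for m = 0..K
            prods.append(((qi, sj, sm), (sj, ql, sm)))
        prods.append(((qi, sj, "h"), (sj, ql, "B", "h")))   # extend at h
    elif act == "L":                                # case (c): move Left
        for sm in tape_symbols:
            prods.append(((sm, qi, sj), (ql, sm, sj)))
        prods.append((("h", qi, sj), ("h", ql, "B", sj)))
    else:                                           # case (a): print Sk
        prods.append(((qi, sj), (ql, act)))
    return prods
```

For a Right-moving quadruple over tape alphabet {S0, S1}, this yields the two `Sm` productions plus the endmarker production, as in case (b).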

 

     Additional productions

          These are needed to erase all tape symbols except those we want at the time

          this TM halts.

 

     a.  For each halting combination Qi  Sj  of this TM (i.e., M has no quadruple

          beginning with Qi  Sj), where i=1,2,…, n and j=0,1,2, …, K, we add

                     Qi  Sj    ->  Qn+1  Sj        // Qn+1 is a new state and a brand new

                                                   // symbol of the S.T.P being constructed.

 

     b.  Qn+1  Sj  ->  Qn+1              // for all j=0,1,2, …, K

          Qn+1  h   ->  Q0  h            // at the Right endmarker

          Sj  Q0    ->  Q0               // for all j=0,1,2, …, K
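These erasing productions can be sketched the same way; `Qe` below is our own name for the fresh state Qn+1:

```python
def erasing_productions(halting, tape_symbols):
    """The additional productions above.  `halting` is the set of halting
    combinations (Qi, Sj) of the TM; "Qe" stands for the new state Qn+1."""
    prods = [((qi, sj), ("Qe", sj)) for (qi, sj) in halting]   # a. enter Qe
    for sj in tape_symbols:
        prods.append((("Qe", sj), ("Qe",)))    # b. Qn+1 Sj -> Qn+1
        prods.append(((sj, "Q0"), ("Q0",)))    # b. Sj Q0   -> Q0
    prods.append((("Qe", "h"), ("Q0", "h")))   # b. Qn+1 h  -> Q0 h
    return prods
```

With one halting combination and tape alphabet {S0, S1}, this gives six productions: one entering production plus the five erasing/turning productions of (b).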

 

     EX:

          Suppose that a TM starts with the initial I.D. of:         

                   h Q1  B  X  h      where X is an input string.

 

          And suppose that this TM stops for the halting combination of   Qi   Sj

          In this case, for the resulting S.T.P, we will have the following derivation:

 

          h u Qi Sj v h   ->   h u Qn+1 Sj v h     (halting combination found)

                          ->+  h u Qn+1 h          (erasing to the right of Qn+1)

                          ->   h u Q0 h            (turning at the Right endmarker)

                          ->*  h Q0 h              (erasing to the left of Q0)

 

THM2.2 (Page-175) (Correspondence between an accepted string and a derivation sequence

by the resulting S.T.P leading to our goal string  h Q0  h, a sufficient and necessary

condition for an accepted string)

 

An NDTM accepts an input string U (on its tape alphabet, of course) if and only

if there is a derivation sequence by the resulting   S(M)  from “h Q1  B  U h” to  “h  Q0  h”

 

Proof:

          1. The necessary condition

              The above example shows the necessary condition for an accepted string.

              That is, if a string U is accepted by the TM, there is a derivation

              sequence from “h Q1  B  U h” to “h  Q0  h”.

          2. The sufficient condition

              On the other hand, if a string is not accepted, then starting from the initial Post word

              “h Q1  B  U h”, any Post word derivable will contain exactly one Qi (for i=1,2, …, n)

              and will contain neither the state Qn+1 nor Q0, because no derivable Post word will

              contain a halting combination of the (current) state and the (current) tape symbol.

              Hence, the Post word “h  Q0  h” will not be derived.

 

THM2.3, page-176 (THM2.2 for the Inverse Process, called W(M), which is

obtained by reversing every production of S(M))

A NDTM M accepts a string U if and only if there is a derivation sequence from

the Post word “h  Q0  h” to a Post word “h Q1  B  U h”.

 

Proof: This is an immediate consequence of the THM2.2 above as for any given S.T.P, P,  

there is a derivation from a string U to a string V by P  if and only

if there is a derivation from the string V back to the string U by its Inverse Process.    
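The Inverse Process is mechanical to construct; a minimal sketch (representation ours):

```python
def inverse_process(productions):
    """The Inverse Process: each production u -> v becomes v -> u."""
    return [(v, u) for (u, v) in productions]

print(inverse_process([("ab", "ba"), ("bc", "cb")]))
# [('ba', 'ab'), ('cb', 'bc')]
```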

 

THM3.1, page-176 (Existence of a TM with unsolvable Word Problem for both

its S(M) and its W(M))

(The Word Problem for a S.T.P asks, given two strings u and v, whether there

is a derivation from u to v by the process.)

 

Proof:  Note that the Word Problem for S(M) for any given TM M is solvable

if and only if it is solvable for its W(M).

Let us pick any TM that accepts a non-recursive RE set, say S. 

This RE set S has an undecidable membership problem.

Now,  a string U is in the set S  <->

           h  Q1 B U h  ->*  h Q0 h      (by S(M))

 

THM2.1, page-174 (Unique derivation sequence for the  S(M) for any DTM, M)

For any DTM M, we have at most one possible derivation sequence of Post words

from the initial Post word, “h Q1 B U h” for a given arbitrary input string U

to any given Post word.

 

THM3.4, page-178 (Existence of a S.T.P on a two-symbol alphabet

with an Unsolvable Word Problem)

There is a S.T.P on the alphabet {a,b} whose Word problem is unsolvable.

Moreover, for each production, g -> h, both g and h are non-empty.

 

Key to Proof.

1.   Consider a TM M whose S(M) has an unsolvable Word problem.

2.   The RHS and LHS of each production of this S.T.P are each non-empty.

3.   Replace every symbol of this S.T.P with a string of the form abb…ba (e.g., abbbbbba),

     such that the number of b’s uniquely identifies the symbol of the S.T.P.

4.   Construct a new S.T.P from S(M) by replacing, in each production, every symbol

     by the string obtained in step (3) above.

5.   The resulting new S.T.P has an unsolvable Word problem just as the current S.T.P

     has one.

QED

 

 

EX:      S1  S2   ->  S2  S1                // abaabba -> abbaaba

          S1 S4 S5 -> S5 S4 S1             // abaabbbbaabbbbba -> abbbbbaabbbbaaba

          S2 S3  ->  S1 S2 S3                 // abbaabbba -> abaabbaabbba
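Steps (3) and (4) are easy to make concrete. A sketch (helper names ours), where a production is given as two lists of symbol indices:

```python
def encode_symbol(i):
    """Symbol S_i of the original S.T.P becomes the {a,b}-string a b^i a."""
    return "a" + "b" * i + "a"

def encode_production(lhs, rhs):
    """Encode a production given as two lists of symbol indices."""
    enc = lambda idxs: "".join(encode_symbol(i) for i in idxs)
    return (enc(lhs), enc(rhs))

# S1 S2 -> S2 S1 from the example above:
print(encode_production([1, 2], [2, 1]))   # ('abaabba', 'abbaaba')
```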

 

 

Definition : Thue Process

A S.T.P is called a Thue process if it contains the inverse of every production in it.

 

          EX:  Thue Process

 

                   aba  ->  abba         abba ->  aba

                   bbaa ->  abb          abb  ->  bbaa

                   bab  ->  baab         baab ->  bab

                   aa   ->  aa



                   The above process is a T.P. as well as a S.T.P.
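The defining condition is directly checkable; a sketch:

```python
def is_thue(productions):
    """A S.T.P is a Thue process iff it contains the inverse of
    every production in it."""
    prods = set(productions)
    return all((v, u) in prods for (u, v) in prods)

# The example Thue process above (note aa -> aa is its own inverse):
T = [("aba", "abba"), ("abba", "aba"),
     ("bbaa", "abb"), ("abb", "bbaa"),
     ("bab", "baab"), ("baab", "bab"),
     ("aa", "aa")]
print(is_thue(T))                  # True
print(is_thue([("ab", "ba")]))     # False: the inverse ba -> ab is missing
```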

 

Definition:  For any given TM, M, we define a T.P.:

q(M) =  S(M) + W(M)          // the union of the two sets of productions

 

THM3.2, page-177 (Post’s Lemma)

(Equivalence between  S (M) and  q (M) as far as the existence of a derivation starting

from an arbitrary initial Post word is concerned for any deterministic M)

Let M be a DTM and let U be a word (string) on its tape alphabet.

Then,  h  Q1  B  U  h  ->*  h  Q0  h      (by S(M))     if and only if

          h  Q1  B  U  h  ->*  h  Q0  h      (by q(M))

 

Note that the above equivalence holds when M is deterministic.

That is, it may not hold when M is not deterministic.

When M is deterministic, the THM implies that, as far as such derivations are concerned,

q(M) is no more powerful than S(M).

 

 

THM4.1, page-182 (Unsolvability of Post Correspondence Problem)

Given any arbitrary Post correspondence system, it is unsolvable to determine

whether or not it has a solution.

That is, there is no algorithm for determining whether the given system has a solution.

          EX: Post correspondence system

          Assume that A(alphabet) = {a,b,c}

 

                   Set-A                    Set-B

                   -------                    -------

                   aba                       ab

                   abbb                     bbaa

                   baba                     abab

                   b                           ab

 

                   The above system has a solution, “abab”, as it can be constructed by the same

                   sequence of pairs (the first pair, then the fourth) from both sets of the system:

                   From Set-A,      abab  =  aba + b

                   From Set-B,      abab  =  ab  + ab

 

                   Note that if we remove the last pair of substrings, the resulting system will not

                   have a solution.

This is because no remaining pair has two substrings ending in the same

symbol, so no sequence of pairs can produce two identical strings.

 

 

While the above system has been found to have a solution, the THM implies that, in general,

we cannot decide whether an arbitrary P.C. system has a solution.
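Note that a proposed solution can always be verified, so PCP is semi-decidable: a brute-force search over index sequences finds a solution whenever one exists, but it cannot be turned into a decision procedure. A bounded sketch (helper names ours) applied to the example system above:

```python
from itertools import product

def pcp_solution(pairs, max_len=6):
    """Search for a solution using at most max_len pair indices.
    Only a semi-decision sketch: PCP is undecidable in general."""
    for n in range(1, max_len + 1):
        for seq in product(range(len(pairs)), repeat=n):
            top = "".join(pairs[i][0] for i in seq)
            bottom = "".join(pairs[i][1] for i in seq)
            if top == bottom:
                return seq
    return None

pairs = [("aba", "ab"), ("abbb", "bbaa"), ("baba", "abab"), ("b", "ab")]
print(pcp_solution(pairs))   # (0, 3): aba+b == ab+ab == "abab"
```

Removing the last pair, the search finds nothing, matching the argument above about last symbols.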

 

Definition: Grammar                               

 

A grammar has four components:

1.     The set, T,  of terminal symbols

2.     The set, N,  of non-terminal symbols

3.     The set, P,  of production rules

4.     A unique non-terminal symbol designated as the Start symbol.

 

EX: A grammar

           

             T =  {a,b,c}

             N = {S,B,C}

   P =  {

          S  ->  a S B C

          S  ->  a b C

          C B -> B C

          b B -> b b

          b C -> b c

          c C -> c c }

   S = The Start symbol.

 

   Strings (Sentences) consisting only of terminals that can be derived by this grammar

   constitute the language defined by this grammar, L(G).

   Formally,  L(G) = { u | S ->+ u  &  u is in T* }

   L(G) = { a^m b^m c^m | m >= 1 } = { abc, aabbcc, aaabbbccc, aaaabbbbcccc, … }

    abc:                  S -> a b C -> a b c

    aabbcc:   S -> a S B C

                      -> a a b C B C

                      -> a a b B C C

                      -> a a b b C C

                      -> a a b b c C

                      -> a a b b c c

    Strings not in this language:

                   a  a  b  c  b  c                 (no derivation at all)

                   a  a  b  c  B  C                 (derivable, but contains non-terminals)

                   a  a  b  b  c  C                 (derivable, but contains non-terminals)
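Because every rule of this grammar is non-contracting, derivability of a given terminal string can be checked by a bounded breadth-first search over sentential forms (all symbols here are single characters; the helper is our own):

```python
# Productions of the example grammar; S, B, C are non-terminals.
P = [("S", "aSBC"), ("S", "abC"), ("CB", "BC"),
     ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]

def derivable(target, max_steps=12):
    """Bounded BFS from S; valid because no rule shortens a word,
    so forms longer than the target can be pruned."""
    frontier = {"S"}
    for _ in range(max_steps):
        nxt = set()
        for w in frontier:
            if w == target:
                return True
            if len(w) > len(target):      # rules never shorten a word here
                continue
            for u, v in P:
                i = w.find(u)
                while i != -1:
                    nxt.add(w[:i] + v + w[i + len(u):])
                    i = w.find(u, i + 1)
        frontier = nxt
    return target in frontier

print(derivable("aabbcc"), derivable("aabcbc"))   # True False
```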

 

Types of Grammar

We classify grammars according to the restrictions imposed on production rules.

          Unrestricted (Phrase Structured Grammars):   Type-0

          Context-Sensitive Grammars (CSG):            Type-1

          Context-Free Grammars (CFG):                 Type-2

          Regular Grammars:                            Type-3

1.     Type-0: No restriction except that the LHS of each rule must have at least one non-terminal.

2.     Type-1: The length of the RHS is at least as large as that of the LHS.

3.     Type-2: The LHS has just one non-terminal and no terminal at all.

       The RHS can be empty.

4.     Type-3: In addition to being a CFG, there are the additional restrictions that

       RHS = just one terminal and no non-terminal, or

                one terminal followed by one non-terminal.

 

          The grammar above is not Regular and not a CFG, but a CSG, which is also Unrestricted.

           

THM7.5.1(Page-186)

(Existence of a grammar equivalent to an arbitrary NDTM, M)

Given an arbitrary NDTM, M, we have a grammar G such that

L(M) = L(G), that is, the language accepted by this M is the same as the language

generated by this grammar.

Proof:

This amounts to identifying the grammar.

   The production rules of this grammar include:

          1. All productions of  W(M).

          2. In addition, the following rules:

              a.   S -> h Q0 h

              b.   h  Q1  B -> Q

              c.   Q Si   -> Si Q  , for all i=1,2, ..., K

              d.   Q h ->  l        // l denotes the empty string

    The terminal set, T, of this grammar is: T = {S1,S2, ..., SK}

    The non-terminal set, N, of this grammar includes all other symbols including:  

          S, Q1, Q2, ..., Qn,  Q0, Qn+1 ,   Q, h

    And,  S is the Start symbol.
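Assembling the rules above is mechanical; a sketch with productions represented as pairs of symbol tuples (a representation of our own):

```python
def grammar_from_wm(w_productions, K):
    """Production rules of the grammar of THM7.5.1: all of W(M) plus
    rules a-d above.  Terminals are S1..SK; () is the empty string l."""
    rules = list(w_productions)                    # 1. all of W(M)
    rules.append((("S",), ("h", "Q0", "h")))       # a. S -> h Q0 h
    rules.append((("h", "Q1", "B"), ("Q",)))       # b. h Q1 B -> Q
    for i in range(1, K + 1):                      # c. Q Si -> Si Q
        rules.append((("Q", f"S{i}"), (f"S{i}", "Q")))
    rules.append((("Q", "h"), ()))                 # d. Q h -> l
    return rules

print(len(grammar_from_wm([], 2)))   # 5 rules: a, b, two of c, and d
```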

 

It is the case that M accepts an input string U if and only if

there is a derivation from “h Q1 B U h” to “h Q0 h” by S(M), which is the case

if and only if there is a derivation from “h Q0 h” to “h Q1 B U h” by W(M),

which is the case if and only if there is a derivation, by the grammar, of:

          S -> h Q0 h ->+ h Q1 B U h -> Q U h ->* U Q h -> U

 

THM7.5.2, page-188

(Equivalence between Grammars and RE sets)

Proof:

          1. An RE set -> DTM -> NDTM -> A grammar that generates the RE set.

          2. A grammar G -> the language generated by the grammar is an RE set:

              L(G) = { U | S ->+ U (by G)  &  U in T* }

                   = { U | S ->+ U (by G) } ^ T*,  which is the intersection of an RE set

                     and a Recursive set, which is RE.

 

           Note that (S ->+ U by G) is the same as (there exists y such that

DERIV(U,y)), which is the same as the question of whether or not the following function

is defined:

           g(U) = min y [ DERIV(U,y) ]

which is an unbounded minimalization on a PR predicate, and hence a PC function,

as DERIV(x,y) is a PR predicate.

That is, whether or not U is derivable from S is the same as whether or not

a certain unary PC function g(U) is defined.

 

            Note also that if the grammar is CS, the above unbounded minimalization becomes

            a bounded quantification on a PR predicate instead, as the maximum length of

   possible derivation sequences is bounded and the values of possible strings in any

   derivation sequence are also bounded, which results in a PR predicate.

   So in this case, the question of whether or not an arbitrary string is derivable from

   the start symbol S is computable, making the set of all derivable strings

   Recursive.

   So, the language defined by any CS grammar is Recursive: THM5.4, page 190.

 

Application of PCP

 

   THM: (Unsolvability of CFG (Context-Free Grammar) Ambiguity problem)

   Given an arbitrary CFG, it is unsolvable to decide whether or not it is ambiguous.

 

  Proof:

We can reduce an arbitrary PCP system to a CFG such that the system has a solution

if and only if the CFG is ambiguous.

  Let an arbitrary PCP system have two lists of corresponding substrings,

          w1 ,  w2 ,  w3 , …,  wn

          x1 ,  x2 ,  x3 , …,  xn

 

  Now, consider  the following n symbols not in the alphabet:

a1 ,  a2 ,   a3  , …,  an   

 

   We construct a CFG from above pair of lists of substrings as follows:

          S -> SA | SB

          SA -> wi    SA   ai  , for i =1,2, …, n

          SA -> wi    ai          , for i =1,2, …, n

          SB -> xi    SB   ai    , for i =1,2, …, n

          SB -> xi     ai          , for i =1,2, …, n

         

          For the above grammar G,

          L(G) = { xi1 xi2 … xik aik … ai2 ai1 }  U  { wj1 wj2 … wjm ajm … aj2 aj1 }

                      The first kind of strings are derived from the non-terminal SB and

                      the second kind from the non-terminal SA.

          A string of the first kind and a string of the second kind can be the same when the two

          non-terminals SA and SB do derive the same string, which makes the grammar

          ambiguous. And the grammar is ambiguous exactly when the given PCP system has

          a solution.                          

 

 EX:

          PCP System:

            ab    abb

            aa   baa

            bb    b

           

            bbaa is a solution

 

          Corresponding CFG:

          S -> SA

          S -> SB                // These two production rules can be written:

   S -> SA | SB

          SA -> ab SA c1 | aa SA c2 | bb SA c3 | ab c1 | aa c2 | bb c3

          SB -> abb SB c1 | baa SB c2 | b SB c3 | abb c1 | baa c2 | b c3

          A string built from this solution can be derived by two different derivation

          sequences, making the grammar ambiguous, as follows:

 

          S -> SA -> bb SA c3 -> bb aa c2 c3

          S -> SB -> b SB c3   -> b baa c2 c3
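For this small system the ambiguity criterion can be checked by bounded enumeration: generate the words derivable from SA and from SB using up to a few pair indices and intersect; a common word is exactly an ambiguity witness, i.e. a PCP solution. A sketch (helper names ours):

```python
W = ["ab", "aa", "bb"]       # Set-A substrings w1, w2, w3
X = ["abb", "baa", "b"]      # Set-B substrings x1, x2, x3

def words(subs, max_depth=3):
    """Terminal words derivable from SA (resp. SB) using up to
    max_depth pair indices: w_i1 ... w_ik c_ik ... c_i1."""
    out = set()
    def go(prefix, tags, depth):
        for i, w in enumerate(subs):
            word, tag = prefix + w, f"c{i + 1}" + tags
            out.add(word + tag)
            if depth > 1:
                go(word, tag, depth - 1)
    go("", "", max_depth)
    return out

common = words(W) & words(X)
print("bbaac2c3" in common)   # True: the solution bbaa, via pairs 3 then 2
```

A non-empty intersection is exactly the case in which SA and SB derive the same string, i.e. the grammar is ambiguous.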

 

Some Unsolvable Problems on Grammars                 

 

 

1. Consider a TM M that accepts an RE set that is not Recursive and an arbitrary input U.

2. Consider a grammar GU :

          The nonterminals: The entire alphabet of  S(M), S, V

The terminals:       a  (only)

Production rules:

          All of  S(M) and

          S -> h Q1 B U h

          h Q0 h -> V

          V -> a V

          V -> a

Note that this grammar depends not only on a TM but also on a particular input

string U.

3. M accepts U  <->  L(GU) = { a^m | m >= 1 }

                <->  L(GU) <> the Empty-set

                <->  L(GU) is infinite

                <->  “a” is in L(GU)
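The construction of GU in step 2 above, sketched with productions as pairs of symbol tuples (a representation of our own):

```python
def grammar_GU(sm_productions, U):
    """Production rules of G_U: all of S(M) plus the four rules above.
    U is the input string given as a tuple of tape symbols."""
    rules = list(sm_productions)                       # all of S(M)
    rules.append((("S",), ("h", "Q1", "B") + tuple(U) + ("h",)))
    rules.append((("h", "Q0", "h"), ("V",)))           # h Q0 h -> V
    rules.append((("V",), ("a", "V")))                 # V -> a V
    rules.append((("V",), ("a",)))                     # V -> a
    return rules

print(grammar_GU([], ("S1",))[0])   # (('S',), ('h', 'Q1', 'B', 'S1', 'h'))
```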

 

Above construction of  GU  leads to:

 

THM 6.1, page-192

Given a grammar G, there is no algorithm for deciding whether or not

          1. L(G) is empty (Grammar Emptiness Problem)

          2. L(G) is infinite (Grammar Infinity Problem)

          3. an arbitrary string is in L(G) (Membership Problem)

 

THM 6.2, page-192 (Grammar Equivalence Problem and Grammar Subset Problem)

Given two grammars, G1 and G2, there is no algorithm for deciding whether

          1.   L(G1)  =  L(G2)

          2.   L(G1)  <=  L(G2)

Proof:

          1. Let  L(G1) = { a^m | m >= 1 }

              That is, G1 has the following two production rules:

                   S  ->  a S

                   S  ->  a

          2. Let G2 be the above grammar GU, constructed from an arbitrary TM and an

              arbitrary input string U.

          3. M accepts the input string U  <->  L(G2) = L(G1)

                                           <->  L(G1) <= L(G2)

 

Note that the Grammar Proper Subset Problem is also unsolvable:

          L(G2)  <  L(G1)

          when G1 has the following production rules:

                   S -> a S

                   S ->  l  (the empty string)