COSC4307 Program-3 Due: 10/8/2009 Scanner Writing Consider the following Block-Structured CFG: PASCAL -> PROGRAM Block . Block -> BEGIN Decl : Body END Block -> BEGIN Body END Decl -> INTEGER Ilist Decl -> REAL Ilist Decl -> INTEGER Ilist : Decl Decl -> REAL Ilist : Decl Body -> Body ; S Body -> S S -> Block S -> PRINT Elist S -> READ Ilist S -> Id = E Ilist -> Ilist , Id Ilist -> Id Elist -> Elist , E Elist -> E E -> E + T E -> T T -> T * F T -> F F -> Const F -> Id F -> ( E ) NOTE: "Id" Stands for an identifier which is a letter followed by zero or more letters and/or digits, and "Const" an unsigned integer constant. Task: Write a scanner for the above grammar that does the following: 1. Read an input source program. 2. Number expected tokens as follows: a. Keywords (Terminal symbols that consist of two or more letters like PROGRAM, BEGIN, END, PRINT, READ, INTEGER, etc) will numbered 1, 2, 3, 4, 5, 6, etc. b. Operators/Delimiters will be numbered 201, 202, 203, etc. c. Identifiers will be 1ddd where 'ddd' is the position of the symbol table occupied by each id. d. Constants will be 2ddd where 'ddd' is the position of the same symbol table occupied by each constant. 3. Output token integers, and 4. Output three kinds of errors with the indication of the error type and the error location: a. Unexpected tokens. (You may find the NEXT matrix very useful for this purpose) b. Undeclared identifiers. This error message must be output only once for each such error, and this can be done by repairing the error. (We repair each undeclared id by declaring it as an integer variable in the main block.) c. Duplicate Declares. The same name (identifier) being declared more than once in the same block (or subprogram). All subsequent declarations after the very FIRST one will be ignored. Simplifying assumtions: 1. Keywords are strictly reserved. 2. The static scope rule is used in determining the block in which an identifier is declared, and hence also in detecting an undec- lared identifier, if any. 3. In each block, declarations must appear first. Hence, later declarations may be all ignored. 4. One or more consecutive blanks are to be taken as indicating the end of a token, although a token may end by something else. 5. Each undeclared identifier will be repaired by declaring it as an integer variable in the main block. 6. Only one symbol table will be used which will contain both identifiers and constants. 7. Source programs are case-insensitive. Symbol table structure: 1. First column: The symbols themselves 2. Second column: Symbol length 3. Third column: Symbol attributes: 1: Constants 2: Scalar variable 3: Array variables 4: Array element references 5: Temporary variables (or locations) 6: Others 4. 4th column: Variable types: 1: Integer 2: Real 5. 5th column: Pointer to the symbol table location of the next symbol in the same block. 6. 6th column: Pointer to location(s) of the local data area to contain value(s) of the symbol. Input source program: PROGRAM BEGIN REAL A, B, Y123; INTEGER LongVariab, A, B, C; Read LongVariab, A, B, Y12; Y123 = Y12 + LongVariab + A + B; C = Y123+Y12*Y12; Print A, B+C+D+Und1; BEGIN REAL LongVariab, A, B: Integer A: READ LongVariab, A, B, ABC, XYZ; X123 = ABC + XYZ*Und1*Und2*A*C; END ; BEGIN Real ABC123, X, Y : Integer X, Y, Und1; Read B, C, Abc123, X, Y, Und1, Und2, Und3, ABC; X123 = ABC+Und1+Und2+Und3*A*C End READ Y, X, LongVariab, Und1, Und3, Und4, ABC123; X = X + Y * LongVariab * ABC123; End .