Primitive Pattern Structures (Functions/Variables)
10/29/2002

1. span(Nonempty-argument)
   span(" "): A run of blanks (of maximum possible length)
   span("0123456789"): A string consisting only of digits
   span("ABCDEFGHIJKLMNOPQRSTUVWXYZ"): A word (a run of uppercase letters)
   a. Matches the longest string, beginning at the current cursor position, that consists solely of characters in the argument.
   b. The argument must be nonnull.
   c. It must match at least one character; otherwise it fails.

2. break(Nonempty-argument)
   break(" "): Everything up to, but not including, the next blank.
   break(",.;:!?"): Everything up to the next punctuation mark.
   break("+-0123456789"): Everything up to the next number (a sign or a digit).
   a. Matches the longest string, beginning at the current cursor position, that does not include a character of the argument.
   b. The argument must be nonempty.
   c. It must find a break character in the subject; otherwise it fails.
   d. It matches the null string if the subject character at the cursor position is itself one of the break characters:

      "Lamar University" Break(" LU")     ;* Matches the null string ("L" is a break character)

3. fail
   A variable whose value is a primitive pattern that always fails to match. When the scanner reaches it, it backs up to seek alternatives, so every alternative to its left is tried in turn. The program below prints each input line that contains at least one A and one B but no C:

      &Anchor = 1; &Trim = 1
      Pat = (Break("A") $ A | Break("B") $ B | Break("C") $ C) fail
Loop  String = " " input                       :F(End)
      A =
      B =
      C =
      String Pat
      Differ(A) Differ(B) Ident(C)             :F(Loop)
* Otherwise the input string has at least one A and one B but no C.
      String Span(" ") =                       ;* Drop leading blanks.
      output = String                          :(Loop)
End

   The match of Pat always fails, so "String Pat" is executed only for the immediate assignments ($) it makes along the way. The blank prepended to each line guarantees that a BREAK that finds its letter assigns a nonnull string, so Differ(A) and Differ(B) succeed exactly when A and B occur, and Ident(C) succeeds exactly when C does not. &Anchor = 1 keeps the scanner from retrying Pat at later cursor positions, where a BREAK could reassign a shorter or null string.

4. POS(N) and RPOS(N)
   a. Both match only the null string; they are used to test the cursor position at which a substring is matched. POS(N) succeeds only if the cursor is N characters from the start of the subject; RPOS(N) only if it is N characters from the end.
   b. They never move the cursor; they only test its position.

      &Anchor = 1
      STR Span(" ") POS(6)      ;* Is the cursor just past
                                ;* the 6th character of STR?
   The match above succeeds only if STR has exactly 6 leading blanks: SPAN takes the whole run of blanks and never gives any back.

      &Anchor = 1
      STR Span(" ") RPOS(6)

   This match succeeds if and only if STR consists of one or more blanks followed by exactly six characters, the first of which is a nonblank, as in " 654321".

      Entire = POS(0) Pat RPOS(0)
      STR Entire

   This match succeeds if and only if Pat can match the entire subject string.

5. Tab(N), Rtab(N), Rem
   a. Each of these matches a substring of zero or more characters.

      "SNOBOL4" Len(2) Tab(6)

   Here Len(2) matches the first two characters of the subject, "SN", and Tab(6) matches the next four, "OBOL". In general, Tab(N) matches everything from the current cursor position up to cursor position N, i.e. up to and including the Nth character of the subject; it fails if the cursor is already past position N.

      "SNOBOL4" Len(2) Rtab(2)

   Here Len(2) again matches "SN", and Rtab(2) matches everything from the cursor up to, but not including, the last two characters, namely the three characters "OBO".
   Rtab(0) is particularly useful for matching everything to the end of the subject string. REM is equivalent to Rtab(0).

      Last8 = Rtab(8) Rem . LastEight

   Matching Last8 assigns the last eight characters of a subject string to LastEight, provided the string is at least that long.

6. ANY(Nonempty-argument), NotAny(Nonempty-argument)
   a. Each matches exactly one character.
   b. ANY() matches a character that is in the argument; NotAny() matches a character that is not.
   c. ANY("AIOEU") is equivalent to "A" | "I" | "O" | "E" | "U", but it is faster.
   d. SNOBOL identifiers and labels can be defined as follows:

      Letters    = "abcdefghijklmnopqrstuvwxyz"
      Letters    = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" Letters
      Digits     = "0123456789"
      Alphanum   = Letters Digits
      Identifier = Any(Letters) (Null | Span(Alphanum "_."))
      Label      = Any(Alphanum) Break(" ;")

   An identifier is a letter followed by any number of letters, digits, underscores, and periods. A label starts with a letter or a digit, followed by everything up to but not including a blank or a semicolon.

7. ARB
   a. A pattern variable that matches zero or more characters.
   b. It can be defined as:

      Arb = Null | Len(1) *Arb

   c. So its successive alternatives are:

      Null
      Len(1) Null
      Len(1) Len(1) Null
      Len(1) Len(1) Len(1) Null
      Len(1) Len(1) Len(1) Len(1) Null
      and so on.

   d. ARB tries these in that order, matching one more character each time it is backed into, and fails only when the matched substring cannot be extended any further (at the end of the subject).
   e. It should not be used as the first component of a pattern structure unless its match is assigned to a variable:

      Str Arb Pat    ;* Not a good idea
      Str Pat        ;* Better, and the same if unanchored

   f. It should not be used to break fields out of a string when they are separated by known delimiters:

      Str Break(",") . Field "," =
      Str Arb . Field "," =

   Both statements accomplish the same thing, but the former is much faster.

8. ArbNo(Patt): an "arbitrary number" of Patt
   a. This pattern function matches zero or more consecutive occurrences of strings, each of which matches the argument pattern.
   b. When the scanner first encounters it (moving forward), it matches the null string.
   c. When "backed into," it tries to match one more occurrence of its argument, increasing the length of the matched substring; when that fails, it tries the remaining alternatives of the occurrences already matched.

      &Anchor = 1
      STR ArbNo(Len(3)) RPOS(0)

   This match succeeds if and only if the length of STR is a multiple of three (including zero).
   d. It can be defined as follows:

      ArbNoPatt = Null | Patt *ArbNoPatt

   e. Hence, it matches the following, in this order:

      Null
      Patt Null
      Patt Patt Null
      Patt Patt Patt Null
      Patt Patt Patt Patt Null
      and so on.

   Example:

      &Anchor = 1
      Patt = '123' | '1234' | '23' | '341' | '412'
      Test = ArbNo(Patt) $ Output RPOS(0)
      '123412341223' Test

   This produces eight output lines: a blank line (for the null string matched), followed by

      123
      123412
      123412341
      1234
      1234123
      1234123412
      123412341223

   After 123412341 cannot be extended, ArbNo backs up until it reaches the first occurrence and tries that occurrence's next alternative, '1234'.
   f. ARBNO is relatively slow and should be avoided when some other pattern suffices, e.g. Null | Span(" ") instead of ArbNo(" "), or Span(" ") when at least one blank is required.

9. Bal
   a. A pattern variable that initially matches the shortest nonnull string balanced with respect to parentheses. Each time it is backed into, it extends the match to the next longer balanced string, failing at an unmatched ")" or at the end of the subject.
   b. Example:

      BalTest = Bal $ Output Fail
      &Anchor = 0                     ;* The default
      "(((A+B*C)*D))" BalTest

   Because Fail makes every attempt fail, the scanner backs into Bal until it can no longer be extended, then advances the starting cursor position by one and starts over. The statement therefore prints every nonnull balanced substring, grouped by starting position:

      (((A+B*C)*D))
      ((A+B*C)*D)
      (A+B*C)
      (A+B*C)*
      (A+B*C)*D
      A
      A+
      A+B
      A+B*
      A+B*C
      +
      +B
      +B*
      +B*C
      B
      B*
      B*C
      *
      *C
      C
      *
      *D
      D

10. Cursor Position Operator (Unary @): @X
   a. The value of @X is a pattern structure that matches the null string and assigns the current cursor position, as an integer, to the variable X. This is an immediate value assignment: the value is assigned whenever the cursor position operator is encountered during pattern matching, not only after the match completes successfully.

   Example:

      &Anchor = 0
      STR = 'TEST$AT$OPERATOR'
      STR @Head "AT" @Tail

   Here Head will be 5 and Tail will be 7, because they match the null string at cursor positions 5 and 7, respectively (a cursor position counts the characters to its left). Since the assignment is immediate, Head is also assigned 0, 1, 2, 3, and 4 during the unsuccessful attempts at the earlier starting positions.
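The deterministic primitives in items 1, 2, 4, 5, and 6 (SPAN, BREAK, POS, RPOS, TAB, RTAB, REM, LEN, ANY, NOTANY) amount to cursor arithmetic. The following is a minimal Python sketch of that idea, not SNOBOL4 code and not any library's API: each hypothetical function (span, brk, any_, len_, and so on) takes the subject and a cursor position and returns the new cursor position, or None on failure. It assumes, as the notes describe, that none of these primitives offers an alternative when backed into.

```python
# Hypothetical Python model of SNOBOL4's deterministic primitive patterns.
# A cursor position is the number of subject characters to its left; each
# function returns the new cursor position, or None if the primitive fails.
from typing import Optional


def span(chars: str, s: str, i: int) -> Optional[int]:
    # SPAN: the longest nonnull run of characters drawn from chars
    j = i
    while j < len(s) and s[j] in chars:
        j += 1
    return j if j > i else None


def brk(chars: str, s: str, i: int) -> Optional[int]:
    # BREAK: up to (not including) the next break character; fails if none
    for j in range(i, len(s)):
        if s[j] in chars:
            return j
    return None


def any_(chars: str, s: str, i: int) -> Optional[int]:
    # ANY: exactly one character that is in chars
    return i + 1 if i < len(s) and s[i] in chars else None


def notany(chars: str, s: str, i: int) -> Optional[int]:
    # NOTANY: exactly one character that is not in chars
    return i + 1 if i < len(s) and s[i] not in chars else None


def len_(n: int, s: str, i: int) -> Optional[int]:
    # LEN: exactly n characters
    return i + n if i + n <= len(s) else None


def tab(n: int, s: str, i: int) -> Optional[int]:
    # TAB: up to cursor position n; fails if the cursor is already past n
    return n if i <= n <= len(s) else None


def rtab(n: int, s: str, i: int) -> Optional[int]:
    # RTAB: up to the cursor position n characters from the end
    return tab(len(s) - n, s, i)


def rem(s: str, i: int) -> int:
    # REM = RTAB(0): the rest of the subject (possibly null); always succeeds
    return len(s)


def pos(n: int, s: str, i: int) -> Optional[int]:
    # POS: tests the cursor without moving it
    return i if i == n else None


def rpos(n: int, s: str, i: int) -> Optional[int]:
    # RPOS: the same test, counted from the end of the subject
    return i if len(s) - i == n else None


# Item 5: "SNOBOL4" Len(2) Tab(6) and "SNOBOL4" Len(2) Rtab(2)
subject = "SNOBOL4"
c = len_(2, subject, 0)
print(subject[c:tab(6, subject, c)])   # OBOL
print(subject[c:rtab(2, subject, c)])  # OBO
```

Returning None rather than raising keeps failure cheap to test, which mirrors how a failed primitive simply hands control back to the scanner.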
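Items 3, 7, and 8 depend on backtracking. One way to model it, assuming each pattern is a Python generator that yields cursor positions in the order SNOBOL4 tries them (resuming the generator stands in for "backing into" the pattern), is sketched below. Every name here (lit, seq, alt, arbno, assign_now, anchored, has_a_and_b_not_c) is hypothetical, and assign_now only approximates the $ operator. The sketch replays the ARBNO example and the FAIL program's test.

```python
# Hypothetical generator-based model of SNOBOL4 backtracking (FAIL, ARB, ARBNO).
# A pattern maps (subject, cursor) to an iterator of reachable cursor positions.
from typing import Callable, Dict, Iterator, List

Pattern = Callable[[str, int], Iterator[int]]


def lit(*alts: str) -> Pattern:
    # 'x' | 'y' | ...: literal alternatives, tried left to right
    def match(s: str, i: int) -> Iterator[int]:
        for a in alts:
            if s.startswith(a, i):
                yield i + len(a)
    return match


def seq(p: Pattern, q: Pattern) -> Pattern:
    # Concatenation P Q: each way P matches, followed by each way Q matches
    def match(s: str, i: int) -> Iterator[int]:
        for j in p(s, i):
            yield from q(s, j)
    return match


def alt(*ps: Pattern) -> Pattern:
    # Alternation P | Q | ...
    def match(s: str, i: int) -> Iterator[int]:
        for p in ps:
            yield from p(s, i)
    return match


def brk(chars: str) -> Pattern:
    # BREAK(chars): at most one match, so nothing to retry when backed into
    def match(s: str, i: int) -> Iterator[int]:
        for j in range(i, len(s)):
            if s[j] in chars:
                yield j
                return
    return match


def rpos(n: int) -> Pattern:
    def match(s: str, i: int) -> Iterator[int]:
        if len(s) - i == n:
            yield i
    return match


def fail(s: str, i: int) -> Iterator[int]:
    # FAIL never matches, so the scanner must back up into what precedes it
    return iter(())


def arb(s: str, i: int) -> Iterator[int]:
    # ARB = Null | Len(1) *ARB: null first, then one more character per retry
    return iter(range(i, len(s) + 1))


def arbno(p: Pattern) -> Pattern:
    # ARBNO(P) = Null | P *ARBNO(P)
    def match(s: str, i: int) -> Iterator[int]:
        yield i
        for j in p(s, i):
            if j > i:  # guard added in this sketch: ignore null matches of P
                yield from match(s, j)
    return match


def assign_now(p: Pattern, sink: Callable[[str], None]) -> Pattern:
    # P $ V: pass every substring P matches to sink at once,
    # even if the overall match later fails
    def match(s: str, i: int) -> Iterator[int]:
        for j in p(s, i):
            sink(s[i:j])
            yield j
    return match


def anchored(p: Pattern, s: str) -> bool:
    # &Anchor = 1: try the pattern at cursor 0 only
    return next(p(s, 0), None) is not None


# Item 8: Test = ArbNo(Patt) $ Output RPOS(0), applied to '123412341223'
printed: List[str] = []
patt = lit("123", "1234", "23", "341", "412")
anchored(seq(assign_now(arbno(patt), printed.append), rpos(0)), "123412341223")
print(printed)


# Item 3: the FAIL program's test for "at least one A and one B but no C"
def has_a_and_b_not_c(line: str) -> bool:
    env: Dict[str, str] = {"A": "", "B": "", "C": ""}
    parts = [assign_now(brk(v), lambda val, v=v: env.__setitem__(v, val))
             for v in "ABC"]
    anchored(seq(alt(*parts), fail), " " + line)  # always fails; side effects matter
    return env["A"] != "" and env["B"] != "" and env["C"] == ""
```

Run as written, printed holds the same eight strings listed in item 8, starting with the null string. The j > i guard in arbno is an assumption of this sketch that keeps it from recursing forever when the argument can match the null string.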
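For items 9 and 10, the effect of an unanchored scan (&Anchor = 0) can be sketched in Python by trying each starting cursor position in turn. The helpers below (bal, bal_fail_output, cursor_demo) are hypothetical illustrations under that assumption, not SNOBOL4 code; bal considers only parentheses and yields the shortest balanced match first, as item 9a describes.

```python
# Hypothetical Python model of BAL and the @ operator under unanchored scanning.
from typing import Dict, Iterator, List


def bal(s: str, i: int) -> Iterator[int]:
    # Yield the end of each nonnull parenthesis-balanced string starting at i,
    # shortest first; stop at an unmatched ')' or at the end of the subject.
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "(":
            depth += 1
        elif s[j] == ")":
            if depth == 0:
                return
            depth -= 1
        if depth == 0:
            yield j + 1


def bal_fail_output(s: str) -> List[str]:
    # "s" Bal $ Output Fail with &Anchor = 0: Fail backs into Bal until it is
    # exhausted, then the scanner moves the starting cursor one position right.
    printed: List[str] = []
    for start in range(len(s) + 1):
        for end in bal(s, start):
            printed.append(s[start:end])
    return printed


def cursor_demo(s: str, lit: str) -> Dict[str, int]:
    # STR @Head lit @Tail with &Anchor = 0: Head is assigned at every attempted
    # start position (immediate assignment); Tail only when lit matches.
    env: Dict[str, int] = {}
    for start in range(len(s) + 1):
        env["Head"] = start
        if s.startswith(lit, start):
            env["Tail"] = start + len(lit)
            break
    return env


for line in bal_fail_output("(((A+B*C)*D))"):
    print(line)
print(cursor_demo("TEST$AT$OPERATOR", "AT"))
```

The printed lines reproduce the 23-line listing in item 9b, and cursor_demo reports Head = 5 and Tail = 7 as in item 10.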