RIGAL.DOC 23 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505
  1. Programming language RIGAL as a compiler writing tool
  2. M. I. AUGUSTON
  3. Computing Center of the Latvia State University
  4. 226250, Riga, Rainis boulevard 29,
  5. Latvia, USSR
  6. Abstract. A new programming language for compiler writing is
  7. described briefly in this paper. The main data structures are
  8. atoms, lists and trees. The control structures are based on a
  9. advanced pattern matching. All phases of compilation,
  10. including parsing, optimization and code generation, can be
  11. programmed in this language in short and readable form.
  12. 1. Introduction
  13. Programming language RIGAL is intended to be a tool for
  14. syntactic analysis, program optimization, code generation and for
  15. preprocessor and convertor writing. The main principles of RIGAL
  16. are the following:
  17. a) formal grammars are used in the pattern matching mode;
  18. b) operations, such as tables formation, code generation,
  19. message output are executed simultaneously with parsing, like in
  20. YACC [1], CDL-2 [2]. Therefore, the principle of dependence of
  21. control structures in the program upon data structures used for
  22. program's work is well supported;
  23. c) attribute grammars can be modeled easily in RIGAL. The
  24. language has reference facility, which is a good solution of
  25. global attribute problem;
  26. d) the language has rich spectrum of tree manipulation
  27. facilities, like in Vienna Definition Language [3], including tree
  28. grammars. Trees can be conveniently used as data structure for
  29. different table implementation;
  30. e) the language supports multi pass compiler design, as well.
  31. Trees can be used as intermediate data;
  32. f) the language provides the split of program into small
  33. modules (rules) and presents various means to arrange interaction
  34. of these modules;
  35. g) RIGAL is a closed language, i.e. all the necessary
  36. computations and input-output could be executed by internal means
  37. and there is no need to use external semantic subroutines which
  38. are written in Pascal or C. Therefore, the portability of RIGAL
  39. programs to other computers is increased.
  40. Detailed presentation of language, as well as more examples
  41. are given in [4].
  42. 2. Data Structures. Operations
  43. The only data structures in RIGAL are atoms, lists and trees.
  44. The atom is a string of characters or a number. Identifiers
  45. and numbers can be written just in the RIGAL program text, other
  46. atoms must be quoted. Special atom NULL represents an empty list,
  47. an empty tree and Boolean value "false".
  48. Variable name begins with symbol $. Values can be assigned to
  49. variables. For example, $E := ABC assigns atom ABC to variable $E.
  50. The list is an ordered sequence of objects - atoms,
  51. another lists and trees.
  52. The list constructor yields a list of objects. For example,
  53. $E := (. A B C .) assigns a list of atoms A, B and C to the
  54. variable $E . It is possible to get elements from a list by
  55. indexing, e.g. $E[ 2] is atom B, but $E[ -1] is atom C. Operation
  56. L !. e appends element e to the end of list L. Two lists L1 and L2
  57. can be concatenated by operation L1 !! L2 .
  58. For tree creation the tree constructor is used, e.g.
  59. <. A : B, C : D .>
  60. Objects placed just before ':' are called selectors. A
  61. selector may be any atom, except NULL. Selectors of the same level
  62. must be different. The pair 'selector : object' in the tree is a
  63. branch. Branches in the tree are unordered.
  64. Statement $E := <. A : X, B :(.3 7.), C :<. A : 2 .> .>
  65. assigns a tree with three branches to variable $E. It is
  66. possible to extract components from the tree by means of selection
  67. operation . For example, $E.A is atom X, $E.C.A is atom 2, and
  68. $E.B[2] is atom 7. But $E.D yields NULL, because there is no
  69. branch with selector D in this tree.
  70. Operation T1 ++ T2 appends branches of tree T2 to tree T1. If
  71. T1 has a branch with the same selector, this branch is replaced by
  72. the branch from the T2.
  73. The equality of objects can be checked using operations =
  74. and <>. The result is atom T ('true') or NULL ('false').
  75. Common arithmetic and relational operations +, -, *, div,
  76. mod, <, >, <= and >= are defined for numerical atoms .
  77. The arguments of logical operations and, or and not can be
  78. arbitrary objects. If the object differs from NULL, it represents
  79. the 'true' value.
  80. It is possible to write $X !.:= e instead of statement
  81. $X := $X !. e .The same short form is allowed for operations !!,
  82. ++ and + .
  83. 3. Rules. Simple Patterns
  84. Rules are used basically to check if an object or sequence of
  85. objects complies with some grammar.
  86. The grammar is described by the patterns. A rule call should
  87. be accomplished by some object or object sequence called rule
  88. arguments . They are matched with the patterns in the rule. Some
  89. statements ( assignments, input-output operations ) can be
  90. executed simultaneously with pattern matching .
  91. Rule can compute and return some value to the calling point.
  92. Thus, the concept of rule is analogous to concept of procedure and
  93. function in conventional languages.
  94. The rule execution ends with success or failure depending on
  95. the result of argument and pattern matching. That is why a rule
  96. can be used as a pattern in another rule.
  97. Example of a rule definition: #L1 A B ##
  98. Here the pattern contains atoms A and B. Rule #L1 can be
  99. called, for example, in such a way: #L1(A B). Atoms A and B are
  100. given as arguments. In this case argument matching with pattern is
  101. successful.
  102. Call #L1(A C) fails, because the second argument fails to
  103. match the second pattern, but #L1(A B C) ends with success, as the
  104. third argument is not requested by any pattern element.
  105. Variable can be a pattern element. It matches successfully
  106. just one object from the argument sequence and, as a side effect,
  107. this object is assigned to the variable.
  108. Statements can be placed before the pattern element and after
  109. it. The statement group is closed in symbols '/'. Statements in
  110. the group are separated by semicolons.
  111. The return statement in the rule ends the rule execution and
  112. yields the returned value.
  113. Example. #L2 $A $B / return (. $B $A.)/ ##
  114. Variable $X gets value (.2 1.) after statement $X:= #L2(1 2)
  115. is executed .
  116. If return statement is not described in the rule, then the
  117. NULL value is returned.
  118. The rule pattern element can refer to some rule
  119. (recursively, as well).
  120. Example.
  121. #L3 B C ##
  122. #L4 A #L3 D ##
  123. Call #L4( A B C D ) is successful, but #L4( A E F D ) is
  124. unsuccessful.
  125. Every pattern element, when successfully matched with
  126. corresponding rule argument, returns some value. Value of atom
  127. pattern coincides with this atom, value of variable pattern
  128. coincides with the value obtained by this variable as a result of
  129. matching with the argument. The value of rule is formed by the
  130. return statement. Value, returned by the pattern element can be
  131. assigned to a variable.
  132. Example.
  133. #L5 $R !.:= $A $RN!!:= #L6 / RETURN $R/ ##
  134. #L6 $M !.:= $B $M !.:= $C / RETURN $M/ ##
  135. When statement $X := #L5( 1 2 3 ) is executed, variable $X
  136. gets value (. 1 2 3 .). All variables in the rule are initialized
  137. by NULL. Then the first application of pattern element $R !.:= $A
  138. in #L5 assigns to variable $R the value of expression NULL !. $A
  139. that equals to (. $A .) .
  140. There are some built-in rules, such as #NUMBER or #IDENT.
  141. Patterns of the type $Id := #IDENT and $N := #NUMBER are used so
  142. often, that a default is introduced. If the name of the pattern
  143. variable begins with letter "I", then it matches atoms -
  144. identifiers, and, if it begins with letter "N", then it matches
  145. numerical atoms.
  146. One rule can contain several groups of patterns. They are
  147. called branches of the rule. Arguments of the rule are matched
  148. with branches in succession, until one branch succeeds.
  149. Branches in the rule are delimited by symbol ';;'. In case of
  150. branch failure, some statements can be executed. Such statements
  151. can be written at the end of the branch after keyword ONFAIL. In
  152. case of branch failure, control is transferred to them and
  153. statements,given in ONFAIL-unit , are executed. So, in case of
  154. branch failure, causes of failure could be analyzed and message
  155. could be output.
  156. 4. Patterns
  157. List pattern (. P1 P2 ... Pn .) matches only those objects,
  158. which are lists and elements of which match patterns P1, P2, ...
  159. and Pn respectively.
  160. There are patterns for alternative
  161. ( P1 ! P2 ! ... ! Pn )
  162. and for option
  163. [ P1 P2 ... Pn ]
  164. with usual semantics.
  165. Iterative sequence pattern (* P1 P2 ... Pn *) denotes the
  166. repetition of pattern sequence zero or more times. It is possible
  167. to define rules with a variable number of arguments.
  168. Example.
  169. #Sum (* $S +:= $Num *) / RETURN $S/ ##
  170. Call #Sum(2 5 11) returns 18.
  171. Example. Rule that determines length of the list can be
  172. defined as follows:
  173. #Len (. (* $E / $L +:= 1 / *) .) / RETURN $L/ ##
  174. Call #Len( (. A B C .) ) returns 3.
  175. Iterative sequence pattern (+ ... +) is similar to (* ... *),
  176. but the former denotes the repetition one or more times.
  177. It is possible to describe iterative sequence patterns with
  178. delimiters in such form: (* P1 P2 ... Pn * delimiter ) . An atom
  179. or a rule name can be used as delimiter .
  180. Example. Analysis of a simple Algol-like declaration. A
  181. fragment of variable table coded in a tree form is returned as a
  182. result.
  183. #Declaration
  184. $Type := ( integer ! real )
  185. (+ $Id / $Rez ++:= <. $Id : $Type .> / + ',')
  186. / RETURN $Rez / ##
  187. Call #Declaration ( real X ',' Y ) returns value
  188. <. X : real, Y : real .>
  189. Example. Simple arithmetic expression parsing. When
  190. successful, an expression tree is returned, which can be regarded
  191. as an intermediate form for the next compilation phases.
  192. #Expression
  193. $A1 := #Additive_el
  194. (* $Op := ( '+' ! '-' ) $A2 := #Additive_el
  195. / $A1 := <. op : $Op , arg1 : $A1 , arg2 : $A2 .> / *)
  196. / RETURN $A1 / ##
  197. #Additive_el
  198. $A1 := #Term
  199. (* $Op := ( '*' ! 'div' ) $A2 := #Term
  200. / $A1 := <. op : $Op, arg1 : $A1, arg2 : $A2 .> / *)
  201. / RETURN $A1 / ##
  202. #Term
  203. $A := ( $Id ! $Num ) / RETURN $A / ;;
  204. '(' $A := #Expression ')' / RETURN $A / ##
  205. Call #Expression( X '*' Y '+' 7 ) returns value
  206. <. op: '+', arg1: <. op: '*', arg1: X,
  207. arg2: Y .>,
  208. arg2: 7 .>
  209. 5. Tree Patterns
  210. Tree pattern can be written in the form
  211. <. a1 : p1, a2 : p2, ... , an : pn .>
  212. where ai are atoms and pi are patterns.
  213. Tree pattern branches are applied to corresponding branches
  214. of the argument tree in the same order as they are written in the
  215. pattern.
  216. Therefore,the order of tree traversing may be controlled. We
  217. can traverse some branches of the tree repeatedly, if the
  218. corresponding selector in the tree pattern is repeated, or we can
  219. omit some branches of the tree, if the corresponding selectors are
  220. omitted in the pattern.
  221. Optional branches in tree pattern are enclosed in brackets
  222. '[' and ']'.
  223. Example. Let us suppose expression tree to be formed like in
  224. the above mentioned example. The task is to traverse the tree and
  225. return a list that represents the Polish postfix form of this
  226. expression.
  227. #Postfix_form
  228. <. arg1: $Rez := #Postfix_form,
  229. arg2: $Rez !!:= #Postfix_form,
  230. op: $Rez !.:= $Op .> / RETURN $Rez / ;;
  231. $Rez := ( $Id ! $Num ) / RETURN (. $Rez .) /
  232. ##
  233. Call #Postfix_form( <. op: '-', arg1: X, arg2: <. op: '*',
  234. arg1: Y, arg2: 5 .> .>) returns value (. X Y 5 '*' '-' .)
  235. Iterative tree pattern has the form <* $V : P *> , where $V
  236. is a variable and P - pattern. This pattern defines a loop over
  237. the tree. All selectors of the argument tree are assigned to
  238. variable $V one by one. Pattern P is applied to each object, which
  239. corresponds in the argument tree to the current selector in
  240. variable $V.
  241. Example. Traversing the variable table.
  242. #Var_table
  243. <* $Id : $E := ( integer ! real )
  244. / $R !.:= (. $Id $E .) / *> / RETURN $R / ##
  245. Call #Var_table( <. X : integer, Y : real, Z : real .>)
  246. returns value (. (. X integer .) (. Y real .) (. Z real .) .)
  247. 6. Pattern of Logical Condition Testing
  248. Pattern S'( expression) is performed as follows. First of all
  249. the expression is evaluated, and iff its value is different from
  250. NULL then matching with the argument is successful. Special
  251. variable $$ in the expression denotes current argument, to which
  252. this pattern is applied. Using this kind of pattern many
  253. context-sensitive situations can be described. For example, by
  254. such pattern we can recognize a special case of assignment
  255. statement "X := X + E"
  256. $Id ':=' S'( $$ = $Id ) '+' #Expression
  257. We can skip tokens, until semicolon using pattern
  258. (* S'( $$ <> ';' ) *)
  259. 7. Statements
  260. There is an analog of McCarthy's conditional statement in
  261. RIGAL. It has the following form :
  262. IF cond -> stmts
  263. ELSIF cond -> stmts
  264. ... ... ...
  265. FI
  266. If the value of conditional expression is not equal to NULL ,
  267. then corresponding statements are executed.
  268. The FAIL statement ends the execution of the rule branch with
  269. failure.
  270. Loop statement
  271. FORALL $V IN Expr DO stmts OD
  272. iterates its body over a list or a tree described by expression
  273. Expr. All elements of the list or all selectors of the tree are
  274. assigned to variable $V in succession .
  275. Loop statement of the type
  276. LOOP stmts END
  277. repeats statements of the loop body, until one of the statements -
  278. BREAK, RETURN or FAIL is not executed.
  279. Objects designed by RIGAL program can be saved on disc and
  280. loaded back by statements SAVE and LOAD .
  281. Text files can be opened by OPEN statement for text output of
  282. messages, generated code, etc.
  283. Sequence of atoms can be output in text file FFF by the
  284. statement of the form
  285. FFF << expr1 expr2 ... ... ... exprN
  286. A blank is output after each atom. Symbol @ cancels the
  287. additional blank symbol output.
  288. Statement '<<' always begins output on a new line, but '<]'
  289. statement continues output on current line.
  290. 8. Program Structure
  291. RIGAL program consists of main program and rules.
  292. Operations like initial object loading from external files,
  293. rule calls and created object saving on external files are
  294. concentrated in the main program .
  295. When RIGAL is used for parsing, first the input text has to
  296. be processed by scanner - a program, which converts the text into
  297. a list of tokens. This list is an object of RIGAL, and can be
  298. loaded and processed by RIGAL program.
  299. Intermediate results, e.g., abstract syntactic trees can be
  300. saved, and another RIGAL program can load and update them.
  301. 9. Attribute Grammars and RIGAL. Global References
  302. There is a close analogy between attribute grammar and RIGAL
  303. program. Rules in RIGAL correspond to grammar nonterminals, and
  304. variables - to attributes.
  305. In attribute grammars the greatest part of attributes are
  306. used as transit attributes. To avoid this, global attributes are
  307. introduced in the attribute grammar implementations.
  308. There is a good conformity between data structures and
  309. control structures in RIGAL program. Hence, it is possible to use
  310. the control structures of RIGAL program, in order to refer to the
  311. data elements.
  312. Rule variables can be accessed from some other rule using
  313. reference of the type :
  314. LAST #L $X
  315. This reference denotes the value of variable $X in the last
  316. (in time) and still active instance of rule #L. Such references
  317. can be used in left and right sides of assignment statement.
  318. Therefore, implementation of both synthesized and inherited
  319. attributes is possible as it is demonstrated in the following
  320. scheme.
  321. #LA ... ... ...
  322. assigns value to attribute $A
  323. calls #LB
  324. ... ... ...
  325. ##
  326. #LB ... ... ...
  327. $B1 := LAST #LA $A
  328. -- uses inherited attribute $A from #LA
  329. calls #LC
  330. -- after this call the value of attribute $C from #LC is
  331. -- assigned to the synthesized attribute $B2
  332. ... ... ...
  333. ##
  334. #LC ... ... ...
  335. assigns value to attribute $C
  336. LAST #LB $B2 := $C
  337. -- the value is assigned to the synthesized attribute
  338. -- $B2 of #LB
  339. ... ... ...
  340. ##
  341. 10. Simple Telegram Problem.
  342. We shall consider simple telegram problem [5] as compiler
  343. model. A program is required to process a stream of telegrams. The
  344. stream is available as a sequence of words and spaces. The stream
  345. is terminated by an empty telegram.
  346. Each telegram is delimited by symbol "***". Telegrams are to
  347. be processed to determine the number of words with more than
  348. twelve characters for each telegram. Telegrams together with
  349. statistics have to be stored on an output file eliminating all but
  350. one space between the words.
  351. For demonstration purpose the program is divided into two
  352. parts - parsing phase and output code generation phase.
  353. The input stream is represented as a list of tokens, where a
  354. token is an one-character atom. The first phase builds an
  355. intermediate result - abstract syntactic tree.
  356. The structure of input, intermediate and output data can be
  357. described by RIGAL rules.
  358. The input stream.
  359. #telegram_stream
  360. (+ #telegram #end +) [ #blanks ] #end ##
  361. #telegram
  362. (+ #word #blanks +) ##
  363. #word
  364. (+ #letter +) ##
  365. #blanks
  366. (+ ' ' +) ##
  367. #end
  368. '*' '*' '*' ##
  369. #letter
  370. ( A ! B ! C ! ... ! X ! Y ! Z ) ##
  371. The intermediate data.
  372. #S_telegram_stream
  373. (. (+ #S_telegram +) .) ##
  374. #S_telegram
  375. <. text : (. (+ #S_word +) .),
  376. long_word_n: $N .> ##
  377. #S_word
  378. (. (+ #letter +) .) ##
  379. The output stream.
  380. #output_telegram_stream
  381. (+ #telegram1 #end +) #end ##
  382. #telegram1
  383. (+ #word ' ' +) $long_word_num ##
  384. The main program has form:
  385. #telegram_problem
  386. LOAD $T 'Letters.lex';
  387. $S := #telegram_stream_analysis($T);
  388. OPEN Out 'LP:';
  389. #generate_output_stream($S)
  390. ##
  391. The rules are:
  392. #telegram_stream_analysis
  393. (. (+ $R !.:= #A_telegram #end +)
  394. [ #blanks ] #end .) / RETURN $R / ##
  395. #A_telegram
  396. / $long_word_num := 0/
  397. (+ $R !.:= #A_word #blanks +)
  398. / RETURN <. text: $R,
  399. long_word_n: $long_word-num .> / ##
  400. #A_word
  401. (+ $R !.:= #A_letter +)
  402. / IF #Len($R) > 12 ->
  403. LAST #A_telegram $long_word_num + :=1
  404. -- rule #Len was defined in Unit 4.
  405. FI;
  406. RETURN $R / ##
  407. #A_letter
  408. $R := ( A ! B ! C ! ... ! X ! Y ! Z ) / RETURN $R / ##
  409. #generate_output_stream
  410. (. (+ #G_telegram +) .)
  411. / Out << '***'/ ##
  412. #G_telegram
  413. <. text: (. (+ #G_word / Out <] ' ' / +) .),
  414. long_word_n: $N .>
  415. / Out <] $N '***' / ##
  416. #G_word
  417. (. (+ $L / Out <] @ $L / +) .) ##
  418. These rules are obtained from rules, which describe data
  419. structures, by adding operations to the corresponding patterns.
  420. The whole program is written by the recursive descent method.
  421. 11. Implementation
  422. RIGAL is implemented on PDP-11 in RSX-11,and then ported to
  423. VAX-11 in VMS and to IBM PC AT in MS DOS.
  424. First the interpreter of RIGAL was written in Pascal, and
  425. afterwards the optimizing compiler RIGAL --> Pascal was written in
  426. RIGAL itself.
  427. 12. Conclusions and Future Work
  428. As it was demonstrated above, RIGAL supports syntax-oriented
  429. style of compiler design. Programs written in RIGAL are
  430. well-structured and it is easy to read and debug them.
  431. As it was proved by our experience [6], the optimizing RIGAL
  432. compiler in VAX/VMS environment makes it possible to implement
  433. production quality compilers for high level languages.
  434. RIGAL can be considered as yet another language prototyping
  435. tool in the sense of [7], because it allows the designer to
  436. develop an experimental translator in a short period of time.
  437. RIGAL support system besides interpreter for debugging
  438. purposes and optimizing compiler includes a cross-referencer,
  439. which helps to avoid misuse of global variables.
  440. In order to improve static and dynamic type checking,
  441. variable type descriptions in the form of formal comments would be
  442. added to the language.
  443. Taking in account that the control structures of RIGAL
  444. program are very close to input data structures, it seems
  445. promising to develop automatic and semiautomatic methods for test
  446. example generation for the given RIGAL program.
  447. References
  448. [1] Johnson S.C. YACC - yet another compiler compiler. -Bell
  449. laboratories, Murray Hill,N.J., 1978, a technical manual.
  450. [2] Koster C.H.A. Using the CDL compiler compiler.- Lecture
  451. Notes in Computer Science, 1977, vol. 21.
  452. [3] Lucas P. Formal definition of programming languages and
  453. systems. - IFIP congress, 1971.
  454. [4] Auguston M.I. Programming language RIGAL. - Riga, LSU,
  455. 1987, (in Russian ).
  456. [5] Jackson M. Principles of Program design. - Academic
  457. Press , 1975.
  458. [6] Barzdin J.M., Kalnins A.A., Auguston M.I. SDL tools for
  459. rapid prototyping and testing. - in SDL'89 : The
  460. Language at Work, ed. O.Faergemand and M.M.Marques,
  461. North-Holland, 1989, pp.127-133.
  462. [7] Herndon R., Berzins V. The Realizable Benefits of a
  463. Language Prototyping Language. - IEEE Transactions on
  464. Software Engineering, vol.14, No 6, June 1988, pp.
  465. 803-809.
  466.