LANGDESC.TXT 66 KB

1234567891011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556575859606162636465666768697071727374757677787980818283848586878889909192939495969798991001011021031041051061071081091101111121131141151161171181191201211221231241251261271281291301311321331341351361371381391401411421431441451461471481491501511521531541551561571581591601611621631641651661671681691701711721731741751761771781791801811821831841851861871881891901911921931941951961971981992002012022032042052062072082092102112122132142152162172182192202212222232242252262272282292302312322332342352362372382392402412422432442452462472482492502512522532542552562572582592602612622632642652662672682692702712722732742752762772782792802812822832842852862872882892902912922932942952962972982993003013023033043053063073083093103113123133143153163173183193203213223233243253263273283293303313323333343353363373383393403413423433443453463473483493503513523533543553563573583593603613623633643653663673683693703713723733743753763773783793803813823833843853863873883893903913923933943953963973983994004014024034044054064074084094104114124134144154164174184194204214224234244254264274284294304314324334344354364374384394404414424434444454464474484494504514524534544554564574584594604614624634644654664674684694704714724734744754764774784794804814824834844854864874884894904914924934944954964974984995005015025035045055065075085095105115125135145155165175185195205215225235245255265275285295305315325335345355365375385395405415425435445455465475485495505515525535545555565575585595605615625635645655665675685695705715725735745755765775785795805815825835845855865875885895905915925935945955965975985996006016026036046056066076086096106116126136146156166176186196206216226236246256266276286296306316326336346356366376386396406416426436446456466476486496506516526536546556566576586596606616626636646656666676686696706716726736746756766776786796806816826836846856866876886896906916926936946956966976986997007017027037047057067077087097107117127137147157167177187197207217227237247257267277287297307317327337347357367377387397407417427437447457467477487497507517527537547557567577587597607617627637647657667677687697707717727737747757767777787797807817827837847857867877887897907917927937947957967977987998008018028038048058068078088098108118128138148158168178188198208218228238248258268278288298308318328338348358368378388398408418428438448458468478488498508518528538548558568578588598608618628638648658668678688698708718728738748758768778788798808818828838848858868878888898908918928938948958968978988999009019029039049059069079089099109119129139149159169179189199209219229239249259269279289299309319329339349359369379389399409419429439449459469479489499509519529539549559569579589599609619629639649659669679689699709719729739749759769779789799809819829839849859869879889899909919929939949959969979989991000100110021003100410051006100710081009101010111012101310141015101610171018101910201021102210231024102510261027102810291030103110321033103410351036103710381039104010411042104310441045104610471048104910501051105210531054105510561057105810591060106110621063106410651066106710681069107010711072107310741075107610771078107910801081108210831084108510861087108810891090109110921093109410951096109710981099110011011102110311041105110611071108110911101111111211131114111511161117111811191120112111221123112411251126112711281129113011311132113311341135113611371138113911401141114211431144114511461147114811491150115111521153115411551156115711581159116011611162116311641165116611671168116911701171117211731174117511761177117811791180118111821183118411851186118711881189119011911192119311941195119611971198119912001201120212031204120512061207120812091210121112121213121412151216121712181219122012211222122312241225122612271228122912301231123212331234123512361237123812391240124112421243124412451246124712481249125012511252125312541255125612571258125912601261126212631264126512661267126812691270127112721273127412751276127712781279128012811282128312841285128612871288128912901291129212931294129512961297129812991300130113021303130413051306130713081309131013111312131313141315131613171318131913201321132213231324132513261327132813291330133113321333133413351336133713381339134013411342134313441345134613471348134913501351135213531354135513561357135813591360136113621363136413651366136713681369137013711372137313741375137613771378137913801381138213831384
  1. University of Latvia
  2. Institute of Mathematcs and Computer Science
  3. RIGAL PROGRAMMING SYSTEM
  4. ========================
  5. FOR IBM/PC/XT/AT IN MS-DOS
  6. LANGUAGE DESCRIPTION
  7. by M. I. AUGUSTON
  8. TABLE OF CONTENTS
  9. ABSTRACT
  10. 1. Introduction
  11. 2. Implementation
  12. 3. Lexical Rules
  13. 4. Data
  14. 4.1 Atoms
  15. 4.2 Variables
  16. 4.3 Lists
  17. 4.4 Trees
  18. 5. Expressions
  19. 5.1 Accumulative Assignment Statements
  20. 5.2 Semantics of Variables
  21. 6. Rules
  22. 6.1. Simple Patterns
  23. 6.2. Assignment in Patterns
  24. 6.3 Rule Branches. Onfail Operations
  25. 6.4 Special Variables
  26. 7. Compound Patterns
  27. 7.1 List Pattern
  28. 7.2 Iterative Pattern of Sequence
  29. 7.3 Patterns for Alternative and Option
  30. 7.4 Tree Pattern
  31. 7.5 Iterative Tree Pattern
  32. 7.6 Names of Lists and Trees in Patterns
  33. 7.7 Patterns of Logical Condition Testing
  34. 8. Statements
  35. 8.1 Assignment Statement
  36. 8.2 Conditional Statement
  37. 8.3 FAIL Statement
  38. 8.4 Loop Statements
  39. 8.5 Rule Call
  40. 9. Input and Output
  41. 9.1 SAVE and LOAD Statements
  42. 9.2 Text Output
  43. 10. Program Structure
  44. 10.1 Local Variables
  45. 10.2 References to Variables of Other Rules
  46. 10.3 Attribute Grammars and RIGAL. Global Attributes
  47. 11. Built-in Rules
  48. 12. Sample Compiler
  49. 12.1 TOYLAN Language
  50. 12.2 Intermediate Form of Program.
  51. Abstract syntax tree
  52. 12.3 Target Language BAL
  53. 12.4 Main Module of Compiler
  54. 12.5 Parsing Phase
  55. 12.6 Code Generation Phase
  56. 13. Conclusions and Future Work
  57. References
  58. ABSTRACT
  59. A new programming language for compiler writing is described.
  60. The main data structures are atoms, lists and trees. The control
  61. structures are based on advanced pattern matching. All phases of
  62. compilation, including parsing, optimization and code generation,
  63. can be programmed in this language in short and readable form.
  64. Sample compiler written in RIGAL is presented.
  65. 1. Introduction
  66. Programming language RIGAL is intended to be a tool for
  67. parsing( context checking, diagnosing and neutralization of
  68. errors included ), for code optimization, code generation, static
  69. analysis of programs, as well as for the programming of
  70. preprocessors and convertors.
  71. Almost all the systems envisaged to solve compiler
  72. construction problems contain means to describe context-free
  73. grammar of the source language. Earlier systems, like the Floyd -
  74. Evans language, [1] present tools to work with stack, which is
  75. used for parsing. Parsing methods for limited grammar classes are
  76. implemented in the languages and systems of later generations
  77. (usually LL(1) or LR(1) ).
  78. Such systems as YACC (Johnson [2]), CDL-2 ( Koster [3]), SHAG
  79. (Agamirzyan [4]) and many others make use of synchronous
  80. implementation of parsing and different computations, e.g.,
  81. formation of tables, context checking, etc. Usually these actions
  82. are performed by call of semantic subroutines, written in some
  83. universal programming language ( e.g., in Pascal or C).
  84. Attribute grammars advanced by Knuth [5] have greatly
  85. influenced development of systems for compiler construction.
  86. Systems, like, SUPER (Serebryakov [6]), ELMA (Vooglaid, Lepp,
  87. Lijb [7]), MUG2 (Wilhelm [9]) are based on the use of attribute
  88. grammars not only for parsing, but for code generation as well.
  89. Pattern matching is a convenient tool for programming of
  90. parsing, optimization and code generation. The REFAL programming
  91. language [10], acknowledged for translator writing, may serve as
  92. a good example.
  93. Vienna method for defining semantics of programming languages
  94. [11] suggests the usage of labelled trees in order to present the
  95. abstract syntax of programs. Representation of compilation
  96. intermediate results in the tree form has become usual (see
  97. [12]).
  98. Dependence of control structures in the program from data
  99. structures used for program's work is one of the basic principles
  100. in programming. The recursive descent method could be considered
  101. to be the application of dependence principle.
  102. The above mentioned ideas and methods were taken into account
  103. when creating RIGAL language.
  104. The language possesses few basic notions. Data structures
  105. contain atoms, lists and trees. Advanced mechanism of pattern
  106. matching lies at the basis of control structures.
  107. The fact that RIGAL is a closed language makes RIGAL
  108. distinctive. That means that almost all the necessary
  109. computations and input-output could be executed by internal means
  110. and there is no need to use external semantic subroutines.
  111. Therefore the portability of RIGAL programs to other computers is
  112. increased.
  113. Means for work with trees, different patterns including,
  114. enable both programming of parsing algorithms and optimization
  115. phases and code generation as well. The language supports design
  116. of multipass translators. Trees are used as intermediate data.
  117. The language allows to split the program into small modules
  118. (rules) and presents various means to arrange interaction of
  119. these modules. Pattern matching is used for parameter passing.
  120. RIGAL supports attribute translation scheme and easy
  121. implementation of synthesized and inherited attributes is
  122. possible. The problem of global attributes is solved by usage of
  123. special references.
  124. Lexical analysis is a separate task and requires special
  125. language facilities for description as it is, for example, in
  126. LEX/YACC [2] system. In the current implementation of RIGAL two
  127. scanners are included that accept lexics of Pascal and RIGAL.
  128. 2. Implementation
  129. RIGAL was designed and implemented in the Computing Center of
  130. Latvia University in years 1987-1988. The first implementation
  131. was for PDP-11 in RSX-11.
  132. At the present stage RIGAL interpreter has been developed and
  133. optimizing compiler RIGAL -> Pascal has been implemented by means
  134. of RIGAL itself. The interpreter and the compiler have been
  135. ported to VAX/VMS and IBM PC XT/AT (MS-DOS) environments.
  136. 3. Lexical Rules
  137. The text of RIGAL program is a sequence of tokens - atoms
  138. (e.g., identifiers and integers ), keywords (e.g., if, return ),
  139. special symbols (e.g., +, ## ), names of variables and rules
  140. (e.g., $A, #L ). Tokens may be surrounded by any number of
  141. blanks. A comment is any string of symbols that begins with two
  142. consecutive symbols '-' (minus ). The end of the comment is the
  143. end of the line. For example,
  144. #Sum -- rule for addition of two numbers
  145. $N1 -- the first number
  146. $N2 -- the second number
  147. / return $N1 + $N2 / -- return of the result
  148. ##
  149. 4. Data
  150. 4.1 Atoms
  151. An atom is a string of symbols. If the atom is an identifier
  152. ( the first symbol is a letter followed by letters or digits or
  153. underscore symbols), in the text of RIGAL program it could be
  154. written directly: AABC total_number x25
  155. Numerical atoms are integers, for instance, 2, 187, 0, -25
  156. In other cases the atom is quoted: '+' ':=' '1st'
  157. Some identifiers are reserved as keywords in RIGAL. If they
  158. are used as RIGAL atoms, they should be quoted. For example,
  159. 'if', 'return'. Besides, any atom, which is an identifier, also
  160. can be quoted - ABC and 'ABC' represent one and the same atom.
  161. It should be noted that 25 and '25' are different atoms, the
  162. latter is just a string of symbols '2' and '5'.
  163. Two special atoms are distinguished in the language.
  164. NULL - this atom is frequently yielded as a result of
  165. different operations, if something was incorrect in the process
  166. of computations. This atom also represents an empty list, an
  167. empty tree and Boolean value "false".
  168. T - usually this atom is yielded by logical operations and
  169. represents value "true".
  170. 4.2 Variables
  171. The name of a variable must begin with the symbol $, followed
  172. by an identifier. Value can be assigned to a variable, for
  173. example, by the help of assignment statement: $E := A
  174. In this case the atom A becomes value of the variable $E.
  175. In RIGAL variables have no types, the same variable may have
  176. an atom, a list or a tree as a value in different time moments.
  177. 4.3 Lists
  178. Ordered sequences, i.e., lists can be composed from atoms and
  179. from other lists and trees, as well. A special function - list
  180. constructor serves for list formation. For instance, (. A B C .)
  181. forms a list of three atoms A, B and C.
  182. Arguments of the list constructor may be expressions. The
  183. sample $E := (. (. 8 14 7 .) (. A B .) .) could be rewritten as
  184. follows:
  185. $A := (. 8 14 7 .); $B := (. A B .); $E := (. $A $B .);
  186. Separate elements of the list can be selected by indexing.
  187. Hence, $B [1] is atom A, $A [2] is atom 14, $E [2] is list
  188. (. A B .), but $E [10] is atom NULL
  189. If the value of the index is a negative number, for instance
  190. -N, then the N-th element, beginning from the end of the list, is
  191. selected. For example, $A [-1] is atom 7.
  192. The necessity to add one more element to the list is quite
  193. common. Operation !. is envisaged for this purpose.
  194. Example. (. A B .) !. C yields the list (. A B C .)
  195. To link two lists in a new list the operation !! is applied (
  196. list concatenation). For instance, (. A B .) !! (. C D .) yields
  197. (. A B C D .).
  198. 4.4 Trees
  199. Tree constructor is used to create a tree. For example,
  200. <. A : B, C : D .>
  201. One can imagine tree as a graph, the nodes and arches of
  202. which are marked by some objects.
  203. Objects before ':' in the tree constructor are named
  204. selectors. In the given implementation solely atoms, which are
  205. identifiers ( except NULL ), may serve as selectors. In the
  206. graphical representation selectors correspond to arches of the
  207. graph. All selectors of one and the same level in the tree must
  208. be different.
  209. Any object - atom, list or tree, except atom NULL, may
  210. correspond to terminal nodes of the graph ( "leaves" of the
  211. tree). Hence, multilayer trees can be built. For instance,
  212. <. A : B, C : <. D : E, F : G .> .>
  213. Pair "selector : object" in the tree is named branch of the
  214. tree. Branches are unordered in the tree.
  215. Likewise for the list constructor, the tree constructor may
  216. be described by expressions ( in both selector and object
  217. places), for instance, $X := D; $B := (. 2 8 .); $C := <. A : <.
  218. M : K .>, $X : '+' , E : $B .>;
  219. Select operation serves to extract the tree component. It is
  220. in the following form: tree . sel , where tree is the expression,
  221. whose value must be some tree, but sel is the expression, whose
  222. value must be an atom-identifier.
  223. Consequently, $C . A is the tree <. M : K .> , $C . D is the
  224. atom '+' , $C . E is the list (. 2 8 .) , $C . E[2] is the atom 8
  225. , $C . A . M is the atom K
  226. If there is no branch with a given selector in the tree, then
  227. the result is NULL: $C . W is atom NULL.
  228. Operation of the tree "addition" is performed as well: T1 ++
  229. T2 , where T1 and T2 are trees. Tree T2 branches are added to the
  230. tree T1 one by one. If in the tree T1 there already exists a
  231. branch with the same selector, the branch is substituted by a new
  232. one. Therefore, the operation "++" is not commutative.
  233. It should be pointed out that the tree constructor is
  234. computed from left to right, i.e., <. s1 : a1, s2 : a2, s3 : a3
  235. .> gives the same result as the expression
  236. (( NULL ++ <. s1 : a1 .> ) ++ <. s2 : a2 .>) ++ <. s3 : a3 .>
  237. 5. Expressions
  238. Operations = and <> serve for the comparison of objects. The
  239. result of the comparison is either T ("true") or NULL ("false").
  240. Atoms are matched directly, for instance, a = b gives NULL,
  241. 25 = 25 gives T, 17 <> 25 gives T.
  242. Lists are considered equal iff they contain equal number of
  243. components and if these components are equal respectively.
  244. Trees are considered equal iff they contain equal number of
  245. branches and if one of the trees contains the branch "S : OB",
  246. then the other tree also contains the branch "S : OB1" and OB =
  247. OB1.
  248. Arithmetical operations +, -, *, DIV, MOD are assigned for
  249. numerical atoms. The essence of these operations is similar to
  250. those in Pascal. The result of an arithmetical operation is a
  251. numerical atom. Atom NULL is also admitted as the argument of
  252. arithmetical operation, in this case integer 0 is supposed to be
  253. its value. Under matching these atoms are considered different,
  254. i.e., NULL = 0 gives NULL.
  255. Besides the operations = and <> numerical values could be
  256. compared by the help of >, <, >= and <=.
  257. Logical operations AND, OR and NOT usually are applied under
  258. conditions in conditional statements. Their arguments may be
  259. arbitrary objects. If the object differs from NULL, it is
  260. supposed to have the value "true" in a logical operation. Atom
  261. NULL represents the "false" value. The result of a logical
  262. operation always is either T or NULL.
  263. In order to make complex hierarchic objects ( trees and lists
  264. of complex structure ) more visual and to improve the work of
  265. pattern matching, trees and lists may be labelled. A name is an
  266. atom opposed to the root node of the tree or the whole list.
  267. Labelling operation is written the following way:
  268. A :: OB where A is an atom , OB is
  269. an object (a tree or a list). For example, Add_op :: <. arg1 : 5,
  270. arg2 : 4 .>
  271. The execution order of operations in the expression is
  272. controlled by parentheses "(" and ")". Priority is assigned to
  273. every operation, and, if the execution order is not defined by
  274. parentheses, operations are executed according to their
  275. priorities - beginning with the higher and proceeding to the
  276. lower.
  277. Operations are listed in decreasing order of priorities (some
  278. operations will be discussed later).
  279. 1) Rule call, list constructor, tree constructor, last
  280. 2) Selector ".", index "[]", ::
  281. 3) NOT, unary -
  282. 4) *, DIV, MOD
  283. 5) !. , !! , ++ , +, binary -
  284. 6) = , <> , > , < , >= , <=
  285. 7) AND
  286. 8) OR
  287. Binary operations of the same priority are executed from left
  288. to right, while unary operations from right to left.
  289. 5.1 Accumulative Assignment Statements
  290. It is quite often that working with a list or a tree elements
  291. are added step by step, thus the growing object is retained in
  292. the same variable. Therefore short form of assignment statement
  293. has been introduced. For the operations !. , !! , ++ and +
  294. statements of the form $X := $X op Expr can be written as
  295. $X op := Expr .
  296. 5.2 Semantics of Variables
  297. In the implementation of RIGAL every object - atom, list or
  298. tree has a descriptor. It is a special data structure that
  299. contains some information about this object: the value of atom,
  300. the number of elements and pointers to elements for lists and
  301. trees. Variables have pointers to object descriptors as values.
  302. Statement $X := OB assigns to the variable $X a pointer to
  303. the descriptor of object OB. After the execution of the statement
  304. $Y := $X both variables $X and $Y contain pointers to the same
  305. object OB.
  306. Operations !.:= , !!:= , +:= and ++:= change the descriptor
  307. of their first argument, i.e., have a side effect. Sometimes it
  308. can be undesirable. For example,
  309. $A := (. 3 17 .); $B := $A; $B !.:= 25;
  310. The value of $B becomes list (. 3 17 25 .), but operation
  311. !.:= has added element 25 immediately to the list (. 3 17 .), and
  312. the descriptor of this list is changed. As the value of the
  313. variable $A was a pointer to this descriptor, then after the
  314. execution of the statement $B !.:= 25 the value of the variable
  315. $A is changed, too.
  316. To prevent this, we must have a copy of the object, to which
  317. $A refers to before assigning it to $B. Built-in rule #COPY( OB )
  318. is used for this purpose. It makes a copy of the descriptor of
  319. the object OB.
  320. Now we can write a "safe" call of the operation !.:= in such
  321. a way: $A := (. 3 17 .); $B := #COPY( $A); $B !.:= 25;
  322. As a result $B takes the value (. 3 17 25 .), but $A remains
  323. equal to (. 3 17 .). The same effect can be obtained after
  324. execution of statements $A := (. 3 17 .); $B := $A !. 25;
  325. 6. Rules
  326. The concept of rule is analogous to concepts of procedure and
  327. function in conventional languages, such as Pascal or C.
  328. First of all, by the help of a rule we can check, whether an
  329. object or a sequence of objects complies with some grammar. For
  330. this purpose rule has a pattern. Objects, assigned under rule
  331. call (arguments of the rule), are matched with the pattern. If
  332. there is a necessity, some operations could be executed, for
  333. instance, computations and input-output operations,
  334. simultaneously with rule arguments and pattern matching.
  335. Rule that is called can compute and return back some value
  336. (object),i.e., it can be used as function.
  337. Depending on the result of rule arguments and pattern
  338. matching, the rule call ends with success or failure. Thus the
  339. rule call could be used in another rule patterns.
  340. 6.1. Simple Patterns
  341. Definition of the rule begins with the indication of the name
  342. of the rule in the form of #LLL , where LLL is an
  343. atom-identifier. In the most common case the pattern is indicated
  344. after the name of the rule. Rule definition ends with symbol
  345. '##'. For instance,
  346. #L1 A B ##
  347. In this case the pattern consists of atoms A and B. Call of
  348. the rule #L1 takes place, for instance, the following way :
  349. #L1 ( A B )
  350. The sequence of objects - atoms A and B, is assigned as rule
  351. arguments.
  352. After rule call the argument and pattern matching begins. The
  353. first argument - atom A is matched with the first pattern, which
  354. is also an atom. These atoms are equal, so their matching is
  355. successful. After that the next object from the sequence of
  356. arguments - atom B and the next pattern, also atom B, are
  357. matched. Their matching is also successful, therefore, the call
  358. of the rule #L1 is successful, too.
  359. #L1( A C ) call fails, because the second argument - atom C
  360. was not successfully matched with the pattern - atom B.
  361. #L1( A ) call fails, as there is no object for the second
  362. pattern with which it could be matched successfully.
  363. But #L1( A B C) call is successful, as the third rule
  364. argument - atom C was not demanded by any pattern and matching of
  365. the first two rule arguments was successful.
  366. Arbitrary atoms, both non-numerical and numerical, may be
  367. described as patterns.
  368. Such operations as assignment statements, input - output and
  369. others may be indicated before the pattern, after it and between
  370. patterns. These statements are quoted in the pair of symbols '/'
  371. . If there is a necessity to write several statements within the
  372. pair '/' , these statements are separated by the symbol ';' .
  373. A group of statements is executed in the rule pattern, if
  374. matching of the previous pattern with the corresponding rule
  375. arguments was successful.
  376. The value returned by the rule is worked out by statement
  377. return . It has the following form: return expression.
  378. Simultaneously this statement completes the execution of the
  379. rule.
  380. Example.
  381. #L2 'begin' 'end' / return 'The pair begin-end' / ##
  382. Rule call is illustrated in the following example.
  383. $A := #L2 ( 'begin' 'end' ); As a result the atom
  384. 'Thetpairtbegin-end' is assigned to the variable $A.
  385. If return statement is not described in the rule, then the
  386. NULL value is returned.
  387. If the rule call ends in failure, then usually value NULL is
  388. returned, although in case of failure, it is possible to work out
  389. the returned value, which is not NULL; for this purpose statement
  390. return must be used in onfail-operations (see sect.6.3.).
  391. Variable could be used as pattern. It is matched with one
  392. object (atom, list or tree) from the sequence of rule arguments.
  393. Matching always ends in success, and as a side effect the
  394. variable obtains this object as value. For example, #L3 $A $B /
  395. return (. $B $A .)/ ##
  396. After the call $X := #L3 ( 1 2) the variable $X obtains as
  397. value (. 2 1 .)
  398. The rule pattern, in its turn, may refer to some rule
  399. (recursively, as well). Then the subsequence of the calling rule
  400. arguments become arguments of rule - pattern. If the call of the
  401. rule - pattern is successful, then matching of further patterns
  402. of the calling rule with the remaining arguments proceeds. If the
  403. call of the rule - pattern fails, then the pattern matching of
  404. the calling rule fails, too. For example,
  405. #L4 A #L5 D ##
  406. #L5 B C ##
  407. Then the call #L4( A B C D) is successful, but the call
  408. #L4( A E F D) is unsuccessful.
  409. There is a number of built-in rules in the language (see
  410. section 11.). For instance, #NUMBER is successfully matched with
  411. a numerical atom and returns it as a value, other arguments fail
  412. and return NULL. #IDENT is successfully matched with atom -
  413. identifier and returns it as value.
  414. 6.2. Assignment in Patterns
  415. Every pattern, when successfully matched with the
  416. corresponding rule argument, returns some value. The value of
  417. atom pattern coincides with this atom, the value of variable
  418. pattern coincides with the value obtained by this variable as a
  419. result of matching with the arguments. The value of rule pattern
  420. is defined by statement return in this rule.
  421. If the matching ends in failure, the pattern usually returns
  422. value NULL.
  423. These values, returned by patterns, can be assigned at once
  424. to some variable. It is enough to write the name of the variable
  425. and the assignment symbol ':=' before the pattern element.
  426. Example.
  427. #L6 $A $R := #L7 / return (. $A .) !! $R / ##
  428. #L7 $B $C / return (. $B $C .) / ##
  429. After execution of the statement $X := #L6 ( 1 2 3) the value
  430. of $X will be (. 1 2 3 .)
  431. Symbols of accumulative assignment '!.:=' , '!!:=' , '++:='
  432. and '+:=' can be used instead of ':=' in patterns .
  433. Therefore, we can rewrite the previous example the following
  434. way:
  435. #L6_1 $R !.:= $A $R !!:= #L7_1 / return $R / ##
  436. #L7_1 $M !.:= $B $M !.:= $C / return $M / ##
  437. It should be noted that all variables in the rule are
  438. initialized by value NULL, so the value of the expression NULL !.
  439. $A that equals to (. $A .) is assigned to the variable $R by the
  440. first application of the pattern $R !.:= $A in #L6_1.
  441. Patterns of the type $N := #NUMBER or $ID := #IDENT are used
  442. very often, therefore, following defaults are introduced in the
  443. language. If the first letter of the variable name is N, then
  444. this variable, having been used as pattern element, will have a
  445. successful matching only with a numerical atom, in other cases
  446. matching ends in failure, and variable obtains value NULL. If the
  447. first letter of the name of the variable is I, this variable is
  448. matched successfully only with an atom-identifier.
  449. 6.3 Rule Branches. Onfail Operations
  450. Several groups of patterns may be united in one rule.
  451. The first group of patterns is applied to the rule arguments
  452. first. If matching of this group of patterns with the arguments
  453. is successful, the rule call is successful. But if matching has
  454. failed, transition to the next group of patterns takes place, and
  455. it is matched with the same arguments. It goes on until some
  456. group of rule patterns is matched with the arguments
  457. successfully. If not a single pattern group matches successfully
  458. with rule arguments, the rule call ends in a failure.
  459. Such alternative pattern groups are called rule branches,
  460. and, when writing the rule, they are separated by the symbol
  461. ';;'.
  462. If the branch fails, the execution of its patterns and
  463. statements is abandoned at the place, where branch pattern
  464. failed, and control is transferred to the next branch (if there
  465. is one) or the whole rule fails (if the current branch is the
  466. last one in the rule). Still there is a possibility to execute
  467. some operations before exit from the branch.
  468. ONFAIL operation is a sequence of statements, written at the
  469. end of the branch and delimited from patterns and branch
  470. statements by keyword ONFAIL.
  471. If ONFAIL-statements are described in the branch, then in
  472. case of branch failure, control is transferred to them and
  473. statements, given in ONFAIL-unit , are executed. So, in case of
  474. branch failure, causes of failure can be analyzed and message can
  475. be output. Statement return can be executed in ONFAIL operations,
  476. as well. Then exit from the rule takes place (with failure), and
  477. some other value than NULL can be returned.
  478. 6.4 Special Variables
  479. Special variable without name $ denotes the first rule
  480. argument matched by a rule pattern.
  481. The value of special variable $$ equals to the value of
  482. current rule argument, to which current rule pattern is applied.
  483. 7. Compound Patterns
  484. Lists, sequences of elements in lists and trees can be
  485. analyzed by patterns. Nesting of patterns practically is
  486. unlimited. It is allowed to insert statements before, after and
  487. within any pattern.
  488. 7.1 List Pattern
  489. List pattern is written in the rule the following way:
  490. (. S1 S2 ... SN .) where S1, S2, ..., SN are
  491. patterns . For instance,
  492. #L8 (. $E1 $E2 .) ##
  493. Pattern of the rule #L8 is matched successfully with any
  494. list, containing precisely two elements. Such call is successful:
  495. #L8( (. (. 1 2 .) <. A : B .> .) )
  496. But the following calls end in failure. #L8( A B ) - because
  497. pattern will be applied to the first argument, i.e., to atom A.
  498. #L8( (. 13 .) ) - because the argument is one element list.
  499. In case of success the list pattern yields value that can be
  500. assigned to some variable. This value coincides with the whole
  501. list, to which the pattern was applied.
  502. 7.2 Iterative Pattern of Sequence
  503. In RIGAL the following pattern is defined for sequence
  504. recognition: (* S1 S2 ... SN*) , where S1 , S2, ... SN are some
  505. patterns. This pattern describes the repetition of enclosed
  506. sequence of patterns zero or several times.
  507. Rules with a variable number of arguments can be defined by
  508. iterative pattern of sequence. For example,
  509. #Sum (* $S +:= $N *) / RETURN $S / ##
  510. This rule is used for summing up any amount of numbers.
  511. #Sum( 2 5 11) = 18 and #Sum( 3 ) = 3
  512. Iterative pattern is very often used within list pattern. For
  513. instance, the following rule counts the number of list elements.
  514. #Length (. / $L := 0 / (* $E / $L +:= 1 / *) .) / RETURN $L / ##
  515. Samples of rule call.
  516. #Length( (. A B C .) ) = 3 and #Length( NULL ) = 0
  517. Iterative pattern (+ S1 S2 ... SN +) is analogous to the
  518. pattern (* ... *), but assigns the repetition of enclosed pattern
  519. sequence one or several times.
  520. In the iterative pattern of sequence the element delimiter is
  521. indicated in the form of (* S1 S2 ... SN * Delimiter ) or (+ S1
  522. S2 ... SN + Delimiter +)
  523. Atom or name of the rule may serve as Delimiter .
  524. Example. Analysis of a simple Algol-like declaration. A
  525. fragment of variable table coded in a tree form is returned as a
  526. result. #Declaration $Type := ( integer ! real )
  527. (+ $Id / $Rez ++:= <. $Id : $Type .> / + ',')
  528. / RETURN $Rez / ##
  529. Call #Declaration ( real X ',' Y ) returns value
  530. <. X : real, Y : real .>
  531. It should be noted that the pattern (* $E * ',' ) differs
  532. from the pattern (* $E ',' *) in the point that the presence of
  533. atom ',' is obligatory in the second pattern at the end of
  534. sequence.
  535. 7.3 Patterns for Alternative and Option
  536. The choice of several possible patterns is written the
  537. following way: ( S1 ! S2 ! ... ! SN )
  538. Patterns S1 , S2 , ... , SN are applied one by one from left
  539. to right, until one of them succeeds.
  540. In case of success alternative pattern yields value. It
  541. coincides with the value of the successful pattern within the
  542. alternative and may be assigned by some variable.
  543. Example. Simple arithmetic expression parsing. When
  544. successful, an expression tree is returned, which can be regarded
  545. as an intermediate form for the next compilation phases.
  546. #Expression $A1 := #Term
  547. (* $Op := ( '+' ! '-' ) $A2 := #Term
  548. / $A1 := <. op : $Op , arg1 : $A1 , arg2 : $A2 .> / *)
  549. / RETURN $A1 / ##
  550. #Term $A := ( $Id ! $Num ) / RETURN $A / ;;
  551. '(' $A := #Expression ')' / RETURN $A / ##
  552. The call #Expression( X '-' Y '+' 7 ) returns the value
  553. <. op: '+', arg1: <. op: '-', arg1: X, arg2: Y .>, arg2: 7 .>
  554. In RIGAL we may write a rule that matches successfully with
  555. an empty sequence of arguments: #empty ##
  556. Now the pattern for option can be written down: ( S ! #empty )
  557. In short form this issue may be written down in RIGAL the
  558. following way: [ S ] where S is some pattern or pattern sequence.
  559. Pattern [ S ] always ends in success.
  560. 7.4 Tree Pattern
  561. Tree pattern checks, whether the object is a tree with fixed
  562. structure. By means of this pattern access to the components of
  563. the tree is obtained. The tree pattern is described the following
  564. way:
  565. <. Sel1 : Pat1, Sel2 : Pat2, ... , SelN : PatN .> where
  566. Sel1, Sel2, ... SelN are atoms-identifiers, but Pat1, Pat2, ...
  567. PatN are patterns.
  568. If the object, to which the tree pattern is applied, is not a
  569. tree, then the application of the pattern fails at once. If there
  570. is the selector Sel1 in the tree, then the pattern Pat1 is
  571. applied to the corresponding object. If there is no selector Sel1
  572. in the tree or the application of Pat1 has failed, the whole
  573. pattern also fails.
  574. If matching the first branch was successful, branch matching
  575. of the pattern 'Sel2 : Pat2' , etc. begins.
  576. Hence, pattern branches are applied to the tree in the same
  577. order as they are written in the pattern. Therefore, the order of
  578. tree traversing may be controlled. It is possible to have
  579. reiterative visit of branches ( if selectors are repeatedly
  580. described in the tree pattern) or omission of branches (if
  581. corresponding selectors are not given in the pattern).
  582. In case of success the tree pattern returns the value, which
  583. coincides with the whole object - a tree, to which the pattern
  584. was applied, irrespective of presence of all tree selectors in
  585. the pattern or absence of some.
  586. Example. Let us suppose expression tree to be formed like in
  587. the above example. The task is to traverse the tree and return a
  588. list that represents the Polish postfix form of this expression.
  589. #Postfix_form <. arg1: $Rez := #Postfix_form,
  590. arg2: $Rez !!:= #Postfix_form,
  591. op: $Rez !.:= $Op .> / RETURN $Rez / ;;
  592. $Rez := ( $Id ! $Num ) / RETURN (. $Rez .) / ##
  593. The call #Postfix_form( <. op: '-', arg1: X, arg2: <. op:
  594. '+', arg1: Y, arg2: 5 .> .>) returns the value (. X Y 5 '+' '-'
  595. .)
  596. Some branches in the tree pattern may be described as
  597. optional, in this case they are enclosed in brackets '[' and ']'
  598. . If there is no selector of optional branch in the argument
  599. tree, its pattern is omitted and transition to next pattern
  600. branch takes place. If there is a selector of the type in the
  601. argument tree, the pattern branch is developed as usual.
  602. 7.5 Iterative Tree Pattern
  603. The simplest form of iterative tree pattern is the following:
  604. <* $Var : P *> where $Var is some
  605. variable, and P is a pattern.
  606. A loop over the tree is performed by the help of this
  607. pattern. All selectors of the argument tree are assigned to the
  608. variable $Var one by one. The pattern P is applied to each
  609. object, which corresponds in the argument tree to the current
  610. selector in the variable $Var. If even one application of the
  611. pattern P fails, the whole iterative tree pattern fails. For
  612. example,
  613. #Variable_table <* $Id : $E := ( integer ! real )
  614. / $R !.:= (. $Id $E .)/ *> / RETURN $R / ##
  615. Call example.
  616. #Variable_table( <. X : integer, Y : real, Z : real .> ) =
  617. (. (. X integer .) (. Y real .) (. Z real .) .)
  618. Sometimes, performing a loop over the tree, some branches
  619. should be updated in a special way. For this purpose iterative
  620. tree pattern with distinguished branches is used.
  621. <* Sel1 : Pat1, Sel2 : Pat2, ..., SelN : PatN, $Var : P *>
  622. where Sel1, Sel2, ... SelN are atoms-identifiers; Pat1, Pat2, ...
  623. PatN are patterns, i.e., like elements in simple tree pattern;
  624. $Var is a variable, but P is a pattern, as in simple case of
  625. iterative tree pattern.
  626. The pattern is applied to the argument tree the following
  627. way. First of all distinguished pattern branches 'Sel : Pat' are
  628. developed. Their matching with branches of the argument tree
  629. happens exactly the same way as with simple tree pattern. Then
  630. the element '$Var : P' is applied to other branches of the
  631. argument tree the same way as in simple iterative tree pattern.
  632. Some distinguished branches can be optional, for this purpose
  633. they are enclosed in brackets '[' and ']'. Semantics is the same
  634. as in the case of simple tree pattern.
  635. Example. Let it be a tree of arbitrary size. In some of its
  636. subtrees there is the selector LABEL, to which numerical atom is
  637. attached. All these numbers over the whole tree must be collected
  638. in a list, and the list must be returned as result. #Label_list
  639. <* [ LABEL : $Result !.:= $N ],
  640. $S : $Result !!:= #Label_list *> / RETURN $Result / ;;
  641. $E ##
  642. The rule has two branches. Traversing of the tree and its
  643. subtrees is described in the first branch. The resulting list is
  644. formed in the variable $Result. The traversing of subtrees is
  645. carried out by the help of recursive call of the rule
  646. #Label_list. The second branch of the rule consisting of just one
  647. pattern $E is applied at the leaves of the tree. The pattern
  648. matches successfully with any object and the whole branch returns
  649. value NULL, which is accepted as empty list at the previous level
  650. of recursion.
  651. Call sample. #Label_list ( <. A: <. LABEL: 5, B: abc .>,
  652. LABEL: 17, D: 25 .> ) = (. 17 5 .)
  653. 7.6 Names of Lists and Trees in Patterns
  654. The assignment of names was discussed in Section 5.
  655. In list and tree patterns we can have matching of list and
  656. tree names with the described values or simply get these names in
  657. the described variable. For this purpose the name may be
  658. indicated before list or tree pattern: atom :: pattern
  659. If the atom described in the pattern coincides with the name
  660. of the list (or tree), to which the pattern is applied, the
  661. application of the pattern to the argument begins. If it does
  662. not, other pattern elements are not applied and the pattern
  663. fails.
  664. To obtain access to the name of the argument, instead of atom
  665. we must indicate the variable, in which the name is assigned as
  666. value.
  667. 7.7 Patterns of Logical Condition Testing
  668. Pattern of the type: S' ( expression ) works the following
  669. way. First of all the expression is evaluated. If its value
  670. differs from NULL, the pattern is successful. The value of the
  671. matched argument is returned as the value of the pattern, if
  672. matching was successful. If the value of the expression equals to
  673. NULL, the pattern fails and returns NULL.
  674. The value of special variable $$ in the expression of
  675. S-pattern equals to the value of the argument, to which S-pattern
  676. is applied.
  677. The skip of token sequence until the nearest symbol ';' is
  678. described by the pattern: (* S' ( $$ <> ';' ) *)
  679. Let under parsing a case is accentuated when the assignment
  680. statement is in the form of X := X + E , where X is a variable,
  681. and E is an expression. This case could be described by pattern
  682. of the type: $Id ':=' S' ( $$ = $Id ) '+' #Expression
  683. Pattern of the type: V' ( expression ) works similar to
  684. S-pattern, yet in case of success no advancing along the sequence
  685. of rule arguments takes place, so the next pattern element is
  686. applied to the same argument.
  687. This pattern is useful for context condition check. Example.
  688. The pattern S' ( #NUMBER($$) and ( $$ > 7)) may be substituted by
  689. a sequence of patterns $Num V'($Num > 7)
  690. 8. Statements
  691. 8.1 Assignment Statement
  692. In the left side of assignment statement a variable may be
  693. indicated, which is followed by an arbitrary number of list
  694. indexes and/or tree selectors. For example, $X := (. A B C .); $Y
  695. := <. D : E, F : G .>;
  696. After assignment $X[2] := T the value of $X is (. A T C .) .
  697. After assignment $Y.D :=17 the value of $Y is <.D :17, F:G .>
  698. The execution of the statement $Y.A := T yields the run time
  699. error message. The necessary result is obtained the following
  700. way:
  701. $Y ++:= <. A : T .>;
  702. Tree branch is deleted by assigning an empty object to the
  703. corresponding selector: $Y.D := NULL;
  704. 8.2 Conditional Statement
  705. Conditional statement has the following form:
  706. IF expression -> statements Then branches may
  707. follow (it is not compulsory)
  708. ELSIF expression -> statements Conditional
  709. statement ends with keyword FI.
  710. In conditional statement branches expressions are computed
  711. one by one, until a value different from NULL is obtained. Then
  712. the statements described in this branch are executed.
  713. 8.3 FAIL Statement
  714. FAIL statement finishes the execution of the rule branch with
  715. failure. Example. In order to repair errors in parsing process,
  716. the sequence of tokens should be skipped quite frequently, for
  717. instance, until semicolon symbol. It is done the following way.
  718. #statement ... ;; -- branches for statement analysis
  719. (* #Not_semicolon *) ';' -- no statement is recognised
  720. ##
  721. #Not_semicolon $E / IF $E = ';' -> FAIL FI/ ##
  722. 8.4 Loop Statements
  723. Statement of the types
  724. FORALL $VAR IN expression
  725. DO statements OD
  726. FORALL SELECTORS $VAR BRANCHES $VAR1 IN expression
  727. DO statements OD
  728. FORALL BRANCHES $VAR1 IN expression
  729. DO statements OD loops over a list or a tree.
  730. The value of the expression must be either a list or a tree.
  731. Value of the current list element (if the loop is over the list)
  732. or value of the current selector (if the loop is over the tree)
  733. is assigned to the loop variable $VAR one by one. The
  734. corresponding branch value is assigned to the variable $VAR1 (if
  735. the loop is over the tree). Statements, describing body of the
  736. loop, may use the current value of the variables $VAR and $VAR1.
  737. LOOP statement of the type
  738. LOOP statements END; repeats statements of
  739. the loop body, until one of the statements - break, RETURN or
  740. FAIL is not executed.
  741. 8.5 Rule Call
  742. If a rule is called just to execute statements described in
  743. it, and value returned by the rule is not necessary, the rule
  744. call is written down as statement. It is analogous to procedure
  745. call in traditional programming languages. Success/failure of the
  746. rule and value returned by it is disregarded in such a call.
  747. 9. Input and Output
  748. 9.1 SAVE and LOAD Statements
  749. Objects created by RIGAL program (atoms, lists, trees) can be
  750. saved in the file and loaded back to the memory.
  751. Statement SAVE $Var file-specification unloads the object,
  752. which is the value of the variable $Var to the file with the
  753. given specification. File, formed by SAVE statement, contains
  754. precisely one object (atom, list or tree).
  755. We can load the object from the file in the memory having
  756. executed statement: LOAD $Var file-specification
  757. 9.2 Text Output
  758. To output texts (messages, generated object codes, etc. )
  759. several text files can be opened in the RIGAL program. The text
  760. file FFF is opened by statement: OPEN FFF file-specification
  761. File-specification may be an expression. It presents the name of
  762. the file on the device.
  763. Statement of the type
  764. FFF << Expr1 Expr2 ... ExprN outputs a sequence
  765. of atoms to the file FFF. Values of expressions Expr1, Expr2, ...
  766. are either atoms or lists consisting of objects, different from
  767. trees.
  768. This statement outputs atoms as sequences of symbols to the
  769. text file, inserting a blank after every atom. Atoms in the list
  770. are output in the same order as they are in the list.
  771. Example. FFF << A B 12 ;
  772. A string of characters is output in the text file FFF the
  773. following way: "A B 12 "
  774. Symbol @ in the sequence of expressions of output statement
  775. switches over to other output mode of blanks separating atoms. By
  776. default at the beginning of execution of text output statement
  777. the output mode with blanks is on.
  778. Example. FFF << A B @ C D 25 @ E F 57 ;
  779. The following string of characters is output to the text file
  780. "A B CD25E F 57"
  781. Statement of the type FFF << ... always begins output with
  782. the beginning of a new record. Output statement of the type FFF
  783. <] ... continues output in the current record.
  784. By the help of the statement of the type
  785. PRINT expression the value of atom, list
  786. or tree can be output in a readable form on the display, disk or
  787. on the printer. It is useful when debugging RIGAL programs.
  788. 10. Program Structure
  789. Program written in the RIGAL language consists of the main
  790. program and rules. The main program text must be at the beginning
  791. of the RIGAL program text, but the text of the rules is written
  792. afterwards. Main program, as well as rule, begins by the name
  793. indication in the form of #main_program_name, then statements are
  794. described, separated by symbols ';' . The end of the main program
  795. is marked by the symbol '##'.
  796. Usually operations that deal with the initial object loading,
  797. rule call and unloading of created objects in the files are
  798. concentrated in the main program. Therefore, main program has no
  799. pattern elements or arguments of its own.
  800. When RIGAL is used for parsing, text file with the input
  801. information first is updated by scanner, which transforms it into
  802. a list of tokens. Thus rules describing required parsing can be
  803. applied to this list. Intermediate results of the RIGAL program,
  804. for instance, abstract syntax trees obtained by parsing can be
  805. unloaded in the file by the help of SAVE statement. They are
  806. unloaded so, that other RIGAL programs (for instance, those,
  807. which implement phase of code generation in compilers) can load
  808. them as input data. Sample of main program is given in Section
  809. 12.4.
  810. Rules can be written down after main program in any order.
  811. 10.1 Local Variables
  812. There are no special variable declarations in RIGAL. The fact
  813. that a variable is used in some statement, pattern or expression
  814. implies that the variable is defined as local variable of the
  815. rule or the main program.
  816. For recursive rule calls local rule variables are pushed in a
  817. stack, so, that every active rule instance has its own set of
  818. local variables.
  819. All variables are initialized by the NULL value, when the
  820. corresponding rule or the main program is called and when the
  821. execution of rule branch starts.
  822. 10.2 References to Variables of Other Rules
  823. The construction of RIGAL makes it possible to obtain access
  824. from the rule to local variables of another rule. It has the
  825. following form: LAST #L $X
  826. This reference denotes the value of the variable $X in the
  827. last (in time ) and still active instance of the rule #L. Such
  828. references can be used in left and right sides of assignment
  829. statement.
  830. If at the moment of evaluation of the expression LAST #L $X
  831. there is no active instance of the rule #L, the value of the
  832. expression LAST #L $X equals NULL. If such LAST #L $X is in the
  833. left side of assignment statement, the statement is disregarded
  834. (and run time error message is output).
  835. By the help of LAST we may refer to both rule and main
  836. program variables.
  837. 10.3 Attribute Grammars and RIGAL. Global Attributes
  838. There is a close analogy between attribute grammar and RIGAL
  839. program. Rules in RIGAL correspond to grammar nonterminals, and
  840. variables - to attributes.
  841. In attribute grammars the greatest part of attributes are
  842. used as transit attributes. To avoid this global attributes are
  843. introduced in the attribute grammar implementations. The usage of
  844. LAST references solves this problem in RIGAL.
  845. Implementation of both synthesized and inherited attributes
  846. is possible as it is demonstrated in the following scheme.
  847. #LA ... ... ...
  848. assigns value to the attribute $A
  849. calls #LB ... ... ... ##
  850. #LB ... ... ...
  851. $B1 := LAST #LA $A -- uses the inherited attribute $A from #LA
  852. calls #LC
  853. -- after this call the value of the attribute $C
  854. -- from #LC is assigned to the synthesized attribute $B2
  855. ... ... ... ##
  856. #LC ... ... ...
  857. assigns value to the attribute $C
  858. LAST #LB $B2 := $C -- the value is assigned to the
  859. -- synthesized attribute $B2 of #LB
  860. ... ... ... ##
  861. 11. Built-in Rules
  862. There is a number of built-in rules in the language. These
  863. rules implement functions, the implementation of which is
  864. impossible or ineffective by other language means.
  865. Call of built-in rules is written down the same way as call
  866. of rules defined by the user itself. Along with the value a
  867. built-in rule yields success or failure hence, built-in rules are
  868. used as patterns.
  869. There are predicates such as #ATOM(E), #NUMBER(E), #IDENT(E),
  870. #LIST(E) and #TREE(E).
  871. The built-in rule #LEN(E) returns numerical atom as value. If
  872. E is an atom, then for non-numerical atoms it returns the number
  873. of atom symbols. For numerical atoms this rule returns the number
  874. of significant digits plus 1, if the atom is a negative number.
  875. #LEN( NULL) equals 0, #LEN('ABC') equals 3, #LEN(-185) equals 4.
  876. If E is a list, then #LEN(E) returns the number of list
  877. elements, but, if E is a tree, then it returns the number of tree
  878. branches.
  879. #EXPLODE(E). If E is an atom, then it succeeds and returns
  880. one character atom list that represents the value E 'decomposed'
  881. in separate characters. If E is a numerical atom, only
  882. significant digits are present.
  883. Examples.#EXPLODE(X25) yields (. 'X' '2' '5' .).
  884. #EXPLODE(-34) yields (. '-' '3' '4' .).
  885. #IMPLODE(E1 E2 ... EN). This rule yields the concatenation of
  886. atoms or lists E1, E2, ..., EN in a new, non-numerical atom.
  887. Examples.#IMPLODE( A B 34) equals 'AB34'.
  888. #IMPLODE(25 (. A -3 .) ) equals '25A-3'.
  889. #CHR(N). The rule returns an atom, which consists of just one
  890. ASCII character with the code N ( 0 <= N <= 127).
  891. #ORD(A). Returns an integer, which is an internal code of the
  892. first character of the nonnumerical atom A.
  893. For instance, #ORD( A) = 65, #ORD( ABC) = 65.
  894. #PARM(T) . Returns list of parameters which was assigned when
  895. the whole program called for execution.
  896. #DEBUG(E). If E equals the atom 'RULES', then, as soon as the
  897. rule is called, information concerning calls of the rules (both
  898. user defined and built-in rules) and their execution results will
  899. be output. The call #DEBUG(NORULES) stops the debugging of the
  900. rules.
  901. 12. Sample Compiler
  902. Compiler for the TOYLAN language, which is a very simple
  903. programming language, is discussed in the following units.
  904. Compiler works in two passes. The first phase is parsing and
  905. construction of the program's intermediate form as abstract
  906. syntax tree. The second phase is code generation.
  907. Description of input and intermediate language grammars by
  908. means of RIGAL is presented. Thus formalized compiler
  909. documentation (admitting checking on a computer) is obtained.
  910. 12.1 TOYLAN Language
  911. The description of TOYLAN syntax can be regarded as
  912. description of acceptable sequences of tokens. Atoms represent
  913. lexical elements of TOYLAN program: keywords, identifiers,
  914. constants, operation signs, delimiters. The context free grammar
  915. of TOYLAN can be described in the form of RIGAL program.
  916. #PROGRAM 'PROGRAM' $Id
  917. (* #DECLARATION ';' *) (+ #STATEMENT + ';' ) ##
  918. #DECLARATION ('INTEGER'!'BOOLEAN') (+ $Id + ',') ##
  919. #STATEMENT ( #ASSIGNMENT ! #INPUT !
  920. #OUTPUT ! #CONDITIONAL ) ##
  921. #ASSIGNMENT $Id ':=' #EXPRESSION ##
  922. #INPUT 'GET' '(' (+ $Id + ',') ')' ##
  923. #OUTPUT 'PUT' '(' (+ #EXPRESSION + ',') ')' ##
  924. #CONDITIONAL 'IF' #EXPRESSION 'THEN' (+ #STATEMENT + ';')
  925. ['ELSE' (+ #STATEMENT + ';' )] 'FI' ##
  926. #EXPRESSION #SUM [ '=' #SUM ] ##
  927. #SUM #FACTOR (* '+' #FACTOR *) ##
  928. #FACTOR #TERM (* '*' #TERM *) ##
  929. #TERM $N ;; -- numeric constant
  930. ('TRUE' ! 'FALSE' ) ;; -- Boolean constants
  931. $Id;; -- variable
  932. '(' #EXPRESSION ')' ##
  933. Context conditions are the following:
  934. 1) all variables used in statements and expressions must be
  935. declared,
  936. 2) one and the same variable name should not be declared
  937. twice,
  938. 3) left and right parts of assignment statement must be of
  939. the same type,
  940. 4) operands of input-output statements must be of the type
  941. INTEGER.
  942. All variables of the type INTEGER have initial value 0, and
  943. all variables of the type BOOLEAN have initial value FALSE.
  944. 12.2 Intermediate Form of Program.
  945. Abstract syntax tree
  946. Special languages to represent intermediate results of
  947. compilation are used in compiler building practice. For instance,
  948. P-code for PASCAL compilers and language DIANA [8] for ADA.
  949. The result of the first phase of the TOYLAN compiler is a
  950. tree. Let's call it abstract syntax tree. Along program
  951. components it contains some semantic attributes, for instance,
  952. types of expressions. One of the most significant attributes is
  953. table of variables, obtained as a result of parsing of the TOYLAN
  954. program declarations.
  955. The structure of abstract syntax tree of the TOYLAN program
  956. is described by the following rules.
  957. #S_PROGRAM
  958. 'PROGRAM'::<. NAME : $Id,
  959. DECLARATIONS : #S_DECLARATIONS ,
  960. STATEMENTS : (.(* #S_STATEMENT *).) .> ##
  961. #S_DECLARATIONS -- variables table
  962. <* $Id : ( INTEGER ! BOOLEAN ) *> ##
  963. #S_STATEMENT
  964. ASSIGNMENT :: <. LEFT : $Id,
  965. RIGHT : #S_EXPRESSION .> ;;
  966. INPUT :: (. (* $Id *) .) ;;
  967. OUTPUT :: (. (* #S_EXPRESSION *) .) ;;
  968. CONDITIONAL :: <. COND : #S_EXPRESSION,
  969. THEN : (.(* #S_STATEMENT *).),
  970. [ ELSE : (.(* #S_STATEMENT *).)] .> ##
  971. #S_EXPRESSION
  972. COMPARE :: <. ARG1 : #S_EXPRESSION, ARG2 : #S_EXPRESSION,
  973. TYPE : BOOLEAN .> ;;
  974. ADD :: <. ARG1 : #S_EXPRESSION, ARG2 : #S_EXPRESSION,
  975. TYPE : INTEGER .> ;;
  976. MULT :: <. ARG1 : #S_EXPRESSION, ARG2 : #S_EXPRESSION,
  977. TYPE : INTEGER .> ;;
  978. <. VARIABLE : $Id , TYPE : ( INTEGER ! BOOLEAN ) .> ;;
  979. <. CONSTANT : $N , TYPE : INTEGER .> ;;
  980. <. CONSTANT : ( 0 ! 1) , TYPE : BOOLEAN .> ##
  981. 12.3 Target Language BAL
  982. The goal of the TOYLAN compiler is to obtain program text in
  983. a low level language BAL. This language is a simplified model of
  984. assembler languages. The memory of BAL-machine is divided into
  985. separate words. Every word may contain an integer, besides, there
  986. are work registers R0, R1, R2, ... of the word size each. Let us
  987. suppose the number of registers to be unlimited, in order to
  988. eliminate the problem of optimal register usage during generation
  989. phase.
  990. Command of BAL ABC: DEFWORD N reserves a word in the memory
  991. and imbeds integer N in it. We can refer to this word in other
  992. commands by name ABC.
  993. Commands of BAL have two operands which are described by the
  994. name of memory word, by the name of register or by the literal of
  995. the type =NNN, where NNN is an integer. Commands can be marked by
  996. labels.
  997. 1) MOV A1,A2 This command moves memory word A1 to memory word A2.
  998. 2) LOAD RI,A Loading of word A into register RI.
  999. 3) SAVE RI,A Unloading of the contents of register RI into memory
  1000. word A.
  1001. 4) ADD RI,A or ADD RI,RJ The sum of operands is imbedded in RI.
  1002. 5) MULT RI,A or MULT RI,RJ Multiplication of operands is
  1003. imbedded in RI.
  1004. 6) COMPARE RI,A or COMPARE RI,RJ If operand values are equal,
  1005. it is 1 that is imbedded in RI, if they are not
  1006. equal, 0 is imbedded.
  1007. 7) BRANCH RI,M If the value of RI is equal to 0, then transfer
  1008. to the command marked by label M takes place,
  1009. otherwise, to the next command.
  1010. 8) JUMP M Unconditional transfer to the command marked by label M.
  1011. 9) EOJ Completes the execution of the BAL program.
  1012. 10) READ A Reads the integer from standard input device and
  1013. imbeds it in word A.
  1014. 11) WRITE A or WRITE RI Outputs the integer from memory word or
  1015. from register to standard output device.
  1016. 12) NOP An empty statement.
  1017. 12.4 Main Module of Compiler
  1018. The main program of the TOYLAN compiler contains calls of the
  1019. first and second compilation phases and file opening statements.
  1020. #TOYLAN_COMPILER
  1021. OPEN REP ' '; --message file is connected with the screen
  1022. $LEXEMS:=#CALL_PAS(35 'A.TOY');
  1023. -- a list of tokens is loaded from the file A.TOY by scanner
  1024. $S_TREE := #A_PROGRAM($LEXEMS);
  1025. -- 1st phase; result of parsing - abstract syntax tree - is
  1026. -- imbedded in the variable $S_TREE; during parsing messages
  1027. -- about discovered errors in file REP can be output.
  1028. IF $S_TREE -> OPEN GEN 'A.BAL'; -- if the tree is created,
  1029. -- then file is opened to output the generated BAL text
  1030. #G_PROGRAM($S_TREE) -- 2nd phase - code generation
  1031. ELSIF T -> REP << errors are discovered FI;
  1032. REP << end of compilation ##
  1033. The compilation listing that contains source text and error
  1034. messages is not envisaged in the TOYLAN compiler. Formation of
  1035. the listing can be a separate phase.
  1036. 12.5 Parsing Phase
  1037. The rule #A_PROGRAM carries out parsing of tokens list,
  1038. checks context conditions, generates error messages and builds
  1039. abstract syntax tree of TOYLAN program.
  1040. Patterns of the rule #A_PROGRAM and of the rules subordinate
  1041. to it, actually, coincide with patterns of the rule #PROGRAM and
  1042. with the associated rules that describe context free grammar of
  1043. the TOYLAN language. Just operations to check context conditions,
  1044. to output error messages and to construct abstract syntax tree
  1045. are added.
  1046. In our parser diagnostics is based on the following
  1047. principles. First of all, "panic" reaction to an error should be
  1048. avoided and several messages concerning one and the same error
  1049. should not be output (though, we can't manage it always),
  1050. secondly, error neutralization is transition to the analysis of
  1051. the next statement, i.e., skip of tokens until the nearest symbol
  1052. ';'.
  1053. #A_PROGRAM -- the rule is applied to the list of tokens
  1054. (. PROGRAM $Id
  1055. (* $DECL++:= #A_DECLARATION ';' *)
  1056. --formation of variables table
  1057. (+ $STATEMENTS !.:= #A_STATEMENT + ';' )
  1058. --formation of statements list
  1059. .) / RETURN 'PROGRAM' :: <. NAME : $Id,
  1060. DECLARATIONS : $DECL ,
  1061. STATEMENTS : $STATEMENTS .>/ ##
  1062. #A_DECLARATION $TYPE := ( INTEGER ! BOOLEAN )
  1063. (+ $Id /IF LAST #A_PROGRAM $DECL.$Id OR $REZ.$Id ->
  1064. REP << VARIABLE $Id DOUBLE DEFINED FI;
  1065. $REZ++:= <.$Id : $TYPE .>/ + ',' ) / RETURN $REZ / ##
  1066. #A_STATEMENT $REZ := ( #A_ASSIGNMENT ! #A_INPUT !
  1067. #A_OUTPUT ! #A_CONDITIONAL ) / RETURN $REZ / ;;
  1068. (* $A!.:=S'($$ <> ';' ) *) -- skip until nearest ';'
  1069. / REP << UNRECOGNIZED STATEMENT $A /
  1070. ##
  1071. #A_ASSIGNMENT $Id ':='/ $LPType := LAST #A_PROGRAM $DECL .$Id;
  1072. IF NOT $LPType -> REP << VARIABLE $Id ' IS NOT DEFINED ' FI /
  1073. $E:= #A_EXPRESSION
  1074. /IF $LPType <> $E . TYPE ->
  1075. REP<< 'LEFT AND RIGHT SIDE TYPES ARE DIFFERENT '
  1076. 'IN ASSIGNMENT STATEMENT ' FI;
  1077. RETURN ASSIGNMENT::<. LEFT: $Id, RIGHT: $E .> /
  1078. ONFAIL IF $LPType -> REP<< 'WRONG EXPRESSION IN ASSIGNMENT' FI ##
  1079. #A_INPUT GET '('
  1080. (+ $E !.:= $Id /IF LAST #A_PROGRAM $DECL.$Id <> INTEGER ->
  1081. REP << $Id 'IN STATEMENT GET IS NOT OF THE TYPE INTEGER'
  1082. FI / + ',' ) ')' / RETURN INPUT :: $E / ##
  1083. #A_OUTPUT PUT '(' (+ $C := #A_EXPRESSION / $E !.:= $C;
  1084. IF $C . TYPE <> INTEGER ->
  1085. REP << OPERAND OF PUT STATEMENT IS NOT OF THE TYPE INTEGER
  1086. FI / + ',' ) ')'/ RETURN OUTPUT :: $E / ##
  1087. #A_CONDITIONAL 'IF' $BE := #A_EXPRESSION
  1088. /IF $BE . TYPE <> BOOLEAN ->
  1089. REP<< CONDITION IS NOT OF BOOLEAN TYPE FI /
  1090. 'THEN' (+ $P1 !.:= #A_STATEMENT + ';' )
  1091. [ 'ELSE' (+ $P2 !.:= #A_STATEMENT + ';' ) ] 'FI'
  1092. / RETURN CONDITIONAL :: <. COND : $BE , THEN : $P1 ,
  1093. ELSE : $P2 .> / ##
  1094. #A_EXPRESSION $A := #A_SUM [ '=' $B := #A_SUM
  1095. / $A := COMPARE::<. ARG1 : $A, ARG2 : $B, TYPE : BOOLEAN.>/ ]
  1096. / RETURN $A / ##
  1097. #A_SUM $A := #A_FACTOR (* '+' $B := #A_FACTOR
  1098. / $A := ADD::<. ARG1: $A, ARG2: $B, TYPE: INTEGER .>/ *)
  1099. / RETURN $A / ##
  1100. #A_FACTOR $A := #A_TERM (* '*' $B := #A_TERM
  1101. /$A := MULT::<. ARG1: $A, ARG2: $B, TYPE: INTEGER .>/ *)
  1102. / RETURN $A / ##
  1103. #A_TERM
  1104. $N / RETURN <. CONSTANT : $N , TYPE : INTEGER .>/;;
  1105. ( ( TRUE / $K :=1/ ) ! ( FALSE / $K :=0 / ) )
  1106. /RETURN <. CONSTANT: $K, TYPE: BOOLEAN .>/ ;;
  1107. $Id / $X:= LAST #A_PROGRAM $DECL.$Id;
  1108. IF NOT $X -> REP << VARIABLE $Id IS NOT DECLARED
  1109. ELSIF T -> RETURN <. VARIABLE: $Id, TYPE: $X .> FI / ;;
  1110. '(' $E := #A_EXPRESSION ')' / RETURN $E / ##
  1111. 12.6 Code Generation Phase
  1112. Code generation is performed when traversing abstract syntax
  1113. tree.
  1114. To avoid possible conflicts between variable names in the
  1115. TOYLAN program and register names (of the type RNNN) and labels
  1116. (of the type LNNN) in the object program, variable names are
  1117. substituted by standard names of the type VARNNN.
  1118. #G_PROGRAM / $LABEL := 0 / --global variable $LABEL serves
  1119. --to generate unique labels.
  1120. PROGRAM::<.DECLARATIONS: $TAB := #TABLE_OF_NUMBERS,
  1121. --creation of the table of unique variable numbers
  1122. STATEMENTS: (.(* #G_STATEMENT *).) / GEN << 'EOJ' /,
  1123. DECLARATIONS : #G_DECLARATIONS .> ##
  1124. #TABLE_OF_NUMBERS <* $Id: $TYPE /$N :=$N+1; $T++:=<. $Id: $N.>/ *>
  1125. /RETURN $T/ ##
  1126. #G_STATEMENT ( #G_ASSIGNMENT ! #G_INPUT !
  1127. #G_OUTPUT ! #G_CONDITIONAL ) ##
  1128. #G_ASSIGNMENT ASSIGNMENT::<. LEFT: $Id := #NAME,
  1129. RIGHT :( ( <. VARIABLE: $Id1:=#NAME .>
  1130. /GEN << MOV @ $Id1 ',' $Id / ) !
  1131. ( <. CONSTANT : $N .> /GEN << MOV @ '=' $N ',' $Id /) !
  1132. ( $NREG := #G_EXPRESSION
  1133. /GEN << 'SAVE' @ 'R' $NREG ',' $Id / ) ) .> ##
  1134. #G_INPUT INPUT::(. (* $Id := #NAME /GEN << READ $Id / *) .) ##
  1135. #G_OUTPUT OUTPUT :: (. (*
  1136. ( ( <. VARIABLE : $Id := #NAME .> /GEN << WRITE $Id / ) !
  1137. ( <. CONSTANT : $N .> /GEN << WRITE @ '=' $N / ) !
  1138. ( $NREG := #G_EXPRESSION /GEN << WRITE @ 'R' $NREG /) )
  1139. *) .) ##
  1140. #G_CONDITIONAL CONDITIONAL ::
  1141. <. COND : $NREG := #G_EXPRESSION
  1142. / $LABEL1 :=#NEW_LABEL(); $LABEL2 :=#NEW_LABEL() /,
  1143. THEN : / GEN << BRANCH @ 'R' $NREG ',L' $LABEL1 /
  1144. (. (* #G_STATEMENT *) .)
  1145. / IF $.ELSE -> GEN << JUMP @ 'L' $LABEL2 FI;
  1146. GEN << @ 'L' $LABEL1 ': NOP' / ,
  1147. [ ELSE : (. (* #G_STATEMENT *) .)
  1148. / GEN << @ 'L' $LABEL2 ': NOP' / ] .> ##
  1149. #G_EXPRESSION --returns the number of the register containing
  1150. --result of the evaluation of expression
  1151. $EXPR
  1152. / $NREG := 0 / -- number of the first accessible register
  1153. /RETURN #G_EXPR($EXPR)/
  1154. ##
  1155. #G_EXPR ( <. VARIABLE: $ID :=#NAME .> !
  1156. <. CONSTANT: $N / $ID := #IMPLODE('=' $N)/ .>)
  1157. / $REG := COPY( LAST #G_EXPRESSION $NREG ) ;
  1158. GEN << 'LOAD' @ 'R' $REG ',' $ID ;
  1159. LAST #G_EXPRESSION $NREG + := 1; RETURN $REG / ;;
  1160. $OP::<. ARG1 : $R1 := #G_EXPR, ARG2 : $R2 := #G_EXPR .>
  1161. / GEN << $OP @ 'R' $R1 ',R' $R2 ; RETURN $R1 / ##
  1162. #G_DECLARATIONS
  1163. <* $ID: $TYPE /$ID1 := #NAME($ID); GEN<< $ID1 ':' DEFWORD 0 /*> ##
  1164. #NEW_LABEL --auxiliary rule
  1165. /LAST #G_PROGRAM $LABEL+:=1;
  1166. RETURN COPY (LAST #G_PROGRAM $LABEL )/ ##
  1167. #NAME $ID --returns standard name of the variable $ID in $TAB
  1168. / RETURN #IMPLODE( VAR LAST #G_PROGRAM $TAB.$ID)/ ##
  1169. 13. Conclusions and Future Work
  1170. As it was demonstrated above, RIGAL supports syntax-oriented
  1171. style of compiler design. Programs written in RIGAL are
  1172. well-structured and it is easy to read and debug them.
  1173. Our experience [13] proves that the optimizing RIGAL compiler
  1174. in VAX/VMS environment makes it possible to implement production
  1175. quality compilers for high level languages.
  1176. RIGAL can be considered as yet another language prototyping
  1177. tool in the sense of [14], because it allows the designer to
  1178. develop an experimental translator in short period of time.
  1179. Besides interpreter for debugging purposes and optimizing
  1180. compiler RIGAL support system includes a cross-referencer, which
  1181. helps to avoid misuse of global variables.
  1182. In order to improve static and dynamic type checking,
  1183. variable type descriptions in the form of formal comments would
  1184. be added to the language.
  1185. Taking in account that control structures of RIGAL program
  1186. are very close to input data structures, it seems promising to
  1187. develop automatic and semiautomatic methods for test example
  1188. generation for the given RIGAL program.
  1189. References
  1190. [1] A.Aho, J.Ullman. The theory of parsing, translation and
  1191. compiling// Prentice-Hall, Inc. Englewood Cliffs,N.J. 1972. -
  1192. vol.1,2.
  1193. [2] S.C.Johnson. YACC - Yet Another Compiler Compiler // Bell
  1194. Laboratories, Murray Hill,N.J., 1978, € technical manual.
  1195. [3] C.H.€.Koster. Using the CDL Compiler Compiler// Lecture Notes
  1196. in Computer Science , Vol.21, Springer-Verlag, Berlin, 1977.
  1197. [4] ˆ.�.€£ ¬¨à§ï­. ‘¨á⥬  â¥å­®«®£¨ç¥áª®© ¯®¤¤¥à¦ª¨ à §à ¡®âª¨
  1198. âà ­á«ïâ®à®¢ "˜€ƒ"// €«£®à¨â¬ë ­¥¡¥á­®© ¬¥å ­¨ª¨ (¬ â¥à¨ «ë
  1199. ¬ â. ®¡¥á¯¥ç¥­¨ï �‚Œ). -1985.-¢ë¯.79.-á. 1-53.
  1200. [5] D.E.Knuth. Semantics of context-free languages// Mathematical
  1201. Systems Theory, 2, 2, 1968, pp.127-146.
  1202. [6] ‚.€.‘¥à¥¡à类¢. Œ¥â®¤ë  âਡãâ­®© âà ­á«ï樨 //
  1203. Ÿ§ëª¨ ¯à®£à ¬¬¨à®¢ ­¨ï. -Œ.� ãª ,1985,á.47-79.
  1204. [7] €.Ž.‚®®£« ©¤, Œ.‚.‹¥¯¯, „.�.‹¨©¡. ‚室­ë¥ ï§ëª¨ á¨á⥬ë ELMA
  1205. //âàã¤ë ’ ««¨­­áª®£® �®«¨â¥å­. ˆ­-â .-1982.-¢ë¯.524-á.79-96.
  1206. [8] The intermediate language DIANA : Design and Implementation
  1207. //Lecture Notes in Computer Science, Vol.180, Springer-Verlag,
  1208. Berlin, 1984.
  1209. [9] R.Vilhelm. Presentation of the compiler generation system MUG2
  1210. : …xamples, global flow analysis and optimization// Le point
  1211. sur la compilation, INRIA, 1978, à.307-336.
  1212. [10] � §¨á­ë© �…”€‹ ¨ ¥£® ॠ«¨§ æ¨ï ­  ¢ëç¨á«¨â¥«ì­ëå ¬ è¨­ å.//
  1213. –�ˆ�ˆ€‘‘, Œ. 1977.
  1214. [11] �.Lucas. Formal definition of programming languages and
  1215. systems // IFIP Congress, 1971.
  1216. [12] Œ.Ganapatti, C.N.Fisher, J.L.Hennessy. Retargetable compiler
  1217. code generation// ACM Computing Survays, 14(4), 1982.
  1218. [13] J.Barzdin, A.Kalnins, M.Auguston, SDL tools for rapid
  1219. prototyping and testing.// in SDL'89 : The language at work,
  1220. ed. O.Faergemand and M.M.Marques, North-Holland, 1989,
  1221. pp.127-133.
  1222. [14] R.Herndon, V.Berzins, The realizable benefits of a language
  1223. prototyping language.// IEEE Transactions on Software
  1224. Engineering , vol.14, No 6, June 1988, pp.803-809