Algol-68 Compiler

Designed and implemented by Aristotelis Tsirigos, no rights reserved





1. Introduction


The purpose of this project was to design and implement a compiler for a subset of programming language Algol-68 that is able to bootstrap. The exact language subset is presented in the section 2 . Section 3 has a list and explanation of the implemented built-in functions. The way this compiler was designed can be found in section 4 and the various restrictions or deviations with respect to Algol-68 in section 5 . Discover how to run and monitor the compiler in sections 6 and 7 . Information about boostrapping and further testing are found in sections 8 and 9 . Section 10 , expains how various extensions can be implemented, and, finally, the source files can be downloaded from the last section, section 11

This project was for the Honors Compilers course at NYU Computer Science department. In the next section



Back to table of contents!




2. The implemented language subset


This is the subset of Algol-68 recognized by my compiler. It is presented in free-style BNF format, with the symbol "|" representing alternative productions, the eclosing marked by "[" and "]" representing an optional clause, and "..." denoting repetition. The basic structural element of Algol-68 is the expression:


expression =>

declaration
| BEGIN series END
| '(' series ')'
| expression ':=' expression  
| expression OR expression
| expression AND expression
| NOT expression
| ABS expression
| REPR expression
| UPB expression
| expression '<' expression
| expression '>' expression
| expression '>=' expression
| expression '<=' expression
| expression '=' expression
| expression '/=' expression
| expression '+' expression
| expression '-' expression
| expression '*' expression
| expression '/' expression
| expression MOD expression
| '-' expression
| IDENTIFIER OF expression
| expression '[' expression ']'
| IF series THEN series [ ELIF series ... ] [ ELSE series ] FI
| WHILE series DO series OD
| IDENTIFIER
| IDENTIFIER '(' argument_list ')'
| INTCONST
| STRCONST
| CHARCONST
| TRUE
| FALSE

declaration =>

MODE AMODE '=' amode
| amode IDENTIFIER  [ '=' expression ] ',' ...
| PROC IDENTIFIER '='  [ '(' parameter_list ')' ] amode ':' expression

amode =>

INT
| CHAR
| BOOL
| STRING
| TK_REF amode
| '[' expression ']' amode
| '[' ']' amode
| STRUCT '(' parameter_list ')'

parameter_list =>

amode IDENTIFIER, amode IDENTIFIER, ...

series =>

expression ';' expression ';' ... ';' expression

argument_list =>

expression ',' expression ',' ... ',' expression


Back to table of contents!



3. Built-in functions


Some built-in functions were implemented in order to provide basic Input/Output functionality:


Back to table of contents!




4. Compiler description


This is a simple one-pass compiler, which takes as an input a files containing Algol code, and compiles it to MMIX assembly code. It consists of five majors units:
Their interaction is depicted in the figure below:

The compiler units


The lexical analyzer (scanner) reads a file containing an Algol program, collecting characters into tokens. The parser-scanner interface consists of a two functions, the first one inquiring about the current token being read, and the second one accepting a particular token. If the expected token is not found, an error message appears on the screen and compilation is abandoned. It should be evident from our discussion so far that the Algol grammar had to be converted into an equivalent LL(1) grammar, so that just one look-ahead token would suffice for the parser to accept the correct language.

The parser, while making sure the syntax rules are properly followed, it update the symbol table with all the necessary information about the program's symbols (variables, procedures, temporaries, labels, arguments, etc.), like for example, their names, modes, nesting depth, value. The symbol table unit provides a variety of functions that can be used to enter all these different kinds of symbol, as well as a lookup function, which given the name of a symbol, it returns its location in the symbol table. Finally, the parser also generates intermediate code quadruples. When the parser reaches the end of a procedure which happens to be at the maximum nesting depth (and no MMIX code has been generated for it), it suspends its operation and lets the MMIX code generator produce the appropriate MMIX code for that procedure. Note that, before performing its task, the MMIX code generator has to know how and where each of the symbols will be stored during the execution of the program. This information is provided by the register/memory assignment unit. When MMIX code is generated for this procedure, the parser resumes its operation until the next procedure is ready for MMIX code generation.



Back to table of contents!




5. Restrictions


a.
Slicing and "dynamic" arrays

Array slicing was not implemented because it would demand, in the general and most meaningful case, dynamic object allocations. However, all the necessary infrastructure to implement slicing is available. The core of my potential array slicing implementation would be an already existing quadruple called quad_alloc, which was designed in order to allocate stack space for arrays. Right now, it simply initializes the array header, which is two tetrabytes containing the size of the array in octabytes and its upper bound, so that, when an array is passed as a parameter, the function knows what these values are. So, in order to implement slicing as well as arrays whose size is not known at compile time, we must extend quad_alloc , so that it takes not only the array itself as an input, but also the symbol table entry which would hold the array size (or number of elements) at runtime. Having this information, it can allocate the appropriate space and initialize the array header. Garbage collection would not be performed, but since the array is allocated in the stack, its space is fully reclaimed upon exiting its immediate enclosing function.


b. Array bounds

Since I was using gcc as my compiler, I assumed 0 as the lower bound of all my arrays, so I didn't implement LWB, I only implemented UPB, which however gives the number of elements in the array, not its upper bound.


c. FOR/FROM/BY/TO

The compiler code made use of only WHILE loops, so that in the end I wouldn't have to implement the whole loop structure in order to bootstrap. It would be very easy to implement the whole thing; just do the following steps:
d. LOC/HEAP

HEAP was not implemented, hence there was no need for a LOC keyword. We can declared things the way we like, by using either the short or the long declaration form.




Back to table of contents!





6. Running the compiler


The compiler (compiler.exe in my system, but you should more safely download compiler.c and runtime.inc as explained in the source code section) is run as follows:
compiler options filename
The filename has usually the extension .a68 and the compiler produces a MMIX file that has the same name as the input file with the only difference being its extension, which is always set to .mms . Also, the compiler can take 4 options, not mutually exclusive. In its default operation, the compiler outputs the tokens being parsed. Here are the three options:


Option

Function

e
while compiling, output extensive information about the grammar element being parsed
s
stop the compiler's execution each time a stack frame is processed
r
force the compiler to treat all composite arguments as though they are passed by reference
c
adds comments (the corresponding) to the output .mms file


Here is an example:
artsi: [34] ~/compilers > compiler.exe test1.a68
Parsing...
 BEGIN  PROC  fac  = Stack frame produced
 (  INT  i  )  INT  :  IF  i  >  0  THEN  printi  (  i  )  ;
 printn  ;
 i  *  fac  (  i  -  1  )  ELSE  1  FI Generating code...
Stack frame produced
1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17: * other instruction fac
17: ;
 INT  b  :=  47  ;
 INT  a  :=  b  *  7  +  10  *  IF  b  <  10  THEN  10  ELSE  b  +  b  FI  ;
 printi  (  a  )  ;
 printn  ;
 printi  (  fac  (  10  )  )  ;
 printn  ;
 END Generating code...
Stack frame produced
0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:Done!
This was the output in default mode. Here is what we get when we run the compiler with the -e option:
artsi: [35] ~/compilers > compiler.exe -e test1.a68
Parsing...

START expression
FACTOR( BEGIN
ENTER: series

START expression
FACTOR(
ENTER: procedure declaration
 PROC  fac  = Stack frame produced
 (  INT  i  )  INT
NOW: procedure body
 :
START expression
FACTOR(
ENTER: IF-expression
 IF
ENTER: series

START expression
FACTOR( i SLICE -> <- ) > FACTOR( 0 )
Failed to free temp!

Failed to free temp!

DONE expression WITH mode BOOL

EXIT: series

Temp 1 freed
 THEN
ENTER: series

START expression
FACTOR( printi  (  START: arg_list
START expression
FACTOR( i SLICE -> <- )
DONE expression WITH mode INT
 DONE: arg_list  )
Failed to free temp!

Failed to free temp!
)
DONE expression WITH mode VOID
 ;
Temp 1 freed


START expression
FACTOR( printn )
DONE expression WITH mode EMPTY
 ;
Failed to free temp!


START expression
FACTOR( i SLICE -> <- ) * FACTOR( fac  (  START: arg_list
START expression
FACTOR( i SLICE -> <- ) - FACTOR( 1 )
Failed to free temp!

DONE expression WITH mode INT
 DONE: arg_list  )
Failed to free temp!

Temp 1 freed
)
Temp 2 freed

DONE expression WITH mode INT

EXIT: series

Temp 1 freed
 ELSE
ENTER: series

START expression
FACTOR( 1 )
DONE expression WITH mode INT

EXIT: series

Failed to free temp!
 FI
EXIT: IF-expression
)
DONE expression WITH mode INT
Generating code...
Stack frame produced
1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17: * other instruction fac
17:
EXIT: procedure declaration
)
DONE expression WITH mode INT
 ;
Temp 0 freed


START expression
FACTOR(
START: declaration
 INT  b  :=
START expression
FACTOR( 47 )
DONE expression WITH mode INT

Failed to free temp!

DONE: declaration
)
DONE expression WITH mode EMPTY
 ;
Failed to free temp!


START expression
FACTOR(
START: declaration
 INT  a  :=
START expression
FACTOR( b SLICE -> <- ) * FACTOR( 7 )
Failed to free temp!
 + FACTOR( 10 )( * FACTOR(
ENTER: IF-expression
 IF
ENTER: series

START expression
FACTOR( b SLICE -> <- ) < FACTOR( 10 )
Failed to free temp!

Failed to free temp!

DONE expression WITH mode BOOL

EXIT: series

Temp 2 freed
 THEN
ENTER: series

START expression
FACTOR( 10 )
DONE expression WITH mode INT

EXIT: series

Failed to free temp!
 ELSE
ENTER: series

START expression
FACTOR( b SLICE -> <- ) + FACTOR( b SLICE -> <- )
Failed to free temp!

DONE expression WITH mode INT

EXIT: series

Temp 2 freed
 FI
EXIT: IF-expression
)
Temp 1 freed
)
Temp 2 freed

DONE expression WITH mode INT

Temp 1 freed

DONE: declaration
)
DONE expression WITH mode EMPTY
 ;
Failed to free temp!


START expression
FACTOR( printi  (  START: arg_list
START expression
FACTOR( a SLICE -> <- )
DONE expression WITH mode REF INT
 DONE: arg_list  )
Failed to free temp!

Failed to free temp!
)
DONE expression WITH mode VOID
 ;
Temp 1 freed


START expression
FACTOR( printn )
DONE expression WITH mode EMPTY
 ;
Failed to free temp!


START expression
FACTOR( printi  (  START: arg_list
START expression
FACTOR( fac  (  START: arg_list
START expression
FACTOR( 10 )
DONE expression WITH mode INT
 DONE: arg_list  )
Failed to free temp!

Failed to free temp!
)
DONE expression WITH mode INT
 DONE: arg_list  )
Failed to free temp!

Temp 1 freed
)
DONE expression WITH mode VOID
 ;
Temp 2 freed


START expression
FACTOR( printn )
DONE expression WITH mode EMPTY
 ;
Failed to free temp!


EXIT: series
 END SLICE -> <- )
DONE expression WITH mode VOID
Generating code...
Stack frame produced
0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:Done!


Back to table of contents!




7. Monitoring the compiler


If you run the compiler with the -s option, then each time a function is processed and each variable, temporary, argument, etc., is given an offset with respect to the frame pointer, or assigned to a MMIX register, the execution of the compiler is suspended and the user can take control, by issuing commands like stepwise or expression-by-expression execution, or even have a look at the symbol table, the offset/register assignments, the intermediate code generated so far, and various information about each procedure, like the number of temporaries it is using, its label in the MMIX code, and so on. Let us have a look at the following example, where we compile test1.a68 with the -s option:
artsi: [422] ~/compilers > compiler -s test1.a68
Parsing...
 BEGIN  PROC  fac  = Stack frame produced

>
The command h displays information about the available commands:
> h

----------------------
 Command summary
----------------------

r - continue running the parser
s - run the parser in stepwise mode
x - run the parser expression by expression (in series)
t - print current token
d - stack dump
i - show generated intermediate code
f - dump stack frame
p - dump procedure info
q - quit the parser

>
Use command t to display the current token being parsed:
> t
Current token is (=,EQUAL).
>
Command x parses the next expression:
> x

 (  INT  i  )  INT  :  IF  i  >  0

>
By now, a bunch of information has to be inserted into the symbol table. Let's have a look at it (command d):
> d
18: temporary '%1' of mode BOOL and size 8
17: constant '0' OF mode INT
16: START SCOPE __
15: temporary '%0' of mode ? and size 0
14: parameter 'i' of mode INT and size 8
13: START FRAME (nesting 2)
12: label 'L1'
11: procedure 'fac' of label L1 and mode PROC (INT|):INT
10: START SCOPE __
9: variable 'argv' of mode [10] STRING and size 2568
8: variable 'argc' of mode INT and size 8
7: procedure 'fprintf' of label fprintf and mode PROC VOID
6: procedure 'fprinti' of label fprinti and mode PROC (INT|):VOID
5: procedure 'printf' of label printf and mode PROC VOID
4: procedure 'printi' of label printi and mode PROC (INT|):VOID
3: procedure 'int2str' of label int2str and mode PROC (INT|REF STRING|):VOID
2: procedure 'str2int' of label str2int and mode PROC (STRING|):INT
1: START FRAME (nesting 1)
>
We can see where frames start and end, and their nesting level, where scopes start and end, as well as information about procedure, variable, parameter, temporary, and argument entries in the symbol table. We can see that at most 10 command-line arguments can be handled (argv)!

Next we issue the r command, which lets the compiler run until the next stack frame information is produced:
> r

 THEN  printi  (  i  )  ;
 printn  ;
 i  *  fac  (  i  -  1  )  ELSE  1  FI Generating code...
Stack frame produced

>
Now, we can have a look at the frame information, that is, for each entry, its name, id number in the symbol table, storage type, size, offset, register, nesting and mode. To do that, use the f command:
> f
ID                        STO   SIZE      OFFSET   REG NEST MODE
----------------------------------------------------------------------------
i                      ( 14) reg        8          0 $  0    2 INT
%0                  ( 15) reg        8          0 $  1    2 INT
%1                  ( 18) reg        8          0 $  2    2 BOOL
%1                  ( 22) con        0          0 $ -1    2 VOID
&0                  ( 23) reg        8          0 $ -1    2 INT
%1                  ( 25) reg        8          0 $  2    2 INT
%2                  ( 26) reg        8          0 $  3    2 INT
&0                  ( 27) reg        8          0 $ -1    2 INT
%1                  ( 28) reg        8          0 $  2    2 INT
----------------------------------------------------------------------------
FRAME SIZE == 0
----------------------------------------------------------------------------
>
We can also see the intermediate code generated for this function by issuing the i command:
> i
0: MAIN
1: PROC fac of mode PROC (INT|):INT with label L1
2: GREATER i (14) > 0 (17) -> %1 (18)
3: TEST %1 (18) -> L2
4: ARG i (14) of mode INT -> &0
5: CALL procedure printi of mode PROC (INT|):VOID -> %1 (22)
6: PRINTn
7: MINUS %1 (25) = i (14) - 1 (24)
8: ARG %1 (25) of mode INT -> &0
9: CALL procedure fac of mode PROC (INT|):INT -> %2 (26)
10: MULT %1 (28) = i (14) * %2 (26)
11: ALIAS %1 (28) -> %0 (15)
12: JUMP L3
13: LABEL L2
14: ALIAS 1 (32) -> %0 (15)
15: LABEL L3
16: RET value %0 (15)
17: ENDPROC fac
>
We can see the id number for each symbol table entry enclosed in parentheses. Run the compiler until the next stack frame is produced:
> r

1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17: * other instruction fac
17: ;
 INT  b  :=  47  ;
 INT  a  :=  b  *  7  +  10  *  IF  b  <  10  THEN  10  ELSE  b  +  b  FI  ;
 printi  (  a  )  ;
 printn  ;
 printi  (  fac  (  10  )  )  ;
 printn  ;
 END Generating code...
Stack frame produced

>
Using the p command, we can take a look at various information about the procedures (unfortunatelly, I only identify the procedure with their MMIX label):
> p
LABEL      QUADPTR MAXTEMP DONE
--------------------------------------------
Main            0       2    0
str2int         0       0    1
int2str         0       0    1
printi          0       0    1
printf          0       0    1
fprinti         0       0    1
fprintf         0       0    1
L1              1       2    1
>
Procedure fac is identified with label L1. We can see that it starts at quadruple number 1 in the intermediate code list we saw before. It is using two temporaries and we can see that it is already compiled, unlike Main, which will be compiled when we issue another r command:
> r

0:1:2:3:4:5:6:7:8:9:10:11:12:13:14:15:16:17:18:19:20:21:22:Done!
artsi: [423] ~/compilers >
With the completion of the Main compilation, the whole program is compiled into MMIX code in file test1.mms. Let's assemble and then run it:
artsi: [423] ~/compilers > mmixal test1.mms
artsi: [424] ~/compilers > mmix test1
1269
10
9
8
7
6
5
4
3
2
1
3628800
artsi: [425] ~/compilers >
This completes our short demo, hope you enjoyed it, although it will be more fun if you monitor the compilation of a more complex program, like the compiler itself!




Back to table of contents!




8. Bootstrapping


Since I used a C compiler during the developement of this compiler, all composite arguments (strings, arrays) were passed by reference, which means that if you need to preserve them after the function's execution, you have to make a copy yourself, and that's what I did in my compiler code. But when the project was completed, and the bootstrap had to be done,  my Algol compiler, while compiling itself, had to make copies whenever dereferencing was applied. For this reason, the compiler ends up making copies twice, the ones enforced by the Algol compiler at dereferencing (and "missed" by gcc), plus the one I make myself in order to compensate for the no-copy "policy" of C. Needless to say that this slows down the bootstrapping process. In order to speed up the bootstrapping process, I provide a special command-line option (-r) for my compiler, which, when turned on, it emulates the C no-copy policy! In the table below, we can see and compare the time it took: We can also see the size of these programs, measured in MMIX code lines. Two values are provided for all these measurements; one for the case where the -r options was used, and the case where it was not.

Size
scanner.a68
parser.a68
compiler.a68
3,561 / 3,910
mmix scanner.mmo 14sec / 14sec
27sec / 27sec
2min 16sec / 2min 16sec
5,648 / 6,648
mmix parser.mmo
24sec / 26sec
46sec / 48sec
3min 52sec / 4min 8sec
31,101 / 41,508
mmix compiler.mmo
2min 7sec / 2min 30sec
4min 18sec / 5min 8sec
40min 40sec / 55min 23sec

Note that the scanner has the same performance with or without the -r option, because it only makes a few redundant string copies! For the tests of the last row, the results of compilation of the *.a68 files by compiler.mmo were compared with the ones produced by compiler.exe , to verify that the two copies were identical. The test corresponding to the bottom-right cell is the "infamous" bootstrapping:

artsi: [40] ~/compilers > build      # compiles compiler.a68 using a perl script and gcc into executable compiler.exe
artsi: [41] ~/compilers > compiler.exe compiler.a68 > out1      # compiles the compiler with itself, producing compiler.mms
artsi: [42] ~/compilers > mmixal -b 150 compiler.mms          # assemble compiler.mms
artsi: [43] ~/compilers > cp compiler.mms compiler-1.mms       # save compiler.mms to another file for future comparison
artsi: [44] ~/compilers > mmix compiler.mmo compiler.a68 > out2      # run the new compiler on its own source code, producing another compiler.mms
artsi: [45] ~/compilers > diff compiler.mms compiler-1.mms            # finally, compare the two files, and conclude that they are identical!
artsi: [46] ~/compilers >

The environment I used was Cygwin, in case you are wondering how an .exe file seems to be running in UNIX-like environment!



Back to table of contents!




9. Various tests

Here is a bunch of tests, each one targeting and checking a specific compiler feature! They are the actual test files I used to test my compiler as I was adding more and more features, with test1.a68 being the first test I wrote, test2.a68 the second, and so on...


test1.a68

Output

BEGIN

  PROC fac = (INT i) INT:
    IF i > 0 THEN printi(i); printn; i*fac(i-1) ELSE 1 FI;

  INT b := 47;
  INT a := b*7 + 10 * IF b < 10 THEN 10 ELSE b+b FI;
  printi(a); printn;
  printi(fac(10)); printn;

END
artsi: [4] ~/compilers > mmix test1
1269
10
9
8
7
6
5
4
3
2
1
362800

test2.a68

Output

BEGIN

  STRING z;
  z := "Global display seems to be working!";

  PROC StrDemo = (INT i) VOID:
  BEGIN
    IF i > 0 THEN printi(i); printc(':'); prints(z); printn; StrDemo(i-1); FI;
  END;

  StrDemo(11);

END
artsi: [5] ~/compilers > mmix test2
11:Global display seems to be working!
10:Global display seems to be working!
9:Global display seems to be working!
8:Global display seems to be working!
7:Global display seems to be working!
6:Global display seems to be working!
5:Global display seems to be working!
4:Global display seems to be working!
3:Global display seems to be working!
2:Global display seems to be working!
1:Global display seems to be working!

test3.a68

Output

BEGIN

  INT i;
  STRING s;
  [11]INT a;

  PROC sum = ([]INT b) INT:
  BEGIN
    INT s;
    s := 0;
    i := 0;
    WHILE i < UPB a DO s := s + a[i]; i := i + 1; OD;
    s
  END;

  PROC display = ([]INT b) VOID:
  BEGIN
    INT i;
    printf("display: array size is %i\n", UPB b);
    i := 0;
    printf("a = ( ");
    WHILE i < UPB a DO printf("%i ", a[i]); i := i + 1; OD;
    printf(")\n\n");
    i := 0;
    printf("b = ( ");
    WHILE i < UPB b DO printf("%i ", b[i]); i := i + 1; OD;
    printf(")\n\n");
    printf("display: sum of b[i] = %i\n", sum(b));
  END;

  PROC sumi = (INT x, INT y) INT: x + y;

  printf("STARTING...\n");
  printf("Array size is %i.\n", UPB a);

  a[0] := 17;
  a[1] := 52;
  a[2] := 23;
  a[3] := 43;
  a[4] := 51;
  a[5] := 654;
  a[6] := 7153;
  a[7] := 82;
  a[8] := 91;
  a[9] := 10332;
  a[10] := 987;

  display(a);

  printi(sumi(a[9]+a[0],sumi(sumi(a[1]+a[2]+a[3],a[4]),a[10]))); printn;
  printf("Sum of a[i] = %i\n", sum(a));

END
artsi: [6] ~/compilers > mmix test3
STARTING...
Array size is 11.
display: array size is 11
a = ( 17 52 23 43 51 654 7153 82 91 10332 987 )

b = ( 17 52 23 43 51 654 7153 82 91 10332 987 )

display: sum of b[i] = 19485
11505
Sum of a[i] = 19485

test4.a68

Output

BEGIN

  STRING s;
  CHAR z;
  STRUCT(STRING label, INT quadptr, [11]INT a, INT max_temp, BOOL done) proc_info;
  MODE POINT = STRUCT(CHAR y, CHAR z);
  STRUCT(CHAR x, POINT s) vec;

  (a OF proc_info)[1] := 121;
  x OF vec := 'k';
  z := 'a';
  z OF s OF vec := '#';

  s := "z = ";
  prints(s);
  printc(z);
  printn;

  s := "x OF vec = ";
  prints(s);
  printc(x OF vec);
  printn;

  s := "z OF s OF vec = ";
  prints(s);
  printc(z OF s OF vec);
  printn;

  s := "(a OF proc_info)[1] = ";
  prints(s);
  printi((a OF proc_info)[1]);
  printn;

  printf("%s\n", (label OF proc_info := "KARAGIOZHS!"));

END
artsi: [11] ~/compilers > mmix test4
z = a
x OF vec = k
z OF s OF vec = #
(a OF proc_info)[1] = 121
KARAGIOZHS!

test5.a68

Output

BEGIN

  CHAR ch;
  STRING file := "test";

  printf("Create and read file '%s'!\n", file);

  IF establish(5, file) < 0 THEN
    printf("Cannot create file!\n");
    exit;
  FI;

  fprintf("Aris is %i years old%c\n", 27, '!');
  close(5);

  IF open(1, file) < 0 THEN
    printf("Cannot open file!\n");
    exit;
  FI;

  ch := readc(1);
  WHILE ch /= REPR 0
  DO
    printc(ch);
    ch := readc(1);
  OD;

  close(1);

END
artsi: [12] ~/compilers > mmix test5
Create and read file 'test'!
Aris is 27 years old!

test6.a68

Output

BEGIN

  MODE VECTOR = STRUCT(STRING name, INT x, INT y, INT z);
  [10]VECTOR vec;
  INT max_tetra = 65536;

  PROC mmix_sets = (STRING x, INT val) VOID:
  BEGIN
    IF val < 0 THEN
      mmix_sets(x, -val);
      fprintf("\tNEGU %s,0,%s\n", x, x);
    ELSE
      IF val > max_tetra THEN
        printf("\tSETL %s,%i&#ffff\n", x, val);
        printf("\tORML %s,%i>>16\n", x, val);
      ELSE
        printf("\tSETL %s,%i\n", x, val);
      FI;
    FI;
  END;


  PROC disp = (STRING name, INT symbol) VOID:
  BEGIN
    STRING s;
    s := name OF vec[symbol];
    printf("%s\n", s);
    printf("%s\n", name OF vec[symbol]);
  END;


  [10]STRING s;
  INT i;

  s[0] := "Gizem";
  s[1] := "Aris";
  s[2] := "Gizmouli";
  s[3] := "Simbo";
  s[4] := "Yiannis";
  s[5] := "Giorgos";
  s[6] := "Antonis";
  s[7] := "Nikos";
  s[8] := "Ciro";
  s[9] := "Yotam";

  PROC init = VOID:
  BEGIN
    printf("Initializing...\n");
    i := 0;
    WHILE i < UPB vec
    DO
      name OF vec[i] := s[i];
      x OF vec[i] := i + 100;
      y OF vec[i] := 2*i + 200;
      z OF vec[i] := 3*i + 300;
      i := i + 1;
    OD;
    i := 0;
    WHILE i < UPB vec
    DO
      printf("%i: %s(%i %i %i)\n", i, name OF vec[i], x OF vec[i], y OF vec[i], z OF vec[i]);
      i := i + 1;
    OD;
  END;

  init;
  i := 0;
  WHILE i < UPB vec
  DO
    printf("%i: %s(%i %i %i)\n", i, name OF vec[i], x OF vec[i], y OF vec[i], z OF vec[i]);
    i := i + 1;
  OD;

  disp("10239", 2);
  mmix_sets("$11", z OF vec[7]);

END;
artsi: [12] ~/compilers > mmix test6
Initializing...
0: Gizem(100 200 300)
1: Aris(101 202 303)
2: Gizmouli(102 204 306)
3: Simbo(103 206 309)
4: Yiannis(104 208 312)
5: Giorgos(105 210 315)
6: Antonis(106 212 318)
7: Nikos(107 214 321)
8: Ciro(108 216 324)
9: Yotam(109 218 327)
0: Gizem(100 200 300)
1: Aris(101 202 303)
2: Gizmouli(102 204 306)
3: Simbo(103 206 309)
4: Yiannis(104 208 312)
5: Giorgos(105 210 315)
6: Antonis(106 212 318)
7: Nikos(107 214 321)
8: Ciro(108 216 324)
9: Yotam(109 218 327)
Gizmouli
Gizmouli
        SETL $11,321

test7.a68

Output

BEGIN

  PROC dummy = (REF [] INT b) VOID:
  BEGIN
    b[0] := 10;
  END;

  [10]INT a;
  a[0] := 1;
  printf("a[0] := %i\n", a[0]);
  dummy(a);
  printf("a[0] := %i\n", a[0]);

END
artsi: [17] ~/compilers > mmix test7
a[0] := 1
a[0] := 10

test8.a68

Output

BEGIN

  PROC f = (REF STRING t, STRING s) VOID:
  BEGIN
    INT i := 0;
    WHILE s[i] /= REPR 0
    DO
      IF s[i] >= 'a' AND s[i] <= 'z' THEN
        t[i] := REPR (ABS s[i] + ABS 'A' - ABS 'a');
      FI;
      i := i + 1;
    OD;
    printf("  t = %s\n", t);
    printf("  s = %s\n", s);
  END;

  STRING message := "This is a simple message!";

  f(message, message);
  printf("message = %s\n", message);

END
artsi: [18] ~/compilers > mmix test8
  t = THIS IS A SIMPLE MESSAGE!
  s = This is a simple message!
message = THIS IS A SIMPLE MESSAGE!

test9.a68

Output

BEGIN

  PROC StrAppend = (REF STRING s, STRING s1, STRING s2) VOID:
  BEGIN
    STRING t;
    INT i := 0, j := 0;
    WHILE i < 256 AND s1[i] /= REPR 0 DO t[j] := s1[i]; i := i + 1; j := j + 1 OD;
    i := 0;
    WHILE i < 256 AND s2[i] /= REPR 0 DO t[j] := s2[i]; i := i + 1; j := j + 1 OD;
    t[j] := REPR 0;
    s := t;
  END;

  PROC g = (REF STRING s, INT i) VOID:
  BEGIN
    STRING t;
    IF i > 0 THEN
      g(s, i-1);
      int2str(i, t);
      StrAppend(s, t, s);
    ELSE
      s := ".";
    FI;
  END;

  PROC f = (REF STRING t) VOID:
  BEGIN
    g(t, 10);
  END;

  STRING a;
  f(a);
  printf("STRING = %s\n", a);

END
artsi: [27] ~/compilers > run test9
STRING = 10987654321.

test10.a68

Output

BEGIN

  PROC copy = (REF INT x) REF INT:
  BEGIN
    printf("  x = %i\n", x);
    x
  END;

  INT a, b, c;

  a := 10;
  printf("a = %i\n", copy(a));
  copy(b) := 7;
  printf("b = %i\n", b);

  c := 7;
  printf("Continue? ");
  a := b := IF readchar = 'y' THEN c + 11 ELSE 11 FI;
  readchar;

  printf("a = %i, b = %i, c = %i\n", a, b, c);

  printf("Change a or b? ");
  IF readchar = 'a' THEN a ELSE b FI := 1000;

  printf("a = %i, b = %i, c = %i\n", a, b, c);
 
END
artsi: [28] ~/compilers > run test10
  x = 10
a = 10
  x = 0
b = 7
Continue? y
a = 18, b = 18, c = 7
Change a or b? b
a = 18, b = 1000, c = 7

test11.a68

Output

BEGIN

  INT i := 0;

  printf("Number of arguments = %i\n\n", argc);

  WHILE i < argc
  DO
    printf("%i: %s\n", i, argv[i]);
    i := i + 1;
  OD;

END
artsi: [31] ~/compilers > mmix test11 file-1 file-2 file-3 file-4 file-5
Number of arguments = 6

0: test11
1: file-1
2: file-2
3: file-3
4: file-4
5: file-5

test12.a68   

Output

BEGIN

  INT x := 7;
  INT y := 11;
  REF INT p;

  printf("x (= %i) or y (= %i)? ", x, y);
  p := IF readchar = 'x' THEN x ELSE y FI;
  printf("You chose %i!\n", p);
  y := x + 10;
  x := y + 7;
  printf("New x = %i\nNew y = %i\n", x, y);
  printf("New value of dereferenced pointer p = %i\n", p);

END
artsi: [2] ~/compilers > mmix test12
x (= 7) or y (= 11)? x
You chose 7!
New x = 24
New y = 17
New value of dereferenced pointer p = 24

artsi: [4] ~/compilers > mmix test12
x (= 7) or y (= 11)? y
You chose 11!
New x = 24
New y = 17
New value of dereferenced pointer p = 17





Back to table of contents!




10. Extensions


a. Good error recovery

For good error recovery, an LL(1) grammar used to describe the error patterns will be extremely complex. So, the first think I would have to do is to slightly modify the lexical analyzer, so that instead of reading a file token by token, it reads the whole file in an array. This way we can easily look ahead as many tokens as we want. Second, I would extend the function belonging to the scanner-parser interface that accepts a given token, so that it also accepts a second parameter which gives the description of appropriate actions in the case the given token does not much. This parameter will be a set tuples of the following form:
(error-pattern , error-message, actions)
The error-pattern component would be a pattern of tokens (regular expression) identified with a particular error condition. For example, if a procedure declaration misses a return type, we could ignore everthing until we find a colon or a BEGIN keyword. This condition can be easily described using a regular expression. The error-message is the string displayed on the screen, when the associated error pattern is matched. The last component, actions, will dictate the assumptions we will make for the missing tokens. In the above example, we could assume that the return type of the procedure is unspecified, and resolve it later on, by looking at what is being return in the procedure body. Once we decide on the details of the representation, and after implementing the extended version of the token acceptance function, we can easily encode a rich variety of error conditions along with suitable error recovery actions.


b. Optimizations

Two basic optimizations where implemented. The first one involves allocating temporaries. I follow a scheme that minimizes the number of temporaries by simply releasing a temporary exactly at the point where it is not needed any mode. For a simple illustration of this scheme imagine an operator that takes two values right and left and produces a new result. After a quadruple is generated by this operator (for example the + operator), both left and right operands can be "freed". Of course, this simple scheme cannot take advantage of possible common subexpressions. Everything is recalculated. However, it is powerful enough to keep the number of temporaries as a sigificantly low number, and, as a result, less MMIX registers are needed. For a more effective scheme, that takes common subexpressions into account, we have to modify the parser so that it identifies basic blocks and after generating intermediate code for each of these blocks, it passed control onto the register/memory allocation unit, which will be extended, so that it determines for each quadruple the liveness and next-use information for left and right operands. Then, as MMIX code generation proceeds, we should keep information about the current content of registers and the current storage location of  all variables. Remember that my compiler determines all these information once per procedure block, hence assignment is static. The proposed assignment scheme it obviously a dynamic one. Having all this information available, and a goal of minimizing register allocation, we can implement a function getarg that assigns a sub-optimal register to the result of each quadruple according to a wide variety of heuristics as it was explained in class. Also, we can identify and remove redundant quadruples, if we construct a directed acyclic graph that demonstrates the dependences among quadruples. This would be particularly useful in optimizing loops that contain invariant subexpressions. The key conclusion is that I can easily incorporating any of these optimizations by simply extending the register/memory assignment unit, and slightly modify the MMIX code generator unit, so that, instead of retrieving the static storage assignment information about the symbol table entries from a table, it can just call a function of the extended assignment unit (getarg), which will provide it with all the necessary information for assembly code generation.

Finally, I would like to mention my second optimization, which led to significantly faster bootstraps. It has to do with string manipulation. The first component is string copying, which happens very frequently, since strings lie in the heart of almost all information kept in the compiler's tables. An octet by octet copying of strings, as opposed to copying strings on a byte per byte basis, enhanced my compiler's performance by as much as 6 or 7 times, proving the point that string copying is a great percentage of the compiler's computation time. The second component was constant string representation. The first thing you have to do is declare then as constants in MMIX, which I do in the end of the produced MMIX code. However, this is not enough. You must ensure that whenever constant strings as passed as parameters, or whenever they are assigned or compared to variables, no copying is involved. In that sense, I manipulate constant strings essentially through temporaries stored into registers that hold the string's absolute address. These temporaries can then be operands in a quadruple, or argument in a procedure call.



Back to table of contents!




11. Source code


As I noted before, the development and implementation environment was Cygwin, so having the executable file will make no good if you want to test on your system. This is why I supply the C source file, which is the source code after it was passed through a simple Algol-to-C translator (and a bunch of other required filters, like dos2unix, scripts that link the various modules of the code, since, during development, the source code was broken up into 5 different files) written in Perl. Also, you need the runtime library, runtime.inc , which is MMIX code for the some of the built-in functions. The compiler expects to find this file in the same directory as itself. Finally, you will be able to follow the user-friendly link for an easier way to explore the Algol source code, ...when I am not too lazy to set it up! You can download the various files for the different compiler versions from the table below:


Compiler version

Added features/Extensions

Algol source code

C source file

MMIX file

User-friendly link

v1.0
bootstrapping
ac-1.0.a68
ac-1.0.c
ac-1.0.mms
NOT AVAILABLE!
v2.0
basic pointer functionality
ac-2.0.a68
ac-2.0.c
ac-2.0.mms
NOT AVAILABLE!





Back to table of contents!