CS 884 Advanced Topics in Programming Languages

WRIGHT STATE UNIVERSITY
Department of Computer Science
and Engineering

CS884 Spring 2008 Prasad

Compiling and Evaluating Expressions

An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing variables (a, i) and binary operators (+, *) is given below.

<expr> -> <term> { + <term> }

<term> -> <factor> { * <factor> }

<factor> -> <var> | ( <expr> )

<var> -> a | i

{<expr>, <term>, <factor>, <var>} are the non-terminals. {a,i,+,*,(,)} are the terminals. <expr> is the start symbol. The meta-symbol "->" separates the lhs and the rhs of a production rule, the meta-symbol "|" represents alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene-star operator (that is, 0 or more iterations of the enclosed regular expression).

Some example arithmetic expressions derivable in the grammar are "i", "( a + a ) * i", " (i * a) + (a)", etc. Some illegal expressions are "(b)", "6", etc. (Note that the double quotes are not part of the expression.)
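The grammar maps directly onto a recursive-descent recognizer, with one method per non-terminal and a loop for each "{...}". The following is a minimal sketch, not part of the assignment code: the class name ExprRecognizer and the method accepts are hypothetical, and it assumes that whitespace is insignificant and that every token is a single character.

```java
// Recursive-descent recognizer for the grammar above (illustrative sketch):
//   <expr>   -> <term> { + <term> }
//   <term>   -> <factor> { * <factor> }
//   <factor> -> <var> | ( <expr> )
//   <var>    -> a | i
class ExprRecognizer {
    private final String src; // input with all whitespace removed (assumption)
    private int pos = 0;      // index of the next unread character

    private ExprRecognizer(String input) {
        this.src = input.replaceAll("\\s+", "");
    }

    /** Returns true if the whole input is derivable from <expr>. */
    static boolean accepts(String input) {
        ExprRecognizer r = new ExprRecognizer(input);
        return r.expr() && r.pos == r.src.length();
    }

    // Consumes c if it is the next character; otherwise leaves pos unchanged.
    private boolean eat(char c) {
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    private boolean expr() {
        if (!term()) return false;
        while (eat('+')) {            // { + <term> }
            if (!term()) return false;
        }
        return true;
    }

    private boolean term() {
        if (!factor()) return false;
        while (eat('*')) {            // { * <factor> }
            if (!factor()) return false;
        }
        return true;
    }

    private boolean factor() {
        if (eat('a') || eat('i')) return true;   // <var>
        return eat('(') && expr() && eat(')');   // ( <expr> )
    }

    public static void main(String[] args) {
        String[] samples = {"i", "( a + a ) * i", " (i * a) + (a)", "(b)", "6"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

The recognizer only answers yes or no; a translator would build a tree or emit code at the points where this sketch returns true.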

Now consider the following template for a collection of C# programs.

class Test {
    static double f(double a, int i) {
       return <expr>;
    }
    public static void Main(string[] args) {
       System.Console.WriteLine( f(2.0,1));
    }
}

To obtain a valid C# program (that is, valid function body), replace <expr> with an expression derived from the above grammar.

A C# compiler (csc) takes the source code and generates MSIL code, which resembles assembly language instructions for a stack machine.

The translation of the return-expression "(i + a * i)" into MSIL code is:

        ldarg.1
        conv.r8
        ldarg.0
        ldarg.1
        conv.r8
        mul
        add

The translation of the return-expression "(i + i)" into MSIL code is:

ldarg.1
ldarg.1
add
conv.r8

The formal arguments a and i are encoded as variables in locations 0 and 1 in the context of static methods. ldarg.0 (ldarg.1) stands for pushing the value of the variable a (i) on top of the stack; add (mul) stands for popping the top two appropriate values from the stack, adding (multiplying) them, and pushing the result on top of the stack; and conv.r8 stands for coercing an integer value to a double value.
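To see how such a sequence evaluates, the stack discipline can be simulated in a few lines of Java. This is a rough sketch under stated assumptions: MiniStackMachine and run are hypothetical names, only the five instructions used above are modelled, and the arguments follow the static template (location 0 holds the double a, location 1 holds the int i). It is not a model of the real CLR.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy interpreter for the MSIL subset used above (illustrative sketch only).
class MiniStackMachine {

    /** Runs code for static f(double a, int i) and returns the value left on top of the stack. */
    static Number run(String[] code, double a, int i) {
        Deque<Number> stack = new ArrayDeque<>();
        for (String op : code) {
            switch (op) {
                case "ldarg.0": stack.push(a); break; // location 0: the double a
                case "ldarg.1": stack.push(i); break; // location 1: the int i
                case "conv.r8": stack.push(stack.pop().doubleValue()); break;
                case "add":
                case "mul": {
                    Number right = stack.pop(); // the top of the stack is the right operand
                    Number left = stack.pop();
                    stack.push(apply(op, left, right));
                    break;
                }
                default:
                    throw new IllegalArgumentException("unknown instruction: " + op);
            }
        }
        return stack.pop();
    }

    // add and mul require both operands to have the same type, as in the listings above.
    private static Number apply(String op, Number left, Number right) {
        if (left instanceof Integer && right instanceof Integer) {
            int x = left.intValue(), y = right.intValue();
            return op.equals("add") ? x + y : x * y;
        }
        if (left instanceof Double && right instanceof Double) {
            double x = left.doubleValue(), y = right.doubleValue();
            return op.equals("add") ? x + y : x * y;
        }
        throw new IllegalStateException(op + " needs two operands of the same type");
    }

    public static void main(String[] args) {
        String[] code = {"ldarg.1", "conv.r8", "ldarg.0", "ldarg.1", "conv.r8", "mul", "add"};
        System.out.println(run(code, 2.0, 1)); // (i + a * i) with a = 2.0, i = 1
    }
}
```

Running the first listing with a = 2.0 and i = 1 leaves 3.0 on the stack, which is the value f(2.0,1) returns for "(i + a * i)".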

To generate more examples of such translation on a PC, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.cs", compile it using "%csc Test.cs", and reverse engineer the resulting executable using "%ildasm Test.exe", focussing on "Method static double f(double, int)".

Now consider the following template for a collection of Java programs.

class Test {
    double f(double a, int i) {
       return <expr>;
    }
    public static void main(String[] args) {
       System.out.println( new Test().f(2.0,1));
    }
}

To obtain a valid Java program (that is, valid function body), replace <expr> with an expression derived from the above grammar.

A Java compiler (javac) takes the source code and generates Java bytecode, which resembles assembly language instructions for a stack machine.

The translation of the return-expression "(i + a * i)" into Java bytecode is:

        iload_3
        i2d
        dload_1
        iload_3
        i2d
        dmul
        dadd

The translation of the return-expression "(i + i)" into Java bytecode is:

iload_3
iload_3
iadd
i2d

The formal arguments a and i are encoded as variables in locations 1 and 3 in the context of instance methods. (Location 0 is reserved for the this pointer. A double value requires two locations, that is, locations 1 and 2 are needed for storing a.) A typical instruction encodes information about the type and the location of the operands, and the nature of the operation. For example, dload_1 (iload_3) stands for pushing the value of the double (integer) variable a (i) on top of the stack; dadd (imul) stands for popping the top two double (integer) values from the stack, adding (multiplying) them, and pushing the result on top of the stack; and i2d (d2i) stands for coercing an integer (a double) value to a double (an integer) value.
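One possible shape for the translation is a single pass that parses the expression and emits instructions as it goes, tracking the type of each operand so that i2d can be placed where a coercion is needed. The sketch below is only an illustration and is not the ExprcEg.java approach: ExprToBytecode and compile are hypothetical names, and it assumes the template signature f(double a, int i), that a double occupies two consecutive locations, that location 0 holds this only in instance methods, and that the final return instruction is omitted as in the listings above.

```java
import java.util.ArrayList;
import java.util.List;

// One-pass translator from the expression grammar to JVM mnemonics (illustrative sketch).
class ExprToBytecode {
    private final String src;  // input with whitespace removed (assumption)
    private int pos = 0;
    private final List<String> code = new ArrayList<>();
    private final int slotA;   // first location of the double a
    private final int slotI;   // location of the int i

    private ExprToBytecode(String input, boolean isStatic) {
        src = input.replaceAll("\\s+", "");
        slotA = isStatic ? 0 : 1;  // location 0 holds "this" in an instance method
        slotI = slotA + 2;         // the double a occupies two locations
    }

    /** Translates the body of f(double a, int i); the result is left on the stack as a double. */
    static List<String> compile(String input, boolean isStatic) {
        ExprToBytecode g = new ExprToBytecode(input, isStatic);
        char type = g.expr();
        if (g.pos != g.src.length()) {
            throw new IllegalArgumentException("unexpected input at position " + g.pos);
        }
        if (type == 'I') g.code.add("i2d"); // f returns double
        return g.code;
    }

    // Each parsing method returns the type of the value it leaves on the stack: 'I' or 'D'.
    private char expr() {
        char left = term();
        while (peek() == '+') {
            pos++;
            int mark = code.size(); // end of the left operand's code
            char right = term();
            left = binary(left, mark, right, "add");
        }
        return left;
    }

    private char term() {
        char left = factor();
        while (peek() == '*') {
            pos++;
            int mark = code.size();
            char right = factor();
            left = binary(left, mark, right, "mul");
        }
        return left;
    }

    private char factor() {
        char c = peek();
        pos++;
        if (c == 'a') { code.add("dload_" + slotA); return 'D'; }
        if (c == 'i') { code.add("iload_" + slotI); return 'I'; }
        if (c == '(') {
            char type = expr();
            if (peek() != ')') throw new IllegalArgumentException("expected ')' at position " + pos);
            pos++;
            return type;
        }
        throw new IllegalArgumentException("unexpected input at position " + (pos - 1));
    }

    // Inserts a coercion for mixed operands, then emits the typed operator.
    private char binary(char left, int mark, char right, String op) {
        if (left == 'I' && right == 'D') code.add(mark, "i2d"); // coerce the left operand in place
        if (left == 'D' && right == 'I') code.add("i2d");       // coerce the right operand
        char type = (left == 'I' && right == 'I') ? 'I' : 'D';
        code.add((type == 'I' ? "i" : "d") + op);
        return type;
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    public static void main(String[] args) {
        System.out.println(compile("(i + a * i)", false)); // instance method
        System.out.println(compile("(i + i)", true));      // static method
    }
}
```

Recording the position where the left operand's code ends lets the sketch insert i2d there once the right operand turns out to be a double; a solution based on an abstract syntax tree would instead compute the types first and emit the coercions in order.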

To generate more examples of such translation on a PC, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.java", compile it using "%javac Test.java", and reverse engineer the class file using "%javap -c Test.class", focussing on "Method double f(double, int)".

PART I: Write a Java applet that takes three inputs -- (i) an arithmetic expression discussed above, (ii) whether the method is static or instance, and (iii) whether the target is Java bytecode or MSIL code, and then generates the appropriate translation.

Asg2.ppt illustrates the basics of code generation, and ExprcEg.java gives an incomplete program that you need to understand and then modify to get a working solution. It already provides code illustrating file I/O, scanning, and abstract syntax tree construction. (Specifically, it uses java.io.StreamTokenizer for scanning. Feel free to change it, if necessary.)

PART II: Now generalize the above language by including relational operators. Document the syntax using EBNF and explain the translation algorithm informally.

PART III: Now generalize the above language by including boolean connectives. Document the syntax using EBNF and explain the translation algorithm informally.

PART IV (Optional): Time permitting, you can consider allowing more input arguments, additional primitive types, additional reference types, a choice of associativity for the operators, etc., and a more flexible GUI for the applet.

What to hand in?

        Submit your well-documented solution files with an appropriate term paper via email as a single archive (as far as possible) with an additional ReadMe.txt (giving instructions for how to deploy/invoke the applet) to t.k.prasad@wright.edu. Keep in mind that our ultimate goal is to develop an applet that can help students understand code generation for expressions in this simple setting.

T. K. Prasad ( 04/13/2008 )