An EBNF (Extended Backus-Naur Form) grammar for arithmetic expressions containing variables (a, i) and binary operators (+, *) is given below.
<expr> -> <term> { + <term> }
<term> -> <factor> { * <factor> }
<factor> -> <var> | ( <expr> )
<var> -> a | i
{<expr>, <term>, <factor>, <var>} are the non-terminals. {a, i, +, *, (, )} are the terminals. <expr> is the start symbol. The meta-symbol "->" separates the left-hand side (lhs) from the right-hand side (rhs) of a production rule, the meta-symbol "|" separates alternatives on the rhs, and the paired curly braces "{...}" stand for the Kleene-star operator (that is, zero or more iterations of the enclosed regular expression).
Some example arithmetic expressions derivable in the grammar are "i", "( a + a ) * i", " (i * a) + (a)", etc. Some illegal expressions are "(b)", "6", etc. (Note that the double quotes are not part of the expression.)
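The grammar above maps directly onto a recursive-descent recognizer, with one method per non-terminal and a loop for each "{...}". The following is a minimal sketch only: the class name ExprRecognizer and the method accepts are hypothetical (they are not taken from ExprcEg.java), and it assumes that whitespace between tokens is insignificant.

```java
// Hypothetical helper, not part of the assignment's ExprcEg.java.
// Assumption: whitespace is insignificant, so it is stripped before parsing.
class ExprRecognizer {
    private final String src;   // input with all whitespace removed
    private int pos = 0;        // index of the next unread character

    private ExprRecognizer(String input) {
        this.src = input.replaceAll("\\s+", "");
    }

    /** Returns true if the whole input is derivable from <expr>. */
    static boolean accepts(String input) {
        ExprRecognizer r = new ExprRecognizer(input);
        return r.expr() && r.pos == r.src.length();
    }

    // Consumes the character c if it is next in the input.
    private boolean eat(char c) {
        if (pos < src.length() && src.charAt(pos) == c) {
            pos++;
            return true;
        }
        return false;
    }

    // <expr> -> <term> { + <term> }
    private boolean expr() {
        if (!term()) return false;
        while (eat('+')) {
            if (!term()) return false;
        }
        return true;
    }

    // <term> -> <factor> { * <factor> }
    private boolean term() {
        if (!factor()) return false;
        while (eat('*')) {
            if (!factor()) return false;
        }
        return true;
    }

    // <factor> -> <var> | ( <expr> )   with   <var> -> a | i
    private boolean factor() {
        if (eat('a') || eat('i')) return true;
        if (eat('(')) return expr() && eat(')');
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"i", "( a + a ) * i", " (i * a) + (a)", "(b)", "6"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> " + accepts(s));
        }
    }
}
```

Run on the examples above, the sketch accepts the three legal expressions and rejects "(b)" and "6".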
Now consider the following template for a collection of C# programs.
class Test {
    static double f(double a, int i) {
        return <expr>;
    }
    public static void Main(string[] args) {
        System.Console.WriteLine( f(2.0,1));
    }
}
To obtain a valid C# program (that is, a valid function body), replace <expr> with an expression derived from the above grammar.
A C# compiler (csc) takes the source code and generates MSIL code, which resembles assembly-language instructions for a stack machine.
The translation of the return-expression "(i + a * i)" into MSIL code is:
ldarg.1
conv.r8
ldarg.0
ldarg.1
conv.r8
mul
add
The translation of the return-expression "(i + i)" into MSIL code is:
ldarg.1
ldarg.1
add
conv.r8
The formal arguments a and i are encoded as variables in locations 0 and 1 in the context of static methods. ldarg.0 (ldarg.1) stands for pushing the value of the variable a (i) on top of the stack; add (mul) stands for popping the top two appropriate values from the stack, adding (multiplying) them, and pushing the result on top of the stack; and conv.r8 stands for coercing an integer value to a double value.
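To see how these instructions manipulate the stack, it can help to execute a listing by hand or with a toy interpreter. The sketch below does the latter for the five instructions used above. It is an illustration only, not a model of the real runtime: the class name MsilStackDemo and the method run are hypothetical, and it assumes a is argument 0 and i is argument 1, as in the static method f.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter for the five instructions used in this handout.
// Assumption: argument 0 is the double a and argument 1 is the int i.
class MsilStackDemo {
    /** Executes the listing and returns the value left on top of the stack. */
    static Object run(String[] code, double a, int i) {
        Deque<Object> stack = new ArrayDeque<>();
        for (String op : code) {
            switch (op) {
                case "ldarg.0": stack.push(a); break;   // push a (boxed as Double)
                case "ldarg.1": stack.push(i); break;   // push i (boxed as Integer)
                case "conv.r8":                         // coerce the top value to double
                    stack.push(((Number) stack.pop()).doubleValue());
                    break;
                case "add":
                case "mul": {
                    Object right = stack.pop();         // the top of stack is the right operand
                    Object left = stack.pop();
                    stack.push(apply(op, left, right));
                    break;
                }
                default:
                    throw new IllegalArgumentException("unsupported instruction: " + op);
            }
        }
        return stack.pop();
    }

    // add and mul need two operands of the same type; the compiler inserts conv.r8 to ensure this.
    private static Object apply(String op, Object left, Object right) {
        if (left instanceof Integer && right instanceof Integer) {
            int x = (Integer) left, y = (Integer) right;
            return op.equals("add") ? x + y : x * y;
        }
        if (left instanceof Double && right instanceof Double) {
            double x = (Double) left, y = (Double) right;
            return op.equals("add") ? x + y : x * y;
        }
        throw new IllegalStateException("operand types differ; a conv.r8 is missing");
    }

    public static void main(String[] args) {
        String[] first = {"ldarg.1", "conv.r8", "ldarg.0", "ldarg.1", "conv.r8", "mul", "add"};
        String[] second = {"ldarg.1", "ldarg.1", "add", "conv.r8"};
        System.out.println(run(first, 2.0, 1));   // (i + a * i) with a = 2.0, i = 1
        System.out.println(run(second, 2.0, 1));  // (i + i) with i = 1
    }
}
```

With a = 2.0 and i = 1, as in the call f(2.0,1), the first listing leaves 3.0 on the stack and the second leaves 2.0.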
To generate more examples of such translations on a PC, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.cs", compile it using "%csc Test.cs", and disassemble the resulting executable using "%ildasm Test.exe", focusing on "Method static double f(double, int)".
Now consider the following template for a collection of Java programs.
class Test {
    double f(int i, double a) {
        return <expr>;
    }
    public static void main(String[] args) {
        System.out.println( new Test().f(1,2.0));
    }
}
To obtain a valid Java program (that is, valid function body), replace <expr> with an expression derived from the above grammar.
A Java compiler (javac) takes the source code and generates Java bytecode, which resembles assembly-language instructions for a stack machine.
The translation of the return-expression "(i + a * i)" into Java bytecode is:
iload_1
i2d
dload_2
iload_1
i2d
dmul
dadd
The translation of the return-expression "(i + i)" into Java bytecode is:
iload_1
iload_1
iadd
i2d
The formal arguments i and a are encoded as variables in locations 1 and 2 in the context of instance methods. (Location 0 is reserved for the this pointer. The double value requires two local-variable slots, that is, slots 2 and 3 are needed for storing a.) A typical instruction encodes information about the type and the location of the operands, and the nature of the operation. For example, iload_1 (dload_2) stands for pushing the value of the integer (double) variable i (a) on top of the stack; dadd (imul) stands for popping the top two double (integer) values from the stack, adding (multiplying) them, and pushing the result on top of the stack; and i2d (d2i) stands for coercing an integer (a double) value to a double (an integer) value.
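The placement of i2d follows from the operand types: an operation is done in double arithmetic as soon as either operand is a double, and any integer operand is widened right after it is pushed. The sketch below shows one possible way to express this rule over a hand-built syntax tree. It is not the intended solution and not the structure used in ExprcEg.java: the names BytecodeSketch, Var, Bin, and translateReturn are hypothetical, and the slot numbers (i in 1, a in 2) are hard-coded from the paragraph above. Like the listings above, it omits the final return instruction.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch. Assumptions: instance method, int i in slot 1, double a in slot 2.
class BytecodeSketch {
    abstract static class Node { }

    static final class Var extends Node {
        final char name;                       // 'a' or 'i'
        Var(char name) { this.name = name; }
    }

    static final class Bin extends Node {
        final char op;                         // '+' or '*'
        final Node left, right;
        Bin(char op, Node left, Node right) {
            this.op = op;
            this.left = left;
            this.right = right;
        }
    }

    // An expression has type double if it mentions a anywhere; otherwise it has type int.
    static boolean isDouble(Node n) {
        if (n instanceof Var) return ((Var) n).name == 'a';
        Bin b = (Bin) n;
        return isDouble(b.left) || isDouble(b.right);
    }

    // Post-order walk: operands first, then the operator, widening int operands as needed.
    static void emit(Node n, List<String> out) {
        if (n instanceof Var) {
            out.add(((Var) n).name == 'a' ? "dload_2" : "iload_1");
            return;
        }
        Bin b = (Bin) n;
        boolean dbl = isDouble(b);
        emit(b.left, out);
        if (dbl && !isDouble(b.left)) out.add("i2d");
        emit(b.right, out);
        if (dbl && !isDouble(b.right)) out.add("i2d");
        out.add((dbl ? "d" : "i") + (b.op == '+' ? "add" : "mul"));
    }

    /** Code for "return body;" in a method returning double, without the return instruction. */
    static List<String> translateReturn(Node body) {
        List<String> out = new ArrayList<>();
        emit(body, out);
        if (!isDouble(body)) out.add("i2d");   // an int result is widened to the return type
        return out;
    }

    public static void main(String[] args) {
        Node e1 = new Bin('+', new Var('i'), new Bin('*', new Var('a'), new Var('i')));
        Node e2 = new Bin('+', new Var('i'), new Var('i'));
        System.out.println(translateReturn(e1));
        System.out.println(translateReturn(e2));
    }
}
```

For "(i + a * i)" the sketch yields iload_1, i2d, dload_2, iload_1, i2d, dmul, dadd. For "(i + i)" the addition stays in integer arithmetic and a single i2d is emitted at the end.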
To generate more examples of such translations on a PC, instantiate the above template by replacing <expr> with a legal arithmetic expression in "Test.java", compile it using "%javac Test.java", and disassemble the class file using "%javap -c Test.class", focusing on the code listed for the method f.
PART I: Write a Java applet that takes three inputs -- (i) an arithmetic expression discussed above, (ii) whether the method is static or instance, and (iii) whether the target is Java bytecode or MSIL code, and then generates the appropriate translation.
Asg2.ppt illustrates the basics of code generation, and ExprcEg.java gives an incomplete program that you need to understand and then modify to get a working solution. It already provides code illustrating file I/O, scanning, and abstract syntax tree construction. (Specifically, it uses java.io.StreamTokenizer for scanning. Feel free to change it, if necessary.)
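For readers who have not used java.io.StreamTokenizer before, the sketch below shows how its default settings split an expression of this language into tokens. It is independent of ExprcEg.java, whose actual scanning code may differ; the class name ScanSketch and the method tokens are hypothetical.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the scanner from ExprcEg.java.
class ScanSketch {
    /** Splits the input into tokens using StreamTokenizer's default syntax table. */
    static List<String> tokens(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add(st.sval);                          // words such as a and i
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add(String.valueOf(st.nval));          // numbers (illegal in this grammar)
                } else {
                    out.add(String.valueOf((char) st.ttype));  // ordinary characters: + * ( )
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);                 // not expected for a StringReader
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("( a + a ) * i"));
        System.out.println(tokens("(i*a)+(a)"));
    }
}
```

By default the tokenizer skips whitespace, returns letters as words, and returns +, *, ( and ) as single-character tokens, so no extra configuration is needed for this grammar.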
PART II: Now generalize the above language by including relational operators. Document the syntax using EBNF and explain the translation algorithm informally.
PART III: Now generalize the above language by including boolean connectives. Document the syntax using EBNF and explain the translation algorithm informally.
PART IV (Optional): Time permitting, you can consider allowing more input arguments, additional primitive types, additional reference types, a choice of associativity for the operators, etc., and a more flexible GUI for the applet.
What to hand in?
Submit your well-documented solution files with an appropriate term paper via email as a single archive (as far as possible) with an additional ReadMe.txt (giving instructions for how to deploy/invoke the applet) to t.k.prasad@wright.edu. Keep in mind that our ultimate goal is to develop an applet that can help students understand code generation for expressions in this simple setting.