Back

Writing a C++ Interpreter for MITScript

Lab 2 in 6.035 was very satisfying and very fun. After creating the parser/lexer in Lab 1, we got to put our Abstract Syntax Tree to work — we created an interpreter to actually execute valid MITScript! By the end of this lab, we will be able to write arbitrarily complex programs and have them parsed and run.

We have a list of semantic rules that are generally pretty straightforward, but let me point out some interesting aspects.

Note that in the stack we map identifiers(variable, function names, etc.) to addresses, and the heap maps addresses to the actual Values. One cool thing is that we don’t need to implement the heap ourselves — we can just use the memory allocation built into our language! Anyone familiar with C++ will know of new, which actually allocates space in the heap for an object, and returns a pointer (address) to that space.

It seems like a daunting task, but you just have to break it up into small chunks.

The professor, Armando, gave us some direction on Piazza for how to get started.

  • Start by defining your Value types. Define a Value class and define sub-classes corresponding to each of the different value types.
  • Define a StackFrame class to represent your stack frames. You can implement the update and read functions as methods in the stack frame.
  • Implement a Visitor that corresponds to your interpreter.
  • By tomorrow, you should also be able to start implementing the behavior for all your operators.
  • The ‘newStackFrame’ function and the complete logic for updating and reading variables when you have multiple scopes will become clear after we do the lecture on closures.
  • Finally, you will want to implement support for returns that are not at the end of the function and for native functions.

Jason Priest, Meghana, and I got together to work on it.

The key thing to note when implementing the interpreter is that everything is specified in the semantics — you need to follow it exactly when translating into code, and you can find the answer to all your questions by close examination.

Not exactly following the order that Armando suggested, the first thing I tried to do was to get assignments and operators working, so I could do something as simple as x = 5; y = 5; z = x + y;. This would have given me a huge confidence boost, since I felt pretty intimidated by the scope of the lab.

Roughly, I tackled this piece-by-piece in the following order: binary ops, assignments, record creation/assignments, function creation/calling, native functions, early returns. Of course, the ninety-ninety aphorism of software engineering holds:

The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.

The remaining 10% of code on this lab was debugging the ten class-provided tests.

Data Structures

I first set up the Visitor interface for the interpreter. For more information on how the Visitor interface works, see my previous post.

Then for each of the Value types

1
2
3
4
5
6
Boolean
Integer
String
Function ::= (frame, code)
Record ::= Map[String, Value]*
None

, I defined toString() functions and a type field (so the overloaded boolean operators like + know what values they’re operating on). I also added methods like asFunction() and asRecord() that will throw an IllegalCastException if they’re called in the wrong context, or otherwise return this.

I still wasn’t sure how to deal with Stack Frames. The way that stack frames work is that there is a global stack of these StackFrames, and each StackFrame should contain the “context” of the currently-executing function. When a function is called, a new StackFrame is created, containing its set of local variables and a pointer to its parent frame (the frame where the function was called.)

I discovered during debugging that, when an MITScript program starts executing, it should be inside the global StackFrame — all assignments that it makes are available globally.

When we need to access a variable, we must check for its existence in the globals, the local frame, and then search recursively in parent stack frames.

Function calls

The function call is the most complex construct in our language. Most importantly, when we make a call, we need to traverse the body of the function, looking for globals and assignments, and add them to our new StackFrame.

This is another case where I needed to know the type of objects — but here, I need to know the type of an AST object. I use dynamic_cast.

1
2
3
4
5
if (Global* globCast = dynamic_cast<Global*>(stmt)){
...
} else if (IfStatement* ifCast = dynamic_cast<IfStatement*>(stmt)){
...
}

Note that we need to recursively search inside if statements and while loops during the scan, as well.

Native Functions

But how to do the native functions? Jason gave me a hint here — create a subclass of Function, which I call NativeFunction.

We need to support the following:

  • print(s) Uses the default casting of s to a string and prints it to the console followed by a newline.
  • input() Reads a line of input from the console and returns it as a string value.
  • intcast(s) Expects a string and internally uses the c++ function atoi to parse the string and return an integer value. If the string does not represent an integer (e.g., the string “hello”), the function should raise an IllegalCastException

NativeFunction has an evaluateNativeFunction() method that will do the right thing.

After adding the ability to return early from functions, I have a really hacky interpreter!

Debugging tests

Some of these test cases reveal interesting effects of our semantics. For example, test7.mit has a global declaration inside an if statement that’s never run — equivalent to

1
2
3
if (false){
global x;
}

In this case, we actually need to find this global and make x global in the scope. This is called variable hoisting.

Note that there are a lot of crappy things about this interpreter — there is no garbage collection of unused objects, so memory usage might blow up, and it is costly to search recursively for a variable in parent stack frames. The next lab translates our MITScript into bytecode that can optimize away these costs.