Well, this was my first foray into the world of Rust, the systems language that is Mozilla’s precious baby. And what better way to learn this hip new language than to write an MITScript bytecode interpreter for Computer Language Engineering?
Rust was not gentle for this first-time developer. It does so much to protect you that my first attempt writing a few hundred lines of code resulted in the same number of compiler errors, and I needed a lot of help from my team to just get anything to compile. For this reason, it’s not great for iterating quickly if you aren’t very experienced already. But, I’m licking my chops at the fact that the end result will be much safer, and hopefully faster, than our classmates’ C++ compilers. This post will be about the struggles I encountered as a Rust newbie, as well as the fun of generating/interpreting MITScript bytecode.
Fortunately, I had some great Rust developers on my team – Jason and James Gilles, who have been avid Rustaceans for years.
If you have ever seen bytecode, you’ll expect to see some non-human-readable hex bytes that stand for instructions. Instead, we have a strange text-based bytecode consisting of the following instructions:
Load/store instructions:
1 | LoadConst |
Closure functions:
1 | AllocClosure |
Binary/unary ops:
1 | Add |
Control flow:
1 | Goto |
Stack manipulation:
1 | Dup |
And the full bytecode is a series of nested functions, like
1 | function |
Some things to note:
A free_var
is a variable from a parent scope that’s accessed in the current function.
A closure contains a function, as well as a list of references to the free variables in the nested function. So, when we make a call, we need to clone all the closure’s free variable references into the newly created frame.
Pieces of the puzzle
We split up the work into three parts. James quickly finished the bytecode parser, using the Rust library lalrpop. Jason and I worked on (last minute) the bytecode generation and interpreting, respectively.
Each bytecode function
was modeled as a Function struct.
I also have a Frame structure that, as in the AST interpreter from Lab 2, contains the context for the currently executing closure. Most importantly, there is an operand stack onto which references, values, closures, etc. are pushed and popped as opcodes are executed. The HeapValue enum lists them all:
1 | type FrameSlot = Gc<RefCell<HeapValue>>; |
References have a layer of indirection provided by the Gc<RefCell<>> construct in Rust. Gc is just an alias for Rc
. In Rust, Rc
is a reference-counted pointer, and RefCell
is a wrapper for a mutable value. RefCell
introduces something called “interior mutability”, which allows us to modify an otherwise-immutable location like a field in a struct. We can use borrow
or borrow_mut
to get a pointer to the value contained in the RefCell
.
The bytecode generator
Jason did this part almost entirely himself, and had to wrangle with some tricky scoping rules that I still don’t entirely understand.
One key function is lookup_variable
, which determines whether a variable is a reference, local, or global.
1 | /// Determines what scope a variable's value should be found in |
lookup_ref
also propagates up to parents, pushing a reference variable in parent scopes, because it has to be in each parent scope to be copied down to the current scope! To be more clear, say we have nested functions f1, f2…f100; and f100 uses some variable x
that is local to f3. f4…f99 must all have x
as a free variable, so when we see that x
is being used in f100, lookup_ref
will add it as a free variable for all parents.
Next, we actually need to start thinking about performance! We’ll write a garbage collector for our compiler in Lab 4.