L3.
Different from L2:
- an expression based language
- function call convention is hidden
- numbers are not encoded, i.e. calling (print 1) prints out "1\n".
- no direct memory references (have to use aref, aset, etc)
Like L2:
- every intermediate result has a name
============================================================
The grammar.
p ::= (e
(l (x ...) e)
...)
;; note that functions have arguments now
e ::= (let ([x d]) e)
| (if v e e)
| d
d ::= (biop v v)
(pred v)
(v v ...)
(new-array v v)
(new-tuple v ...)
(aref v v)
(aset v v v)
(alen v)
(print v)
(make-closure l v)
(closure-proc v)
(closure-vars v)
v
v :: = x | l | num
biop ::= + | - | * | < | <= | =
pred ::= number? | a? ;; a? tests to see if the argument is an array or a tuple
============================================================
Programs in this language make the order of evaluation explicit
via lets. So instead of writing something like this:
(+ (+ 1 2) (+ 3 4))
you have to write something like this:
(let ([x (+ 1 2)])
(let ([y (+ 3 4)])
(+ x y)))
showing that the (+ 1 2) happens first and then the (+ 3 4).
Here's our old friend fib:
((:fib 18)
(:fib (x)
(let ([xlt2 (< x 2)])
(if xlt2
1
(let ([x1 (- x 1)])
(let ([f1 (:fib x1)])
(let ([x2 (- x 2)])
(let ([f2 (:fib x2)])
(+ f1 f2)))))))))
============================================================
So, compilation:
- linearizes the expression,
- explicates the calling convention (including tail calls vs non-tail calls)
- handles the encoding of pointer values & integer values
Three cases for compiling an 'e':
1) the e is a let: (let ([x d]) e)
-> compile the d, store the result in x, and continue with the body
an application expression here is a non-tail call.
2) the e is an if: (if v e1 e2)
-> generate a test for the v that goes to either then-label or else-label.
generate the then-label,
generate the code for e1
generate the else-label
generate the code for e2
Why don't we need a join here?
The last thing inside an 'e' is always the result of our program, so
if it is a call, we're fine, the result went away (a tail call), or if
it isn't then we're going to insert a return.
3) the e is a d:
-> if it is an application, make a tail call
otherwise, generate the code for the d,
store the result in eax, and return.
Many cases for compiling a 'd'. When compiling a 'd', we always have a
destination for it; from a let, the destination is the variable. From
the 'd' at the end of the expression, the destination is eax, since
that's the result of the function.
Lets look at a couple.
------------------------------------------------------------
(let ([x (+ y z)]) ...) ->
`(,x <- ,y)
`(,x += ,z)
`(,x -= 1)
What if the 'y' or 'z' were constants? Do we have four cases here?
Nah, we just encode any constants we see and let something else
clean up.
(let ([x (+ v1 v2)]) ...) ->
`(,x <- ,(encode v1))
`(,x += ,(encode v2))
`(,x -= 1)
where encode turns a number into the encoded version and leaves
variables alone.
Why is adding them together and then subtracing one the right thing?
Well, if x is initialized with a number 2a+1, and then we increment
that by 2b+1, we have 2a+2b+2 in x. The number we want is
2(a+b)+1, since that's the encoding of the sum. The difference
between these: 1. So just subtract one.
Note that if L1 signalled errors on overflow, this would not be
correct, since 2a+2b+2 might overflow when 2(a+b)+1 would not. But
since we have modular arithmetic, this equivalence holds.
------------------------------------------------------------
(let ([x (* v1 v2)]) ...) ->
In this case, we don't have some kind of a clever trick since the product
(2a+1) * (2b+1)
is not so useful when trying to compute
2(a*b)+1
So instead we just decode the numbers and re-encode them:
`(,tmp <- ,(encode v1))
`(,tmp >>= 1)
`(,x <- ,(encode v2))
`(,x >>= 1)
`(,x *= ,tmp)
`(,x *= 2)
`(,x += 1)
where 'tmp' is a new, fresh variable
------------------------------------------------------------
(let ([x (<= y z)] ...) ->
`(,x <- ,y <= ,z)
`(,x <<= 1)
`(,x += 1)
Don't forget to encode.
Also note that boolean values are still represented as integers (zero
is false, everything else is true).
------------------------------------------------------------
(let ([x (a? v1)]) ...) ->
`(,x <- ,(encode v1))
`(,x &= 1)
`(,x *= -2)
`(,x += 3)
------------------------------------------------------------
(let ([x (alen v)]) ...) ->
`(,x <- (mem ,v)) ;; v can't be a constant here or else this program doesn't work anyways.
`(,x <<= 1)
`(,x += 1)
The size stored in the array is the decoded version of the size, so
we need to encode it so it cooperates with the rest of the program.
------------------------------------------------------------
(let ([x (aset v1 v2 v3)]) ...) ->
`(,x <- ,(encode v2))
`(,x >>= 1)
`(,x *= 4)
`(,x += ,v1)
`((mem ,x 4) <- ,(encode v3))
`(,x <- 1) ;; put the final result for aset into x (always 0).
What's wrong with that? No bounds checking! How do we do the bounds checking?
Here we use the array-error L2 instruction:
(eax <- (array-error s s))
It accepts an array and an (attempted) index, prints out an error message
and terminates the program. Using that we can do the bounds checking:
`(,x <- ,(encode v2))
`(,x >>= 1)
`(,tmp <- (mem ,v1 0))
`(cjump ,x < ,tmp ,bounds-pass-label ,bounds-fail-label)
bounds-fail-label
`(eax <- (array-error ,v1 ,(encode v2)))
bounds-pass-label
`(,x *= 4)
`(,x += ,v1)
`((mem ,x 4) <- ,(encode v3))
`(,x <- 1) ;; put the final result for aset into x (always 0).
Note that tmp, bounds-fail-error and bounds-pass-label all have to be
freshly generated.
Note that this does not completely check the bounds, since the index
may also be less than 0.
------------------------------------------------------------
One way to compile the closure primitives:
(make-closure a b) is the same as (new-tuple a b)
(closure-proc is the same as (aref a 0)
(closure-vars a) is the same as (aref a 1)
------------------------------------------------------------
(let ([w (f x y z)]) ->
`(ecx <- ,(encode x))
`(edx <- ,(encode y))
`(eax <- ,(encode z))
`(call ,f) ;; note that 'f' might be a variable that refers to a label, but not a constant...
`(,w <- eax)
Function calls are straightforward when it isn't a tail call.
But what if this was a tail call? Tail calls are the ones at the
bottom of "e"s, right? (If the call is in a let, there is more to do,
namely the body of the let.) In that case, we can just do this:
`(ecx <- ,(encode x))
`(edx <- ,(encode y))
`(eax <- ,(encode z))
`(tail-call ,f)
Since it is a tail call, we let 'f' update eax and just let that sit
there for this function too.
------------------------------------------------------------
Also note that we need to deal with compiling functions. These cases
handle compiling the body but we need to do a little setup, namely
moving the argument registers into the variables that name the
function parameters. Eg,
(:label (x y z) e) -->
`(,x <- ecx)
`(,y <- edx)
`(,z <- eax)
... compilation of e goes here ...