Issues in compilation of L1.

- generating simple instructions:

    edx += ecx  =>  addl %ecx, %edx

  We need to convert the "s"s into assembly. Rules:

    - If it is a constant, prefix it with a $
    - If it is a label, prefix it with a $ as well; the assembler
      inserts the actual location
    - If it is a register, prefix it with a %

    => special case: for the argument to a jump, we don't need the $
       on a label

  The arithmetic left and right shifts have to use the "small"
  registers, and x86 only takes a variable shift count from %cl.
  I.e., (ebx <<= ecx) turns into the code

    sall %cl, %ebx

  This means that if there is a value larger than 255 in %ecx, only
  the lower 8 bits are even considered in the shift. So, shifting by
  11 is the same as shifting by 1024+11 (1035). To aid in debugging,
  the interpreter signals an error if those higher bits are not
  zeros.

- runtime system

  The runtime system is implemented in C. So when you see calls to
  print or allocate, the code generator needs to produce calls to the
  C-implemented functions.

  => printing

    #include <stdio.h>
    #include <stdlib.h>

    void print_content(void** in, int depth) {
      if (depth >= 4) {
        printf("...");
        return;
      }
      int x = (int) in;
      if (x&1) {
        printf("%i",x>>1);
      } else {
        int size= *((int*)in);
        void** data = in+1;
        int i;
        printf("{s:%i", size);
        for (i=0;i [...]

  => layout of records

    #define HEAP_SIZE 1048576  // 2^20 words (4 megabytes with 4-byte pointers)

    void** heap;
    void** allocptr;
    int words_allocated=0;

    void* allocate(int fw_size, void *fw_fill) {
      int size = fw_size >> 1;
      void** ret = (void**)allocptr;
      int i;

      if (!(fw_size&1)) {
        printf("allocate called with size input that was not an encoded integer, %i\n",
               fw_size);
      }
      if (size < 0) {
        printf("allocate called with size of %i\n",size);
        exit(-1);
      }

      allocptr+=(size+1);
      words_allocated+=(size+1);

      if (words_allocated < HEAP_SIZE) {
        *((int*)ret)=size;
        void** data = ret+1;
        for (i=0;i [...]

  => reporting array dereference errors:

    int print_error(int* array, int fw_x) {
      printf("attempted to use position %i in an array that only has %i positions\n",
             fw_x>>1, *array);
      exit(-1);
    }

    int main() {
      heap=(void*)malloc(HEAP_SIZE*sizeof(void*));
      if (!heap) {
        printf("malloc failed\n");
        exit(-1);
      }
      allocptr=heap;
      go();   // call into the generated code
      return 0;
    }

- dealing with comparison operators

  The cmp instruction is a shorthand for subtraction, so it needs a
  register destination for one of the arguments. This means we need
  to reverse the arguments sometimes. Let's consider this:

    (cjump eax < ebx :true :false)
    =>
    cmpl %ebx, %eax   // note reversal of argument order here.
    jl _true
    jmp _false

  As it turns out, cmpl is implemented via the subl instruction.
  Quite literally. So consider this one:

    (cjump 11 < ebx :true :false)
    =>
    cmpl %ebx, $11    // ERROR!
    jl _true
    jmp _false

  This is because cmpl is the same as subl, so the second argument
  has to be a destination. Kind of wacky, but we can deal, right?
  Just generate this code:

    cmpl $11, %ebx
    jg _true
    jmp _false

  i.e., reverse the arguments and reverse the comparison.

  How about this one?

    (cjump 11 < 12 :true :false)

  Figure it out at compile time and generate just a "jmp"!

    (eax <- ebx < ecx)

  Here we need another trick; the x86 instruction set only lets us
  update the lowest 8 bits with the result of a condition code. So,
  we do that, and then fill out the rest of the bits with zeros with
  a separate instruction:

    cmpl %ecx, %ebx
    setl %al
    movzbl %al, %eax

  Here are the correspondences between the cx registers and the
  places where the condition codes can be stored:

    %eax's lowest 8 bits are %al
    %ecx's lowest 8 bits are %cl
    %edx's lowest 8 bits are %dl
    %ebx's lowest 8 bits are %bl

- the C function calling convention:

  => not too difficult to deal with because it only shows up in
     restricted ways. Specifically, you have to generate a function
     that gets called for the main portion of the program and you
     have to call into the runtime system.

  Main prefix:

    # save caller's base pointer
    pushl %ebp
    movl %esp, %ebp

    # save callee-saved registers
    # (hope body preserves stack pointer)
    pushl %ebx
    pushl %esi
    pushl %edi
    pushl %ebp

    # body begins with base and
    # stack pointers equal
    movl %esp, %ebp

  Main suffix:

    # restore callee-saved registers
    popl %ebp
    popl %edi
    popl %esi
    popl %ebx

    # restore caller's base pointer
    leave
    ret

  To do a call into the runtime you just push the args (last
  argument first) and do a call.

    pushl <second argument>
    pushl <first argument>
    call allocate   // defined in C
    addl $8,%esp

    pushl <argument>
    call print      // defined in C
    addl $4,%esp

- our function calling convention

  => Call site:
     - sets up registers for the arguments (at most 3).
     - invokes the "call" L1 instruction, which:
       - pushes the return address,
       - pushes the current ebp onto the stack,
       - sets the ebp to the current esp,
       - does a goto.

     (call s) turns into this, where <new-label> is a freshly
     created label:

       pushl $<new-label>
       pushl %ebp
       movl %esp, %ebp
       jmp <s>     // the converted version of s
     <new-label>:

     Note that if s is a label, then just put the label there;
     otherwise, convert it following the rules for the "s"s at the
     top of these notes.

  => Beginning of the function body:
     - moves the esp to make room for the local stack variables
       (i.e., the space between the ebp and esp).

  => Regular return for the function:
     - puts the result into eax.
     - invokes the L1 "return" instruction, which:
       - sets the esp back to the ebp (frees the local function
         storage),
       - pops / restores the ebp,
       - pops the return address & does a goto (x86 "ret"
         instruction).

     In assembly, that's this sequence of instructions:

       movl %ebp, %esp
       popl %ebp
       ret

  => tail call site:
     - sets up registers for the arguments (at most 3).
     - invokes the "tail-call" L1 instruction, which:
       - sets the esp to the current ebp,
       - does a goto.

     (tail-call s) turns into this:

       movl %ebp, %esp
       jmp <s>     // the converted version of s

- generating a function: generate the label and then the body of the
  function.

- putting it all together:

  => create runtime.c containing the above functions and any other
     helper functions you need. Compile it with

       gcc -c -O2 -o runtime.o runtime.c

  => generate the assembly code into a file called, say, prog.S. Use
     the following header for the file:

           .file   "prog.c"
           .text
       .globl go
           .type   go, @function

     The label go: and the main prefix have to come right after the
     header, since the runtime's main calls go. Then use this for
     the footer:

           .size   go, .-go
           .ident  "GCC: (Ubuntu 4.3.2-1ubuntu12) 4.3.2"
           .section        .note.GNU-stack,"",@progbits

     I'm not completely clear what the annotations mean; I used
     gcc -S to get an example assembly file and cannibalized it. If
     you know what they mean, or even where to find a relevant
     manual, do let me know.

  => Then, once you have that assembly code, use as to turn that
     into a .o file:

       as -o prog.o prog.S

  => and then put the two .o files together into a binary:

       gcc -o a.out prog.o runtime.o