Homework 2: Parser

Due Tuesday, 5 February 2008, 6 PM

Homework 2 is to build a parser for Appel's Tiger language using ML-Yacc. The Tiger language is defined in Appendix A of the course textbook. ML-Yacc is documented in its user's manual, which is available online, and is also described (in less detail) in Chapters 3 and 4 of the textbook.

Your parser should produce an abstract syntax tree using the datatypes of the Absyn structure, which is defined in the file $TIGER/chap4/absyn.sml.
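
To give a feel for what producing the AST means in ML-Yacc terms, here is a small sketch of a few rules whose semantic actions build Absyn values. It assumes that the grammar file abbreviates the structure as A, that the exp nonterminal is declared with type A.exp, and that the token names INT, NIL, and PLUS match your token declarations. Check the constructor fields against absyn.sml before relying on it.

```sml
(* Fragment of a tiger.grm sketch, not a complete grammar. *)

(* User declarations section (assumed abbreviation): *)
structure A = Absyn

(* ML-Yacc declarations section (assumed types; other tokens omitted): *)
%term INT of int | NIL | PLUS
%nonterm exp of A.exp

(* Rules section: each action builds the Absyn node for its rule. *)
exp : INT            (A.IntExp INT)
    | NIL            (A.NilExp)
    | exp PLUS exp   (A.OpExp {left = exp1, oper = A.PlusOp,
                               right = exp2, pos = PLUSleft})
```

ML-Yacc numbers repeated symbols on a right-hand side (exp1, exp2) and supplies a left and right position for every symbol, which is where PLUSleft comes from. On its own, the exp PLUS exp rule is ambiguous and will be reported as a conflict until you resolve it.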

Skeleton files to get you started are available in the $TIGER/chap3/ and $TIGER/chap4/ directories, which you can find at Appel's textbook homepage. In particular, these files will be useful:

tiger.grm: Skeleton ML-Yacc file to get you started
parsetest.sml: Test driver
parse.sml: Top-level test driver code
absyn.sml: AST datatypes
symbol.sml: Support library providing symbols
table.sml, table.sig: Support library providing tables
prabsyn.sml: Useful library for printing ASTs

Remember to update your sources.cm file. Be certain to add ml-yacc-lib.cm to it if you haven't already, along with any other new files you are using. You'll want to remove tokens.sml from it, because ML-Yacc now generates the tokens file automatically. You'll also need to update some header information in your lex file; see the end of Chapter 3 for details.
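
For orientation, the lexer header changes typically have the following shape when an ML-Lex lexer is paired with an ML-Yacc parser. This is only a sketch: it assumes your grammar declares %name Tiger (so the generated token signature is named Tiger_TOKENS) and that your driver expects a functor named TigerLexFun. The type declarations go in the user declarations section, and the %header line goes in the definitions section after the first %%. Treat the end of Chapter 3 as the authoritative version.

```sml
(* tiger.lex, user declarations: ML code placed before the first %% *)
type svalue = Tokens.svalue
type pos = int
type ('a, 'b) token = ('a, 'b) Tokens.token
type lexresult = (svalue, pos) token
%%
%header (functor TigerLexFun(structure Tokens : Tiger_TOKENS));
```

The point of the change is that the lexer no longer refers to a hand-written Tokens structure; it becomes a functor that is applied to the token structure ML-Yacc generates from your grammar.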

You should submit:

A tiger.grm file, the ML-Yacc source for your parser.
Any other source files you wrote to support your parser.
A text file describing:
- The members of your team,
- How you handled each shift-reduce conflict, and why you think your resolution is correct, and
- Anything you think is of interest about your parser.

Anything subtle about your grammar, such as ambiguity resolutions of various kinds, should be clearly commented. You knew that, of course.
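
As an illustration of the kind of comment I mean, here is a sketch of precedence declarations annotated with the conflict they resolve. The token names are assumed to match your %term declaration, and the fragment covers only two arithmetic levels, so it is an example of the commenting style rather than a complete resolution for Tiger.

```sml
(* Resolves the shift-reduce conflicts among the arithmetic rules
   (for example, exp PLUS exp . TIMES exp).
   Declarations listed later bind more tightly, so TIMES and DIVIDE
   take precedence over PLUS and MINUS; %left makes a - b - c
   parse as (a - b) - c. *)
%left PLUS MINUS
%left TIMES DIVIDE
```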

Your parser should use the same error-reporting machinery you employ in your lexer.
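
A minimal sketch of how that hookup can look: ML-Yacc reports each syntax error by calling a function you supply with a message and the left and right positions of the offending token, and that function can simply forward to your lexer's error routine. The ErrorMsg structure below is a hypothetical stand-in so that the example is self-contained; in your project, use the real ErrorMsg from the lexer assignment instead.

```sml
(* Hypothetical stand-in for the ErrorMsg structure from the lexer
   assignment; the real one translates a character position into a
   line and column before printing. *)
structure ErrorMsg =
struct
  val anyErrors = ref false
  fun error (pos : int) (msg : string) : unit =
    (anyErrors := true;
     print ("error at position " ^ Int.toString pos ^ ": " ^ msg ^ "\n"))
end

(* Error callback handed to the generated parser. ML-Yacc passes the
   message together with the left and right positions of the token
   where the error was detected; here only the left one is reported. *)
fun parseerror (msg : string, leftPos : int, rightPos : int) : unit =
  ErrorMsg.error leftPos msg
```

The callback is passed as the error argument when you invoke the generated parser's parse function in your top-level driver.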

Some advice:

When you start, do not attempt to write the ML code for the semantic actions that construct the AST. Your first task should simply be to write a grammar that parses the concrete syntax; just have each rule produce the unit value ().
Once your grammar is able to parse all of the test files in $TIGER/testcases, add semantic actions to the grammar to construct the AST while parsing Tiger source code.
I estimate that writing the grammar took about 2/3 of the time I spent on the parser; adding the semantic actions took the remaining 1/3.
You may find it helpful to write tiny toy grammars to explore ideas.
If you don't understand how shift/reduce parsers work, and can't figure out the table information left in the tiger.grm.desc file by ML-Yacc, you are going to be in a world of pain.
Be warned that the grammar has some tricky bits that will require some thinking. It may help to think more and hack less.
Is your LR parser efficiently parsing lists of arbitrary length (e.g., declarations, or expression sequences)?
Did you report the locations of syntax errors properly?
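
To make a few of these points concrete, here is a sketch of a tiny toy grammar of the kind suggested above. Everything in it is hypothetical (the file name toy.grm, the name Toy, and the token set are not part of the Tiger skeleton). It shows rules that all produce the unit value (), a left-recursive list rule, and the %verbose directive, which asks ML-Yacc to write the .desc file describing the parser's states.

```sml
(* toy.grm: hypothetical toy grammar for experimenting with ML-Yacc. *)

%%
%name Toy
%term EOF | ID | SEMICOLON | LPAREN | RPAREN
%nonterm program | expseq | exp
%pos int
%start program
%eop EOF
%noshift EOF
%verbose

%%

program : LPAREN expseq RPAREN   ()

(* Left-recursive list: the parser can reduce after each element, so
   the parse stack does not grow with the length of the sequence. *)
expseq : exp                     ()
       | expseq SEMICOLON exp    ()

exp : ID                         ()
```

A right-recursive version of the list rule (exp SEMICOLON expseq) would stack every element before reducing anything. Once semantic actions are added, a left-recursive rule naturally sees its elements front to back, so one common approach is to cons each new element onto the list built so far and reverse the list once when the sequence is complete.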

See the course text for more information, in particular Chapters 3 and 4 and Appendix A.

Good luck and have fun.

Last updated 18 February 2008.