[I2 logo] [RWTH logo] MOVES: Software Modeling and Verification
(Informatik 2)
Computer Science / RWTH / I2 / Research / AG / FP / HaskellParser
Printer-friendly
Functional Programming Aachen University of Technology, Dept. of CS, Lehrstuhl für Informatik II, Research, FP

The Haskell Parser

Purpose

We developed a parser for Haskell for two reasons:
  • To evaluate the suitability of parser combinators for parsing a real programming language.

    Parser combinators are more flexible than parser generators (nicest of all: the parsing functions have the same structure as the syntax definition of the Haskell report). But can they be used for more than toy languages and first prototypes?

  • To have a parser for Haskell that can be used in various applications. E.g.: source code reformaters, literate programming systems, source code browsers, Haskell compilers.

Implementation

The first version of our Haskell parser has been written by the student Philipp van Hüllen with the parser combinators of Graham Hutton and Erik Meijer, including their extension for handling the offside rule.

The current version has the following main limitations:

  • syntax errors do not cause any error messages but lead to failure
  • it is much too slow
  • it is hardly tested
  • it is documented in German
However, there exists a lot of literature on parser combinators and I have some ideas of my own on how to attack the first two problems.

You may download the current version from here. Please tell us if you found it useful.

Some remarks about writing a Haskell parser

I don't consider it a good idea to have the parser directly construct the algebraic data structure of the abstract syntax tree. You never know the needs of users of the parser. Hence our parser constructs the abstract syntax tree by using functions, not data constructors. These functions are defined in another module. Currently in the only such module we have, all functions are defined as data constructors of the algebraic data types which are defined in a third module. For defining these algebraic data types we did use GHC's HsSyn as model (substituting our parser for GHC's would be a really good test). There is still the problem of which information should be passed to the functions that construct the syntax tree. Source location information is already collected because of the off-side rule. So every syntactic construct is told its start location. However, should it also know its end location? Some applications may require parsing comments as well.

There is also the problem of how to handle fixities of imported operators. GHC takes the approach of first parsing all operators as being left-associative with identical priority. After all imports are processed (and hence fixities are known), another compiler phase modifies the syntax tree accordingly. We currently take this approach as well. Laziness would actually allow the neat trick of calling the (complete) parser function with a table of fixities, which is calculated from the syntax tree which is produced by the parser. I just fear that this method may lead to an obscure order of error messages.


Olaf Chitil, chitil@i2.informatik.rwth-aachen.de, Last update: November 14, 1997
Valid HTML 4.01 Strict! Valid CSS!