
Providing meaningful parse errors with fsyacc

Hi,

I am using fsyacc to develop a parser and would like to provide some meaningful parse errors when the fsyacc-generated parser doesn't like the input.

There seems to be a "parse_error" function that can be overridden. What state is available to the parse_error function to provide useful information to the user?

Is there a more general or better way to approach this?

Many thanks!

Once parse_error is called it's probably too late to generate useful error messages, but I'm not an expert and somebody else might know better.

In general, error reporting/recovery is the weakest point of Yacc-like parsers, and it takes a lot of effort and experience to get it right. Some people go so far as to call it a "black art".

In principle you've got the following options for error handling:

- detect errors during the lexer stage where possible
- accept a superset of the grammar and handle errors in the parser actions or at a later point
- introduce productions into the grammar with the special error token

One problem with the last approach (productions with the error token) is that error handling is not separated from the rest of the grammar; instead, you need to extend your grammar with special rules. As a result, your otherwise "perfect" grammar might become ambiguous because of the error rules, and you could be forced to restructure your grammar.*

Another problem is that it's hard to understand and foresee what exactly the parser will do once it has shifted the error token in a particular production (see the link above for a description of the procedure). Furthermore, due to the special handling of the error token, nested error handling strategies will generally not work (or at least I haven't figured out how to make them work). For example, you can't easily implement something like "on an error at this point try this, and if this still doesn't work try that, and if that doesn't work hand over the problem to the error handler one hierarchy level above". I sometimes wonder if the more limited error token in Happy wouldn't allow for better error recovery strategies.
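For comparison, the second option in the list above (accept a superset of the grammar and check later) can be sketched in plain F#. This is only an illustration under stated assumptions: the `Expr` and `Stmt` types and the `validate` function are hypothetical and not part of fsyacc. The idea is that the grammar deliberately accepts any expression on the left of an assignment, the parser actions record the line number, and a separate pass produces the readable message.

```fsharp
// Hypothetical AST built by a deliberately permissive grammar: the grammar
// accepts `expr = expr`, although only a variable is legal on the left.
type Expr =
    | Num of int
    | Var of string
    | Add of Expr * Expr

type Stmt =
    | Assign of target: Expr * value: Expr * line: int

// Post-parse check: return one readable message for every statement
// that a strict grammar would have rejected with a bare "parse error".
let validate (stmts: Stmt list) : string list =
    stmts
    |> List.choose (fun (Assign (target, _, line)) ->
        match target with
        | Var _ -> None
        | other ->
            Some (sprintf "line %d: cannot assign to %A; expected a variable name" line other))

// Usage: the second statement parses fine but is flagged by the later pass.
let messages =
    validate [ Assign (Var "x", Num 1, 1)
               Assign (Add (Num 1, Var "x"), Num 2, 2) ]
```

The grammar itself stays free of error productions; the price is that the AST has to be able to represent input that is not actually valid.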

Parser combinators generally have better built-in error reporting capabilities and are easier to extend with error recovery rules, but there are no such libraries available for F# yet. As some new language features in F# are practically begging to be applied in a monadic parser framework (see here and here), we can hope that such a library will spring into existence sooner or later...
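To give a rough idea of why combinators make error reporting easier, here is a minimal hand-rolled sketch. It is not taken from any existing library; `Reply`, `Parser`, `pchar`, `bind`, `<|>` and `describe` are names invented for this example. Each parser returns either a result or the position of the failure together with a list of what it expected, and the choice combinator merges those lists.

```fsharp
// A parser reads the input from an index and either succeeds with a value
// and the next index, or fails with an index and a list of expected items.
type Reply<'a> =
    | Success of 'a * int
    | Failure of int * string list

type Parser<'a> = string -> int -> Reply<'a>

// Parses exactly the character c.
let pchar (c: char) : Parser<char> =
    fun s i ->
        if i < s.Length && s.[i] = c then Success (c, i + 1)
        else Failure (i, [ sprintf "'%c'" c ])

// Sequencing: run p, then run the parser computed from its result.
let bind (p: Parser<'a>) (f: 'a -> Parser<'b>) : Parser<'b> =
    fun s i ->
        match p s i with
        | Success (x, j) -> f x s j
        | Failure (pos, expected) -> Failure (pos, expected)

// Choice: try p, then q from the same index. If both fail, report the
// failure that got furthest, merging the expected lists when they tie.
let (<|>) (p: Parser<'a>) (q: Parser<'a>) : Parser<'a> =
    fun s i ->
        match p s i with
        | Success _ as ok -> ok
        | Failure (pos1, exp1) ->
            match q s i with
            | Success _ as ok -> ok
            | Failure (pos2, exp2) ->
                if pos1 = pos2 then Failure (pos1, exp1 @ exp2)
                elif pos1 > pos2 then Failure (pos1, exp1)
                else Failure (pos2, exp2)

// Turns a reply into a message for the user.
let describe (r: Reply<'a>) : string =
    match r with
    | Success _ -> "ok"
    | Failure (pos, expected) ->
        sprintf "error at offset %d: expected %s" pos (String.concat " or " expected)

// Usage: 'a' followed by either 'b' or 'c'.
let ab : Parser<char> = bind (pchar 'a') (fun _ -> pchar 'b' <|> pchar 'c')
let report = describe (ab "ax" 0)   // "error at offset 1: expected 'b' or 'c'"
```

The error message falls out of the structure of the parser itself, with no extra rules added to the grammar.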

Stephan

*This is complicated by the fact that fsyacc does not recognize the error token in precedence rules and generally does not allow for resolving reduce/reduce conflicts via precedence, both of which are incompatibilities with OCaml.

By Stephan on 10/18/2007 2:48 AM (permalink)

Stephan,

Thanks for the insightful comments. Perhaps "meaningful" is too lofty a goal. I would be happy to know and be able to print a couple of things before letting the parser die. If the parser could go on, that would be better, but for now just knowing the following would be great:

- The line number
- The column number
- The token that caused the parse error

Is there some way to access the parserState to reveal these items? I'm sure that a combination of my newness to F# and Yacc is to blame for my ignorance here.

Many thanks,
remlap

By remlap on 10/18/2007 10:18 AM (permalink)

If you want to keep track of the last token retrieved before the error you could wrap the token function of your lexer with a function that stores a copy of the last token in a mutable field. With a lexer module named "Lex" which has the main lexer rule "token" and a parser module "Pars" you could for example do the following:

let main () =
    let last_token = ref Pars.END // Pars.END or any other valid token of your lexer

    // wrap the lexer so that the most recently read token is remembered
    let log_token lb =
        let t = Lex.token lb
        last_token := t
        t

    let lexbuf = ... // set up your lexbuf

    try
        Pars.start log_token lexbuf
    with e ->
        // the lexbuf position is just past the last token that was read
        let pos = lexbuf.EndPos
        printf "error near line %d, character %d\n%s\n" pos.pos_lnum (pos.pos_cnum - pos.pos_bol) (e.ToString())
        printf "last token: "
        print_any !last_token
        printf "\n"
        exit 1
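
If you want to try the idea without generated modules, the following is a self-contained sketch of the same wrapper. Everything in it is a toy stand-in and an assumption made for illustration: the `Token` type, the hand-written `Lexer` class, the `parseSum` function and the `ParseError` exception are not what fslex/fsyacc generate; they only play the roles of "Lex" and "Pars" above.

```fsharp
// Toy stand-ins for the generated lexer and parser (hypothetical).
type Token =
    | NUM of int
    | PLUS
    | END

type Pos = { Line: int; Column: int }

exception ParseError of string

// Minimal hand-written lexer that tracks line and column.
type Lexer(input: string) =
    let mutable i = 0
    let mutable line = 1
    let mutable bol = 0 // index of the first character of the current line
    // Position just past the last token read.
    member this.EndPos = { Line = line; Column = i - bol }
    member this.Next() : Token =
        // skip whitespace, counting newlines
        while i < input.Length && System.Char.IsWhiteSpace input.[i] do
            if input.[i] = '\n' then
                line <- line + 1
                bol <- i + 1
            i <- i + 1
        if i >= input.Length then
            END
        elif input.[i] = '+' then
            i <- i + 1
            PLUS
        elif System.Char.IsDigit input.[i] then
            let start = i
            while i < input.Length && System.Char.IsDigit input.[i] do
                i <- i + 1
            NUM (int (input.Substring(start, i - start)))
        else
            raise (ParseError (sprintf "unexpected character '%c'" input.[i]))

// Toy parser for NUM (PLUS NUM)* END; pulls tokens through the given function.
let parseSum (nextToken: unit -> Token) : int =
    let expectNum () =
        match nextToken () with
        | NUM n -> n
        | _ -> raise (ParseError "expected a number")
    let rec loop acc =
        match nextToken () with
        | PLUS -> loop (acc + expectNum ())
        | END -> acc
        | _ -> raise (ParseError "expected '+' or end of input")
    loop (expectNum ())

// Runs the parser through a wrapper that remembers the last token read.
let parseWithDiagnostics (input: string) : Result<int, string> =
    let lexer = Lexer input
    let lastToken = ref END
    let logToken () =
        let t = lexer.Next()
        lastToken.Value <- t
        t
    try
        Ok (parseSum logToken)
    with ParseError msg ->
        let pos = lexer.EndPos
        let text =
            sprintf "error near line %d, character %d: %s (last token: %A)"
                pos.Line pos.Column msg lastToken.Value
        Error text

// Usage: the second '+' on line 2 is where the parser gives up.
let good = parseWithDiagnostics "1 + 2"    // Ok 3
let bad = parseWithDiagnostics "1 +\n+ 2"
// Error "error near line 2, character 1: expected a number (last token: PLUS)"
```

Note that the reported position is the end of the last token read, so it points just past the offending token rather than at its start.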

By Stephan on 10/18/2007 12:29 PM (permalink)
