Once parse_error is called it's probably too late to generate useful error messages, but I'm not an expert and somebody else might know better.

In general, error reporting/recovery is the weakest point of Yacc-like parsers, and it takes a lot of effort and experience to get it right. Some people go so far as to call it a "black art".

In principle you've got the following options for error handling:

  • detect errors during the lexer stage where possible
  • accept a superset of the grammar and handle errors in the parser actions or at a later point
  • introduce productions into the grammar with the special error token

One problem with the last approach is that error handling is not separated from the rest of the grammar; instead, you need to extend your grammar with special rules. As a result, your otherwise "perfect" grammar might become ambiguous because of the error rules, and you could be forced to restructure it.*
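To make the error-token approach concrete, here is a minimal sketch of an fsyacc grammar with one error production. The token names (INT, PLUS, SEMI, EOF) and the `int option list` result type are made up for illustration; only `error` is the special token discussed here. The usual Yacc behaviour is roughly this: when a statement fails to parse, the parser pops states until it can shift `error`, then discards input until it sees a SEMI.

```fsharp
%token <int> INT
%token PLUS SEMI EOF

%start start
%type <int option list> start

%%

start:
  | stmts EOF        { List.rev $1 }

stmts:
  |                  { [] }
  | stmts stmt       { $2 :: $1 }

stmt:
  | expr SEMI        { Some $1 }
  /* hypothetical recovery rule: skip a broken statement up to the next SEMI */
  | error SEMI       { None }

expr:
  | INT              { $1 }
  | expr PLUS INT    { $1 + $3 }
```

Note how the recovery rule sits right in the middle of the ordinary productions, which is exactly the lack of separation described above.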

Another problem is that it's hard to understand and foresee what exactly the parser will do once it has shifted the error token in a particular production (see the link above for a description of the procedure). Furthermore, due to the special handling of the error token, nested error handling strategies will generally not work (or at least I haven't figured out how to make them work). For example, you can't easily implement something like "on an error at this point, try this; if that still doesn't work, try that; and if that doesn't work either, hand the problem over to the error handler one hierarchy level above". I sometimes wonder whether the more limited error token in Happy wouldn't allow for better error recovery strategies.

Parser combinators generally have better built-in error reporting capabilities and are easier to extend with error recovery rules, but there are no such libraries available for F# yet. Since some new language features in F# are practically begging to be applied in a monadic parser framework (see here and here), we can hope that such a library will spring into existence sooner or later...
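To illustrate why combinators make error reporting easier, here is a very small sketch in plain F#. It is not taken from any existing library; the names `Reply`, `Parsed`, `Expected`, `pchar` and the `<|>` operator are all invented for this example. The idea is only that each primitive parser reports what it expected and where, and that the choice combinator merges those expectations.

```fsharp
// A parser either succeeds with a value and the next input position,
// or fails with the position of the failure and a list of expected items.
type Reply<'a> =
    | Parsed of 'a * int
    | Expected of int * string list

type Parser<'a> = string -> int -> Reply<'a>

// Primitive parser (hypothetical): match exactly one character.
let pchar (c: char) : Parser<char> =
    fun s i ->
        if i < s.Length && s.[i] = c then Parsed (c, i + 1)
        else Expected (i, [sprintf "'%c'" c])

// Choice: try p, then q. If both fail, keep the failure that got further;
// if they failed at the same position, merge the "expected" lists.
let (<|>) (p: Parser<'a>) (q: Parser<'a>) : Parser<'a> =
    fun s i ->
        match p s i with
        | Parsed _ as r -> r
        | Expected (pi, pe) ->
            match q s i with
            | Parsed _ as r -> r
            | Expected (qi, qe) ->
                if pi > qi then Expected (pi, pe)
                elif qi > pi then Expected (qi, qe)
                else Expected (pi, pe @ qe)

let p = pchar 'a' <|> pchar 'b'

match p "xyz" 0 with
| Parsed (c, _) -> printfn "parsed %c" c
| Expected (pos, items) ->
    printfn "error at column %d: expected %s" pos (String.concat " or " items)
// prints: error at column 0: expected 'a' or 'b'
```

Because the error information is an ordinary value, adding a recovery rule is just another function over `Reply`, with no special token involved.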

Stephan

*This is complicated by the fact that fsyacc does not recognize the error token in precedence rules and generally does not allow reduce/reduce conflicts to be resolved via precedence, both of which are incompatibilities with OCaml.

Posted on 10/18/2007 2:48 AM

Stephan,

Thanks for the insightful comments. Perhaps "meaningful" is too lofty of a goal. I would be happy to know and be able to print a couple of things before letting the parser die. If the parser could go on that would be better, but for now just knowing the following would be great:

  • The line number
  • The column number
  • The token that caused the parse error

Is there some way to access the parserState to reveal these items? I'm sure that a combination of my newness to F# and Yacc is to blame for my ignorance here.

Many thanks,
remlap

Posted on 10/18/2007 10:18 AM

If you want to keep track of the last token retrieved before the error you could wrap the token function of your lexer with a function that stores a copy of the last token in a mutable field. With a lexer module named "Lex" which has the main lexer rule "token" and a parser module "Pars" you could for example do the following:

let main () =
    let last_token = ref Pars.END // Pars.END or any other valid token of your lexer

    // wrap the lexer so that the most recently read token is remembered
    let log_token lb =
        let t = Lex.token lb
        last_token := t
        t

    let lexbuf = ... // set up your lexbuf

    try
        Pars.start log_token lexbuf
    with e ->
        // report the lexer position reached when the exception was raised
        let pos = lexbuf.EndPos
        printf "error near line %d, character %d\n%s\n" pos.pos_lnum (pos.pos_cnum - pos.pos_bol) (e.ToString())
        printf "last token: "
        print_any !last_token
        printf "\n"
        exit 1

Posted on 10/18/2007 12:29 PM