I recently realized that it may be possible to handle left-recursion in Parsec style parser combinator libraries. I quickly wrote a simple prototype implementation that demonstrates the idea.
The idea is to keep track of parser functions that are called without consuming any tokens from input stream. You should be able to run failure procedures when same parser function is encountered more than one time without consuming any tokens. When a token is consumed, state that is used to keep track of parser functions should be reset.
One assumption here is that I’m assuming parser functions do not alter any states. Otherwise when you come across a same parser, you can have different state and parser may behave differently.
Implementing the idea is easy even without having any extra support from Parsec. Here’s a demonstration on my favorite ambiguous grammar:
-- Exp ::= Int -- | Exp `+` Exp data Exp = Int Int | Add Exp Exp deriving (Show)
Parsers for non-terminals are easy:
plus = char '+' >> spaces int = fmap (Int . read) $ many1 digit <* spaces
I’m using a set to keep track of already visited parsers:
type MarksSeen = S.Set Int type RecParser s m a = ParsecT s MarksSeen m a
Auxiliary functions to alter the state that keeps track of visited parsers:
putMark i = do is <- getState if S.member i is then fail "recursion" else putState $ S.insert i is resetMarks = putState S.empty
Now the interesting part,
add parser marks itself as first thing to do, and calls
exp parser. Since
exp parser is entry point, this means an indirect recursive call. When same
putMark call is made, Parsec runs failure actions instead of going into infinte loop:
add = do putMark 1 e1 <- exp resetMarks plus e2 <- exp spaces return $ Add e1 e2 exp = choice [try add, int]
resetMarks call is also important,
exp has to consume some tokens, so after parsing
e1, I’m calling
Here’s an example call of this parser:
ghci> runParser exp S.empty "" "1 + 2 + 3 + 4" Right (Add (Int 1) (Add (Int 2) (Add (Int 3) (Int 4))))
You can observe that parser gets into an infinite loop when marks are removed. Here’s an example demonstrating the error message:
rec = putMark 0 >> rec >> resetMarks ghci> runParser rec S.empty "" "" Left (line 1, column 1): recursion
One problem with this approach is that it requires more typing, and you should be careful too. Marks can be placed using TemplateHaskell to ensure unique numbers are given to each
putMark call. As a second improvement, I think with some modifications on Parsec we can make Parsec to reset marks when a token is consumed(using consumed-ok continuation of
You can see the complete program here.
Removing left-recursions in your grammar may not be a huge problem – except when you’re working on functional languages with ML-like syntax. Then you’re out of luck because being functional means you’re
Exp non-terminal contains several dozen of productions and function applications is a part of that too, and it’s left-recursive:
Exp ::= ... | Exp Exp_1 ... Exp_N [left-associative] ... a hundred more productions ...
Whenever I need to write a parser for a grammar like this, I’m thinking for an easier way to parse it. I still couldn’t come up with a solution. Idea I just explained does not solve it, because it parses right-associatively. There is one workaround, but I’m not sure if that results with a parser for same grammar:
app = do putMark 2 fn <- exp resetMarks putMark 2 as <- many1 exp resetMarks return $ foldl App fn as ghci> runParser pgm S.empty "" "1 2 3 4 + 5 + 6 7" Right (Add (App (App (App (Int 1) (Int 2)) (Int 3)) (Int 4)) (Add (Int 5) (App (Int 6) (Int 7)))) ghci> runParser pgm S.empty "" "1 + 2 + 3 4" Right (Add (Int 1) (Add (Int 2) (App (Int 3) (Int 4))))