osa1 github about atom

Memoized parsing in continuation-passing style

October 21, 2013 - Tagged as: en, parsing, cps.

Continuations are truly magical things. They’re the “ultimate abstractions of control flow”. Even without using any fancy language features like call/cc, you can have seriously cool and mind-boggling programs.

In “Memoization in Top-Down Parsing” paper, Mark Johnson builds up from memoizing top-down parsers and describes a way to handle left recursion in top-down parsers by combining memoization techniques with continuations.

I ported the code to Lua to experiment, you can see it here. Most interesting part is the memoized CPS parser generator from a normal CPS parser function:

function memo(parser)
    -- WARNING: this function is badly implemented in the sense that
    -- if you parser generated by this function on two different streams
    -- it will generate wrong results
    local tbl = {}
    return function (stream, idx, cont)
        if tbl[idx] == nil then
            tbl[idx] = { results = {}, conts = {} }
            table.insert(tbl[idx].conts, cont)
            parser(stream, idx, function (parse_result)
                -- check if same parse_result is already in the table
                local exists = false
                for _, result in ipairs(tbl[idx].results) do
                    if result == parse_result then -- TODO: this equality is probably wrong
                        exists = true
                        break
                    end
                end
 
                if not exists then
                    table.insert(tbl[idx].results, parse_result)
                    for _, cont in ipairs(tbl[idx].conts) do
                        cont(parse_result)
                    end
                end
            end)
        else
            table.insert(tbl[idx].conts, cont)
            for _, result in ipairs(tbl[idx].results) do
                cont(result)
            end
        end
        return tbl[idx]
    end
end

(btw, I found porting this code to a purely functional setting very hard thing to do. If you find a way to do this, please send me your code. Thanks.)

This piece of code didn’t make sense to me for a while. I think the key to understand this function is to find answer to this question:

How is this different from keeping a set of productions visited without consuming any input from input stream and when you come to the same production, just failing instead of trying to parse? Because trying to derive same production without consuming any input means you’ll end up with infinite loop.

This function different in that it accounts for parsing same production after following a different path of production. Think this CFG as an example:

T ::= T + T
    | int

In order to derive first production, it first needs to parse a T. But then it will be already noted that it was already trying to parse T, and add the continuation to the list of continuations to be called when a T at input position 1 is parsed.

While trying alternatives, it will parse an int, and derive T -> int at input position 1. And since it had saved the continuations to call when it successfully parse a T at location 1, it will call this continuations and thus parsing will continue.

I hope this helps other people to understand the trick.