osa1 github about atom

Exploring parsing APIs: the cost of recursion

November 29, 2024 - Tagged as: en, parsing, rust.

In the first post of this series we looked at a few different ways of parsing a simple JSON-like language. In the second post we implemented a few lexers, and looked at the performance when the parsers from the first post are combined with the lexers in the second post.

One of the surprising results in these posts is that our recursive descent parser, which parses the input directly to an AST, and in some sense is the simplest possible implementation when we need an AST, actually performs the worst.

(The implementations that collect tokens in a vector before parsing perform worse than recursive descent parsing, but those implementations have other issues as well, and can’t be used in many cases. Maybe I should’ve omitted them entirely.)

To keep things simple, let’s consider these three benchmarks from the last post:

To recap, the “iter” variants are Iterators that return one parse (or lexing) event at a time. The “push” variants take a “listener” argument with callbacks for events. See the first post for details.

In the “iter” benchmark, the code that generates the AST iterates an event parser:

fn event_to_tree<I: Iterator<Item = Result<ParseEvent, ParseError>>>(
    parser: &mut I,
    input: &str,
) -> Result<Json, ParseError> {
  ...
}

And the event parser iterates a lexer:

fn parse_events_iter_using_lexer_iter<I: Iterator<Item = Result<(usize, Token), usize>>>(
    lexer: I,
    input_size: usize,
) -> EventParser<I> {
    ...
}

When the AST generator asks for the next parse event, the parse event generator asks for the next token (maybe multiple times) and returns an event. AST generator consumes all of the events and builds the AST.

In the “push” benchmark, we have a “listener” that handles parse events and builds up an AST:

struct AstBuilderListener<'a> {
    input: &'a str,
    container_stack: Vec<Container>,
    current_container: Option<Container>,
    parsed_object: Option<Json>,
    error: Option<ParseError>,
}

impl<'a> EventListener for AstBuilderListener<'a> {
    ...
}

And another listener that handles tokens:

struct LexerEventListenerImpl<'a, L: EventListener> {
    listener: &'a mut L,
    container_stack: Vec<Container>,
    state: ParserState,
}

impl<'a, L: EventListener> LexerEventListener for LexerEventListenerImpl<'a, L> {
  ...
}

This implementation is driven by the lexer which “pushes” the tokens to the event parser, which (sometimes after handling multiple tokens) “pushes” parse events to the AST builder.

Both of these setups are considerably more complicated than recursive descent, yet they perform better. How?

When we consider what the recursive descent parser does that these don’t, it’s kind of obvious. It’s even in the name: recursion.

Our lexers and event parsers all optimize really well: there is no heap allocation anywhere, the code that “pushes” events are all monomorphised based on the handler type, so the handler calls are direct calls and can be (and probably) inlined. There’s also no recursion anywhere.

The recursive descent parser is basically one function that recursively calls itself for nested objects. It turns out this recursion has a cost. When I eliminate the recursion with some more state:

enum ParserState {
    /// Parse any kind of object, update state based on the current container.
    TopLevel,

    /// Parsing a container, parse another element on ',', or finish the
    /// container on ']' or '}'.
    ExpectComma,

    /// Parsing an object, parse a key.
    ObjectExpectKeyValue,

    /// Parsing an object, parse a key, or terminate the object.
    ObjectExpectKeyValueTerminate,

    /// Parsing an object and we've just parsed a key, expect ':'.
    ObjectExpectColon,
}

fn parse_single(iter: &mut Peekable<CharIndices>, input: &str) -> Result<Json, ParseError> {
    let mut container_stack: Vec<Container> = vec![];
    let mut state = ParserState::TopLevel;

    loop {
      ...
    }
}

It performs better than the recursive descent parser and the iterator based parser, and on par with the “push” based parser: (numbers are slightly different than above as I rerun them together)

I’m not quite sure what about recursion that makes the recursive descent parser perform so much worse, but my guess is that it makes the control flow more complicated to analyze, and in runtime, you have to move things around (in registers and stack locations) based on calling conventions. When moving between registers and stack locations you do memory reads and writes. My guess is that when combined, these cost something.

Other considerations with recursion

If you checkout the git repo and run the tests with cargo test, you will see that a test fails with a stack overflow.

This is something else to keep in mind when parsing recursively. Stack overflows are a real issue with recursive parsing, and I know some libraries that are explicit about it.

In practice though, I’m not sure if this can be the main reason to avoid recursive parsing. Recursion can happen in other places as well, and in a server application you would probably monitor runtime, memory consumption, and maybe even other resources of a handler, and have some kind of error handler that handles everything else.

Some higher level languages like Haskell and Dart make stack overflows exceptions/errors that can be caught and handled, so they can be handled as a part of “unexpected” crashes easily. In Rust, stack overflows can be handled at thread boundaries.

If the application is a command line tool or a compiler, where the input is provided by the user and handled on the user’s computer, it’s less of a problem and you can probably just let the application crash.

So I don’t think we can say that recursion should be avoided at all costs when parsing.

References

As usual, the code is available: github.com/osa1/how-to-parse-3.

To work around the stack overflow when testing, test in release mode: cargo test --release.

If you want to profile the code and understand more about why one version is faster than the other, I added 4 executables to the package, one for each benchmark listed above. You can generate a 100M input and run the parsers individually with:

$ cargo build --release
...

$ ./target/release/test_gen 100000000 > input

$ time ./target/release/parse_non_recursive input
./target/release/parse_non_recursive input  0.64s user 0.22s system 99% cpu 0.854 total