osa1 github gitlab twitter cv rss

The issue of splitting without work duplication

August 13, 2015 - Tagged as: en, supercompilation, haskell.

(I’m starting publishing my long list of unpublished blog posts with this post)

(Examples are from Bolingbroke’s PhD thesis)

Example 1:

let a  = id y
    id = \x -> x
 in Just a

Problem: The compiler should know about id while compiling a. This is easy to do, just tell the compiler about every binding when compiling RHSs. However, it causes some other problems:

Example 2:

let n = fib 100
    b = n + 1
    c = n + 2
 in (b, c)

Problem: If we tell about n to the compiler when it’s compiling b and c, we’re taking the risk of work duplication. It may seem like fib 100 will be evaluated in compile time and so duplication is not a huge deal, but this is not necessarily the case. First, we can’t know if it’s going to be evaluated to a value in compile time. Second, even if it’s a closed term and we somehow know it’s going to be terminated, termination checker of the evaluator may want to stop it before it’s evaluated to a value. Third, most of the time it’ll be an open term that’ll get stuck in the middle of supercompilation.

And when that happens we will generate a let-binding in residual code. In our case, we’ll be generating two let-bindings, one is for b and one is for c, and those let bindings will be doing same work.

Question: Can we rely on a post-processsing pass to eliminate common subexpressions? I.e. if we generate a code like this:

let b = let n_supercompiled = <supercompiled fib 100>
         in n_supercompiled + 1
    c = let n_supercompiled = <supercompiled fib 100>
         in n_supercompiled + 2
 in (b, c)

It would transform it to obvious residual code that has single n_supercompiled which is in scope of b and c.

What are trade-offs?

Finding a good heuristic is hard. Let’s say we try to estimate costs of expressions and decide whether to tell the compiler about them or not. If we decide that ys and xs are expensive in this case:

let map = ...
    ys = map f zs
    xs = map g ys
 in Just xs

We miss a deforestation opportunity, because the compiler won’t know about ys while compiling xs.