August 13, 2015 - Tagged as: en, supercompilation, haskell.
(I’m starting to publish my long list of unpublished blog posts with this one)
(Examples are from Bolingbroke’s PhD thesis)
Example 1:
```
let a  = id y
    id = \x -> x
in Just a
```
Problem: The compiler should know about `id` while compiling `a`. This is easy to do: just tell the compiler about every binding when compiling right-hand sides. However, it causes some other problems:
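To see what knowing about `id` buys us, here is a minimal sketch (the names `example`, `idFn`, and `residual` are made up for illustration): once the supercompiler can see the definition of the identity function, the call reduces to its argument and the indirection disappears from the residual program.

```haskell
-- The input program of Example 1, written as a function of the free
-- variable `y`. `idFn` stands in for the let-bound `id`.
example :: a -> Maybe a
example y = Just a
  where
    a    = idFn y      -- the supercompiler can inline idFn here...
    idFn = \x -> x     -- ...because its definition is in scope

-- The residual we would hope for: `idFn y` reduced to `y`,
-- and the trivial binding for `a` inlined away.
residual :: a -> Maybe a
residual y = Just y
```

Both versions are of course observationally equal; the point is that the supercompiled one does no work at run time.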
Example 2:
```
let n = fib 100
    b = n + 1
    c = n + 2
in (b, c)
```
Problem: If we tell the compiler about `n` while it’s compiling `b` and `c`, we’re risking work duplication. It may seem like `fib 100` will be evaluated at compile time, so duplication is not a big deal, but this is not necessarily the case. First, we can’t know whether it will evaluate to a value at compile time. Second, even if it’s a closed term and we somehow know its evaluation terminates, the evaluator’s termination checker may stop it before it reaches a value. Third, most of the time it will be an open term that gets stuck in the middle of supercompilation.

When that happens, we generate a let-binding in the residual code. In our case we’d generate two let-bindings, one for `b` and one for `c`, and those let-bindings would be doing the same work.
Question: Can we rely on a post-processing pass to eliminate common subexpressions? I.e. if we generate code like this:
```
let b = let n_supercompiled = <supercompiled fib 100>
        in n_supercompiled + 1
    c = let n_supercompiled = <supercompiled fib 100>
        in n_supercompiled + 2
in (b, c)
```
it would transform it into the obvious residual code with a single `n_supercompiled` binding in scope of both `b` and `c`.
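Such a pass is essentially common-subexpression elimination over residual syntax. Here is a toy sketch of the idea (everything below, including the `Expr` type, the `"shared"` name, and using `App "fib" [Lit 100]` as a stand-in for the supercompiled term, is made up for illustration): given a candidate subexpression, if it occurs in more than one sibling binding of a `let`, float it out into a single shared binding.

```haskell
-- A toy expression language for residual programs.
data Expr
  = Var String
  | Lit Int
  | App String [Expr]        -- e.g. App "fib" [Lit 100]
  | Add Expr Expr
  | Let [(String, Expr)] Expr
  deriving (Eq, Show)

-- Does `needle` occur anywhere inside `e` (including `e` itself)?
occurs :: Expr -> Expr -> Bool
occurs needle e = e == needle || case e of
  App _ as -> any (occurs needle) as
  Add a b  -> occurs needle a || occurs needle b
  Let bs b -> any (occurs needle . snd) bs || occurs needle b
  _        -> False

-- Replace every occurrence of `needle` in `e` with `Var name`.
abstract :: String -> Expr -> Expr -> Expr
abstract name needle e
  | e == needle = Var name
abstract name needle (App f as) = App f (map (abstract name needle) as)
abstract name needle (Add a b)  = Add (abstract name needle a)
                                      (abstract name needle b)
abstract name needle (Let bs b) =
  Let [(x, abstract name needle rhs) | (x, rhs) <- bs]
      (abstract name needle b)
abstract _ _ e = e

-- If `needle` occurs in more than one binding, share it once.
cse :: Expr -> Expr -> Expr
cse needle (Let bs body)
  | length [() | (_, rhs) <- bs, occurs needle rhs] > 1 =
      Let [("shared", needle)]
          (Let [(x, abstract "shared" needle rhs) | (x, rhs) <- bs] body)
cse _ e = e
```

Running `cse` on Example 2’s residual, with `App "fib" [Lit 100]` as the duplicated subexpression, yields a single shared binding visible to both `b` and `c`. A real pass would of course have to *find* the duplicated subexpressions itself and worry about capture and sharing of partial applications; the sketch only shows the rewrite.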
What are the trade-offs?
Finding a good heuristic is hard. Say we try to estimate the cost of each expression and use that to decide whether to tell the compiler about it. If we decide that `ys` and `xs` are expensive in this case:
```
let map = ...
    ys  = map f zs
    xs  = map g ys
in Just xs
```
we miss a deforestation opportunity, because the compiler won’t know about `ys` while compiling `xs`.
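Concretely, the fusion we’d be giving up is the classic map law `map g . map f = map (g . f)`. A small sketch (the names `before`, `after`, `f`, `g`, `zs` are stand-ins for the example’s bindings):

```haskell
-- Before: as in the example above, `ys` is an intermediate list,
-- and the result takes two traversals to build.
before :: (a -> b) -> (b -> c) -> [a] -> Maybe [c]
before f g zs = Just xs
  where
    ys = map f zs
    xs = map g ys

-- After: if the supercompiler knows `ys` while compiling `xs`, it can
-- fuse the two maps into one traversal with no intermediate list.
after :: (a -> b) -> (b -> c) -> [a] -> Maybe [c]
after f g zs = Just (map (g . f) zs)
```

The two functions agree on every input; the deforested version just never allocates `ys`.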