I’ve been thinking about optimizing functional programs recently, for a project that I’m hoping to make my research topic in the near future. You probably already know about The Myth of the Sufficiently Smart Compiler. It basically says that the advanced compiler that optimizes your high-level, highly-abstracted programs to efficient low-level code, is basically a myth.
This post is a brain dump on sufficiently smart compilation of functional programs and some compilation techniques. I’ll first make some seemingly-unrelated points, and then hopefully use them to argue that the sufficiently smart compiler is not a myth, it just needs some hard work to be realized.
Every once in a while I see some blog posts about optimizing a JIT-compiled program by inspecting JIT trace dumps and generated code carefully, and I find this horrible, for the following reasons:
It couples your program design with the JIT compiler’s internals. From a software engineering point of view, I think this is really one of the worst things that can happen to a software. You end up structuring your code with the compiler’s convenience in mind. But compilers can’t make sense of high-level, abstracted code (remember the myth?). So you end up with a code that’s low-level, hard to read, understand and maintain. And what happens when a new version of the compiler is released?
JIT compilers are highly complex, and as a result they’re very hard to reason about and this complex design makes them unpredictable. A seemingly-unrelated change in your program can make the traces go significantly bad, and result in less optimized code, because maybe the change somehow made it to the trace and now you need to refactor your code.
If you need performance that bad, and you’re willing to read traces and generated assembly output for that, you could probably just write in a language that makes low-level optimizations easy/possible1. Or at least write performance-critical parts in a low-level language. Both of these cases eliminate the need for a JIT compiler.
I think the last point is worth discussing further. Most JIT compilers we use nowadays are for compiling dynamic languages2. By their nature dynamic languages are hard to optimize in compile time, so they rely on runtime knowledge for optimizations. But does that make JIT compilers useless for statically-typed languages that are more amenable to compile-time optimizations? I don’t have a good answer to this, probably because I’m not a JIT expert. I think the fact that HotSpot is doing good job is not an answer to this, because in JVM there’s bytecode interpretation going on, and this is adding some room for runtime optimizations. Namely, you have one level of indirection that you can eliminate using JIT compilation.
In other statically-typed, compiled languages like C++, Haskell, OCaml etc. there’s less room for that kind of optimizations. I think applicability of JIT compilation techniques to these type of languages would make an interesting topic for a research project.
High-level languages and abstractions make efficient execution of programs harder, but there are a couple of things that they can do to help with the compilation. Namely, you can guide the compiler to optimize your domain-specific code.
One nice and simple example is rewrite rules of GHC. They’re used quite heavily in base (GHC’s standard library) to eliminate intermediate lists. Other libraries use the same mechanism to tell the compiler how to optimize the code that uses their abstractions3.
But for a compiler to support this kind of program transformations the language has to have some properties. In our case, we should be able to reason about the code in compile time, and locally, i.e. without thinking about runtime execution environment (heap, stack, variables in scope etc.) and the interaction of our code with the rest of the code. This is possible in purely functional languages because they make equational reasoning possible.
This is a very powerful property. This makes it possible to see programs as terms in an algebra, and we can freely manipulate these terms according to our rules. In the most basic sense, these rules can be the rules that define our language’s operational semantics, because by its very definition these rules are guaranteed to preserve semantics of programs. But we can go even further by adding rewrite rules to these rules. Rewrite rules are a way to say, “trust me, this transformation preserves semantics” and at that point a compiler is free to use these rules.
This type of “algebraic manipulation of programs” is a very powerful concept, and it can do great things. A very good example is this 1997 paper about optimizing Haskell. Most (maybe all?) of the transformations described in that paper are still in use.
You probably wouldn’t want a compiler that compiles your programs to programs that do different things. We expect it to preserve the semantics. But that rule is sometimes too strict, and prevents some optimizations.
For example, if floating points and operations on floating points in your language are defined as they’re defined in IEEE-754, then the compiler can’t assume associativity of floating point operations and you lose some optimization opportunities. GCC’s
-ffast-math is for relaxing this restriction by letting the compiler assume this associativity.
Another example is termination properties of programs. For example, would you be OK with this transformation in a purely functional language:
(λx . 1) loop ~> 1
In a call-by-name (or call-by-need, which is an efficient implementation of call-by-name) language, this is a valid transformation. But in call-by-value language this would change the semantics. Previously this program was looping, but now it returns 1.
This example is actually a good demonstration of a problem that we have even in purely functional languages. Namely, there are some programs that don’t map to any values in the domain you use to model your language5. The way these programs are modeled are generally by defining a special value, ⊥ (read “bottom”). Non-terminating and exception/error throwing programs are said to be “bottom” and denoted with this value. Bottom values are said to be “less defined” than non-bottom values.
Using this definition, we can say that the transformation shown above transforms a program to a more defined one. You might want this restriction of preserving definedness of programs for different reasons, and here’s an example reason: Without this restriction, your program may terminate or loop depending on how the compiler performed. A seemingly-unrelated change in your program may cause a different termination behavior.
Now this is a hard problem. There are papers about transforming call-by-value functional languages while preserving termination properties (see this as an example). In general, we can’t decide if a program is bottom or not. First of all, that would be solving the halting problem. But more specifically, we can’t do this transformation if
y depends on a dynamic input here:
(λx . 1) (1 / y) ~> 1
In most cases though, the compiler is simply not able to propagate enough information to this stage to see if
y can be
0 or not6, even if all the necessary information is available in compile time.
There’s an old yet IMHO under-appreciated technique for taking statically known inputs into account while compiling programs. It’s called “partial evaluation” and described in details in this awesome book “Partial Evaluation and Automatic Program Generation” by Neil D. Jones, Carsten K. Gomard and Peter Sestoft. One very interesting but somewhat esoteric application of this idea is Futamura projections7, but to give a easier to understand example, a C partial evaluator could read your Vim config in compile time and compile Vim to an executable that doesn’t read any Vim files on startup because it’s already specialized to the Vim config it read in compile time8. General tools may depend of lots of dynamic input, but in your special case you may fix some of these variables and this is where a partial evaluation comes into the play. See this blog post for another example.
How much further could it propagate this statically known input and specialize rest the code using it? That’s completely different story and comes with some very hard to solve problems. I’ll again come to this later.
The whole point is to generate specialized code for known input. We can shift the stage a little bit and apply this idea in runtime, and that gives us multi-stage programming.
MSP allows us to generate code in runtime, link it to the program in a way that the generated code runs in the current execution environment (i.e. the generated code can refer to names in enclosing scope, pretty much like how closures would do).
Traditionally, MSP doesn’t allow code generation in compile-time, and the techniques used for code generation are completely different9. But we can generate code specialized to input that is only available in runtime. For example, you can write a game that runs code specialized to the player’s options. Or run a web server that does some optimizations on request dispatch code depending on some analysis on recent requests.
This is again a very powerful concept, and only recently I started to appreciate its potential10. IMO, MSP is missing a “killer language”11 (and also a “killer application” but I think that follows the language) and I’m hoping to make some progress on this front in the future.
This post may seem to be going nowhere, so let’s back up a bit and come to the point.
I define a sufficiently smart compiler not as a completely automated program, but as a toolchain. This toolchain has a completely automated compiler, but it also gives programmers tools for runtime code generation, and for teaching the compiler domain-specific optimizations. The compiler knows about the language’s semantics, and when possible it does reductions in compile time to remove abstractions and leave less work to runtime.
While doing reductions in compile time, it takes programmers’ rules into account, and optimizes abstractions accordingly. This allows it to optimize domain-specific abstractions that normally a compiler would have no way of knowing.
By now it should be clear that such a compiler is only possible with a language that allows these optimizations. For example, without a purely functional language, rewrite rules are not easy, if not impossible.
The compiler gradually compiles the language into languages that are more and more close to the machine language that it has to generate in the end. Reductions and user rules are applied in a level where programs are still expressed in a purely functional language. This language should be sufficient for most optimizations that eliminate programmers’ abstractions in compile time.
This way, programmers don’t need to look at ridiculous bytecode traces or instructions written in a highly-complex assembly language to figure out how things are optimized, and rather they stay in the same level of abstraction that their programs are written in. When they want to know about memory allocations, for example, they should be able to look at the next level in the compilation, which should have explicit memory allocation operations and pointers etc. The main point is that they stay in a level where they can observe some particular behavior (e.g. memory allocation) of a program and they don’t have to read assembly, for example, to see if their higher-order
map application that uses an increment function to increment integers in a list is compiled to a loop without any function calls.
In this compiler there’s no room for abstraction-breaking, unreliable optimizations or optimizations that cause coupling with the compiler’s internals, like in the case of JIT compilers.
In the beginning I said that I don’t see this as a myth. So how I think this is possible to implement? This is already a long-enough post, and I’ll stop for now. Let me just say that almost all of these things are implemented in different projects:
MSP does runtime code generation and MetaOCaml gives us a nice way to do that in a safe way. Another alternative is Terra, but in Terra generated code is in a different language, so that’s quite different (also, it’s a dynamically typed language that gives no guarantees about generated code).
Domain-specific optimizations are possible in Haskell thanks to GHC’s rewrite rules, as mentioned in the related section above.
GHC’s internal languages Core, STG and Cmm allow programmers to gradually go low level and see the details they’re looking for. Most of the time Core is enough to see if your abstractions are eliminated in compile time and if your rules worked as expected.
Some of these features are orthogonal to each other, like MSP and compile-time reductions. But some others are not, for example, we expect a supercompiler to take rewrite rules into account, otherwise it may be impossible to do some optimizations.
The hardest part seems to be compile-time reductions of programs according to operational semantics of the language, which involves some very hard problems, and one of the reasons has to do with preserving semantics. In the next couple of posts I’m hoping to talk about that, and in the meantime you can refer to chapter 9 of the PhD thesis I linked above.
One very good question to ask here is, what exactly gives us free theorems? I don’t have an answer to that question yet.↩
We’re assuming that it somehow knows that divide-by-zero leads to bottom.↩
In practice this is probably hard to achieve, and it certainly needs some refactoring in current Vim codebase.↩