
Staging is not just code generation

May 17, 2015 - Tagged as: en, multi-stage programming.

It feels weird to see that even Oleg seems to think about it that way.

There is no definition of “staging” that everyone accepts, so people use the term with different meanings. My minimal definition is “a technique for runtime code generation and linking”. Without the “linking” part, to me it’s just another AST definition plus a printer library (sometimes embedded into the language to add convenient syntactic sugar and/or quasiquotation).
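
To make the distinction concrete, here is a rough Python analogy (not MetaOCaml, and with none of the typing guarantees a real multi-stage language gives you). The helper `specialize_power` is a made-up name for illustration. The first half only produces text, which is all a printer library does; the second half compiles that text and loads it into the running program, which is the “linking” part.

```python
def specialize_power(n):
    """Hypothetical helper: build x**n as unrolled multiplications.

    Returns the linked function together with its source text.
    """
    if n < 0:
        raise ValueError("n must be non-negative")

    # Code generation: at this point we only have text,
    # like the output of an AST printer.
    body = " * ".join(["x"] * n) if n > 0 else "1"
    src = f"def power(x):\n    return {body}\n"

    # Linking: compile the text and load it into the running program,
    # so the caller gets something it can call right away.
    namespace = {}
    exec(compile(src, "<staged>", "exec"), namespace)
    return namespace["power"], src


cube, cube_src = specialize_power(3)
print(cube(2))  # 8
```

If `specialize_power` returned only `src`, the caller would have to write it to a file and load it separately, which is the case I would call plain code generation.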

To me the whole point is “runtime specialization”. For that, you should be able to use the data available at code-generation time in the generated code.[1] This is called “cross-stage persistence”. A simple multi-stage language may support this just by serializing the data as code, but that is not as flexible as one might need for generating runtime-optimized code. For example, you can’t serialize a socket or a file handle this way, yet it is both safe and possible for generated code to use a socket or file handle that was open when the code was generated. You can’t easily do that unless the staging library/language provides it as a feature.
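
A small Python sketch of the two flavours of cross-stage persistence, under the assumption that `exec` with an explicit namespace is an acceptable stand-in for a staging runtime. `stage_logger`, `_sink` and `log` are names invented for this example. The string `prefix` can be serialized as code, so it is spliced into the generated source as a literal. The `sink` object cannot be, so it is handed to the generated code through its environment instead.

```python
import io


def stage_logger(sink, prefix):
    """Hypothetical helper: generate a logger specialized to sink and prefix."""
    # Persistence by serialization: prefix is turned back into source
    # text with repr() and embedded as a literal.
    src = (
        "def log(msg):\n"
        f"    _sink.write({prefix!r} + msg + '\\n')\n"
    )

    # Persistence by reference: the live sink object has no source
    # representation, so the generated code sees it as a name in the
    # environment it is linked against.
    namespace = {"_sink": sink}
    exec(compile(src, "<staged>", "exec"), namespace)
    return namespace["log"], src


buffer = io.StringIO()  # stands in for a socket or file handle
log, log_src = stage_logger(buffer, "[app] ")
log("started")
```

After this runs, `buffer` holds the line written by the generated `log` function, even though `buffer` itself never appeared in the generated source as anything but the name `_sink`.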

In the case of BER-MetaOCaml, I think this is one of the major limitations: it only supports OCaml bytecode, and cross-stage persistence is provided by serializing runtime data. If a runtime value is not serializable, then you can’t easily use it in later stages.

One more thing about printing the code: in my opinion, a multi-stage language should provide a way to print generated code only for debugging purposes (e.g. to check whether the generated code is really the code I want).[2]

To make it clear: I think it’s fine to use it for code generation, but if all it can do is code generation then I think it’s missing the point.

As an example, I used staging for code generation in my last project, and it seems like the Scala LMS people do this a lot too.[3]


  1. In a sense this is like a closure: generated code should be able to refer to names in the enclosing environment of the code generator.

  2. I’m wondering if Terra has a way to print generated code. Any ideas?

  3. I didn’t read the paper very carefully, but I think one example is the “Optimizing Data Structures in High-Level Programs” paper, published at POPL’13.