<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <title>osa1.net - All posts</title>
    <link href="http://osa1.net/rss.xml" rel="self" />
    <link href="http://osa1.net" />
    <id>http://osa1.net/rss.xml</id>
    <author>
        <name>Ömer Sinan Ağacan</name>
        <email>omeragacan@gmail.com</email>
    </author>
    <updated>2026-05-12T00:00:00Z</updated>
    <entry>
    <title>Macros in Fir</title>
    <link href="http://osa1.net/posts/2026-05-12-fir-macros.html" />
    <id>http://osa1.net/posts/2026-05-12-fir-macros.html</id>
    <published>2026-05-12T00:00:00Z</published>
    <updated>2026-05-12T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Fir macros are fully deterministic programs that are distributed separately and that can introspect into the using program’s type-checked AST definitions.</p>
<p><strong>Deterministic execution</strong> of macros is necessary to be able to</p>
<ul>
<li>Cache macro results in the language server and between compilation of a package during development.</li>
<li>Avoid the issues with build scripts in many languages that can run arbitrary code when compiling a package, causing all kinds of security nightmares.</li>
<li>Potentially distribute packages with macros fully expanded. This makes dependency trees smaller as importing a package doesn’t bring in its macro dependencies.</li>
<li>Avoid leaking compiling platform details into generated code, which makes cross-compilation difficult.</li>
</ul>
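<p>To make the caching point concrete, here is a minimal sketch in Python (not Fir, and not how the Fir front-end is implemented): because expansion is deterministic, a result can be keyed on a hash of the macro and its inputs and reused. The <code>ExpansionCache</code> class, the <code>deriveShow</code> macro and the strings standing in for definitions are all hypothetical.</p>
<pre><code>import hashlib

class ExpansionCache:
    """Memoizes macro expansions, keyed by a hash of the macro name and inputs."""

    def __init__(self):
        self._results = {}
        self.misses = 0

    def expand(self, macro_name, macro_fn, inputs):
        # `inputs` is a tuple of strings standing in for serialized
        # definitions and token trees.
        text = repr((macro_name, inputs)).encode("utf-8")
        key = hashlib.sha256(text).hexdigest()
        if key not in self._results:
            self.misses += 1
            self._results[key] = macro_fn(inputs)
        return self._results[key]

def derive_show(inputs):
    # Hypothetical macro: returns the generated code as a string.
    return "impl Show[" + inputs[0] + "]: ..."

cache = ExpansionCache()
first = cache.expand("deriveShow", derive_show, ("Foo",))
second = cache.expand("deriveShow", derive_show, ("Foo",))  # served from the cache</code></pre>
<p>A real cache key would presumably also need to cover the version of the macro package itself. Reusing results like this is only sound if the macro cannot observe anything outside its inputs.</p>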
<p><strong>Separate distribution</strong> is not a requirement but it simplifies the implementation quite significantly, and it’s also a good idea from a software design point of view:</p>
<ul>
<li><p>With macros in the same package with the use sites, you have to compile some parts of the package with the macros to be able to run the macros or have an AST interpreter (or something in between: compile some parts to a bytecode), then run the macros, then compile the whole package again with the macro-generated code.</p>
<p>This is technically challenging for a few reasons:</p>
<ul>
<li>You have to partition the package into smaller parts that can be executed on their own.</li>
<li>The execution will be slow (AST interpretation), or complicated (package will be compiled several times, with different entry points)</li>
</ul>
<p>It also makes it easier to invalidate macro inputs (as they’ll depend on code in the same library that they’re used in), which will cause macros to run repeatedly during development.</p></li>
<li><p>Because of the determinism requirements, compilation of metaprograms won’t be the same as the compilation of executables. With metaprograms distributed separately this is easier to deal with.</p></li>
<li><p>From a software design point of view, having metaprograms separate from the libraries using them is a good idea. Even without this restriction, I think most packages would have a separate <code>Macros</code> module with the metaprograms as it’d make it easier to navigate within the code base.</p></li>
</ul>
<p>It should be rare for a macro and a library to recursively depend on each other. In these rare cases, the library can be split into two smaller libraries: one that the macro uses, another one that uses the macro (and maybe also the first library).</p>
<p><strong>Introspection</strong> makes macros much more flexible and useful for many use cases. Without introspection, macros take ASTs (or token trees, or code as a string) as arguments and need to be passed the ASTs directly. E.g. in Rust, derive macros need to be added as attributes to the definitions, as they can’t otherwise look up the definition of a type that is passed to them as an identifier.</p>
<pre><code>#[derive(MyDeriveMacro)]
type Foo { ... }</code></pre>
<p>Here <code>MyDeriveMacro</code> is passed the token trees of the next item (the type definition). This has annoying limitations:</p>
<ul>
<li><p>Calling the macro in another module, or even in another place in the same module is not possible.</p>
<p>If I generate e.g. a <code>Hash</code> trait implementation and some serialization code for the same type, all of those need to be added as attributes to the type. This causes a large number of attributes on types, instead of core trait implementations (<code>Hash</code>) and others (serialization, debugging helpers, …) being defined in separate modules.</p></li>
<li><p>This is not an essential limitation of this approach but: because of how limited the <code>derive</code> syntax is, passing other arguments to the macro (other than the following item’s token tree) is not possible. Users then work around this by littering the fields of the type with more attributes as the macro will be able to see those attributes, so they can be used to customize macro expansion.</p></li>
<li><p>Because there’s no type information passed to these macros, code generation based on types is not possible.</p></li>
<li><p>You can’t generate multiple <code>impl</code>s for multiple types in one call as each call takes one type definition as input.</p></li>
</ul>
<p>In Fir macros, you can pass <em>explicitly</em> (this part is important) as many type and function identifiers as you want and the macro gets the full type-checked ASTs of the definitions of those types and functions. These type-checked ASTs also allow looking up definitions used by those types and functions, so you also get the dependencies of those types and functions.</p>
<p>Short intro to the syntax before moving on to examples: <code>$</code> indicates a macro call, <code>@</code> is the syntax for passing a definition (rather than a token tree) to a macro. E.g. <code>$foo(@MyType, @myFunction, some [other, random = ("tokens")])</code> passes the full type-checked ASTs of <code>MyType</code> and <code>myFunction</code> to the macro, and an untyped token tree for the third argument.</p>
<p>A few examples of what introspection allows:</p>
<ul>
<li><p>I want my ASTs to be serializable. The AST type is large and uses many types, and some of the dependencies may not even be in the current package. When I pass the AST type to a Fir macro like <code>$deriveSerialize(@MyAstType)</code>, the macro sees all of the definitions of <code>MyAstType</code>, with their type-checked ASTs. So it can traverse all of the types used, and generate as many functions (or <code>impl</code>s) as it needs.</p>
<p>Compare this with Rust where we’d need to add <code>#[derive(Serialize)]</code> or similar above each of the types used. If a type is not defined in our package, we have to work around it somehow (e.g. maybe introduce a newtype and use it in the AST).</p>
<p>Once we added those attributes to each and every type that’s used by the AST (directly and transitively), if a type becomes unused as we refactor our AST, we’ll get no warnings about the redundant attribute and we’ll have <a href="https://users.rust-lang.org/t/tooling-to-find-and-remove-unused-derives/137013">no way of finding it</a>.</p></li>
<li><p>With introspection I can derive a trait for my type without having to derive the same trait for all of its dependencies. Imagine deriving <code>Eq</code> for a type like:</p>
<pre><code>type Foo(
    x: Type1,
    y: Type2,
)</code></pre>
<p>If all I have is the AST for <code>Foo</code>, all I can do in the <code>==</code> function is to compare <code>x</code> and <code>y</code> fields with <code>==</code>, which requires <code>x</code> and <code>y</code> to implement <code>Eq</code> as well.</p>
<p>When I have access to <code>Type1</code> and <code>Type2</code> definitions, I can do the same, or I can just compare <code>Type1</code> and <code>Type2</code> fields directly, or generate comparison functions for those types and call them.</p>
<p>A real use case for this is when I have <code>Eq</code> that’s structural equality as usual, and then I want another comparison function that ignores certain fields. In language frontends, this commonly happens when you compare two ASTs ignoring source locations. In Fir, I can do</p>
<pre><code>$deriveEq(@Foo, name = eqWithoutLocs, skip = [@Loc])</code></pre>
<p>Here <code>@Foo</code> passes the full type-checked AST of <code>Foo</code>. The second argument is just a token tree that the macro parses for customization options. In this example, it passes the name of the function being generated. The third argument is similarly a token tree, but it has the <code>Loc</code> type definition in the AST. This allows the macro to generate structural equality code that skips all <code>Loc</code>-typed fields.</p>
<p>I can then have another one that generates the same code as a Rust derive macro would generate:</p>
<pre><code>$deriveEq(@Foo)</code></pre>
<p>This one doesn’t have any customization options, and the macro by default generates the usual trait impls.</p></li>
</ul>
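<p>The <code>deriveEq</code> example can be approximated in Python with run-time introspection over dataclasses. This is only an analogy: a Fir macro would inspect type-checked ASTs at compile time and generate code, while this sketch walks values at run time. The <code>Loc</code>, <code>Var</code> and <code>Add</code> types are made up for the example.</p>
<pre><code>from dataclasses import dataclass, fields, is_dataclass

@dataclass
class Loc:
    line: int
    col: int

@dataclass
class Var:
    name: str
    loc: Loc

@dataclass
class Add:
    left: Var
    right: Var
    loc: Loc

def derive_eq(skip=()):
    """Builds a structural equality function that ignores values of the `skip` types."""
    def eq(a, b):
        if type(a) is not type(b):
            return False
        if isinstance(a, skip):
            return True  # skipped types always compare equal
        if not is_dataclass(a):
            return a == b
        # Recurse into the fields, without requiring the field types
        # to provide their own equality.
        return all(eq(getattr(a, f.name), getattr(b, f.name)) for f in fields(a))
    return eq

eq = derive_eq()
eq_without_locs = derive_eq(skip=(Loc,))

x = Add(Var("a", Loc(1, 1)), Var("b", Loc(1, 5)), Loc(1, 3))
y = Add(Var("a", Loc(9, 1)), Var("b", Loc(9, 5)), Loc(9, 3))
z = Add(Var("a", Loc(1, 1)), Var("c", Loc(1, 5)), Loc(1, 3))

same_with_locs = eq(x, y)                  # locations differ
same_without_locs = eq_without_locs(x, y)  # locations are ignored
different_names = eq_without_locs(x, z)    # names still matter</code></pre>
<p>The part that corresponds to introspection is the recursion into <code>Var</code> and <code>Loc</code>: the comparison for <code>Add</code> is built from the definitions of its field types rather than from their existing equality.</p>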
<p>Introspection opens up so many possibilities and solves many of the problems with purely syntactic macros (that take just a string, list of tokens, or ASTs as arguments).</p>
<h2 id="interaction-with-type-checking">Interaction with type checking</h2>
<p>Fir has been designed from day 1 for parallel type checking and compilation.</p>
<p>With macros that can be passed type-checked ASTs, type checking gets interleaved with macro expansion, but module-level parallelism is not affected. This is because macros can’t generate imports and generated code (same as hand-written code) can’t access definitions that are not imported. (e.g. Fir doesn’t have paths like Rust’s <code>crate::...</code> or <code>package::...</code>, you can use names with qualified paths, but the paths still need to be imported explicitly first)</p>
<p>So we process the modules the same way as before: starting from the main module (or public modules in a library) we create a dependency DAG of modules<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. This DAG can then be processed in parallel as before. Macros have no influence over the DAGs of modules.</p>
<p>Within a module though, things get a bit tricky. A macro call can only be expanded after the definitions it introspects into are fully type checked, but it also needs to be expanded early enough that the definitions depending on macro-generated code can still be type checked after it.</p>
<p>To deal with the first part of the problem (determining macro dependencies), we require that definitions are passed to macros explicitly (with the <code>@&lt;identifier&gt;</code> syntax we used above). If a definition is not explicitly passed to a macro and it’s not a dependency of a definition that’s explicitly passed to it, the macro won’t have access to it.</p>
<p>However, determining macro outputs ahead of time is not possible. So to deal with the second part of the problem, we create a dependency DAG of module-level items. Macros are also a part of these DAGs and their dependencies are determined by the <code>@...</code>s in their arguments. When there’s an unbound name in a definition, that name is potentially generated by the macros in the module that don’t depend on the definition, so that creates dependencies from the definition with unbound names to those macros (that don’t depend on the definition) in the module. Macros can’t be in a recursive dependency group (SCC) with other macros or definitions, so in the DAG we require that each macro is in its own group.</p>
<p>When we process this DAG of type checking and macro expansion operations in topological order we type check macro dependencies before macro expansion, and expand macros before any potential dependencies on their expansions.<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
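<p>The construction above can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the Fir implementation: items are plain names, there are no imports (so every name without a definition counts as unbound), and the module contents (<code>Bar</code>, <code>useFoo</code>, <code>fooToStr</code>, <code>$genToStr</code>) are hypothetical.</p>
<pre><code>from graphlib import TopologicalSorter

def reachable(graph, start):
    """Returns all nodes reachable from `start` by following dependency edges."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

def build_dag(definitions, macros):
    """Maps each item to the set of items that must be processed before it."""
    # Resolved references between definitions, and explicit `@` arguments.
    graph = {name: {r for r in refs if r in definitions}
             for name, refs in definitions.items()}
    graph.update({call: set(args) for call, args in macros.items()})
    macro_deps = {call: reachable(graph, call) for call in macros}
    # An unbound name may be generated by any macro that does not
    # depend on the definition using it.
    for name, refs in definitions.items():
        if any(r not in definitions for r in refs):
            for call in macros:
                if name not in macro_deps[call]:
                    graph[name].add(call)
    return graph

# `useFoo` refers to `fooToStr`, which no definition in the module provides.
definitions = {"Foo": [], "Bar": ["Foo"], "useFoo": ["Foo", "fooToStr"]}
macros = {"$genToStr(@Foo)": ["Foo"]}

dag = build_dag(definitions, macros)
order = list(TopologicalSorter(dag).static_order())</code></pre>
<p>In any order produced this way, <code>Foo</code> comes before the macro call and the macro call comes before <code>useFoo</code>. <code>TopologicalSorter</code> raises <code>CycleError</code> on a cycle, which in this model is what a macro in a recursive dependency group would look like.</p>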
<p>Macro call locations are not important for this algorithm, as the definitions are not processed in source code order. You can put a macro call anywhere in a module and it works the same way.</p>
<p>Macro generated code is name-resolved as usual, and the name resolving process updates the DAG with the dependencies of the generated code. Consider:</p>
<pre><code>type Foo(...)

trait Trait1[t]:
    method(self: t)

$implTrait1(@Foo)   # generates `impl Trait1[Foo]: ...`</code></pre>
<p>In this program the order of type checking operations is:</p>
<ul>
<li><code>Foo</code> is checked first</li>
<li>Then the macro is expanded</li>
<li>Name resolving the macro expansion creates a new dependency edge from the generated code to <code>Trait1</code></li>
</ul>
<p>So the macro expands before <code>Trait1</code>, but the generated code is checked after <code>Trait1</code>.</p>
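<p>This schedule can be reproduced with a small Python sketch of a worklist that lets a macro expansion add new nodes to the graph. It is a toy model, not the Fir scheduler: what each macro generates is given up front in a hypothetical <code>expansions</code> table, and ties between ready items are broken alphabetically, which is an arbitrary choice that happens to pick the macro before <code>Trait1</code> here.</p>
<pre><code>def run_schedule(deps, expansions):
    """Processes items once their dependencies are done.

    deps: item -> set of items it depends on.
    expansions: macro call -> {generated item: set of its dependencies}.
    """
    deps = {item: set(d) for item, d in deps.items()}
    done = []
    while deps:
        ready = sorted(item for item, d in deps.items() if d.issubset(done))
        if not ready:
            raise ValueError("dependency cycle")
        item = ready[0]
        del deps[item]
        done.append(item)
        # Name resolving the generated code adds nodes and edges to the DAG.
        for generated, generated_deps in expansions.get(item, {}).items():
            deps[generated] = set(generated_deps)
    return done

deps = {
    "Foo": set(),
    "Trait1": set(),
    "$implTrait1(@Foo)": {"Foo"},
}
expansions = {
    "$implTrait1(@Foo)": {"impl Trait1[Foo]": {"Trait1", "Foo"}},
}
order = run_schedule(deps, expansions)</code></pre>
<p>The generated <code>impl</code> only becomes a node once the macro has run, and it is checked last because of the edge to <code>Trait1</code> that name resolution adds.</p>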
<p>There are a few ways to change the macro expansion schedules in the example above:</p>
<ul>
<li><p>We can pass a reference to <code>Trait1</code> to check <code>Trait1</code> before the macro expansion: <code>$implTrait1(@Foo, @Trait1)</code></p></li>
<li><p>We can generate <code>impl</code> methods instead of the whole <code>impl</code>. So instead of <code>$implTrait1(@Foo)</code> which generates an <code>impl</code>, we do</p>
<pre><code>impl Trait1[Foo]:
    method(self: Foo):
        $genTrait1Method(@Foo)</code></pre></li>
</ul>
<p>Note that a macro expansion can update the DAG in arbitrary ways: new top-level definitions create nodes and references in the generated code create edges. They can also remove edges: remember that an unresolved name creates edges from the definition with the unresolved name to the macros in the module that don’t depend on the definition. Some of these names will be resolved as we expand macros, and the edges to other macro definitions from the definition with the unresolved name will be removed.</p>
<h2 id="implicit-dependencies">Implicit dependencies</h2>
<p>Method calls introduce implicit dependencies: in <code>x.f(arg1, arg2)</code>, there are three kinds of functions that can potentially be called (the candidates):</p>
<ol type="1">
<li>A top-level function <code>f</code>, with a first argument type that matches <code>x</code>’s. (UFCS)</li>
<li>A method <code>f</code> with a <code>self</code> type that matches <code>x</code>’s.</li>
<li>A trait method <code>f</code> whose arguments <em>all</em> match the types of the arguments at the call site (<code>x</code>, <code>arg1</code>, <code>arg2</code>).</li>
</ol>
<p>So a method call to <code>f</code> creates dependencies to all top-level functions and methods with name <code>f</code>, and all traits with a method <code>f</code> and the trait’s impls.</p>
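<p>As a rough Python illustration of this rule (the data model and all names in it are hypothetical, and types are ignored because they are not known yet when dependencies are collected):</p>
<pre><code>def method_call_dependencies(name, functions, methods, traits, impls):
    """Collects every item that a call `x.NAME(...)` may depend on."""
    deps = set()
    # 1. Top-level functions with this name (UFCS candidates).
    deps.update("fun " + f for f in functions if f == name)
    # 2. Methods with this name, on any type.
    deps.update("method " + ty + "." + m for ty, m in methods if m == name)
    # 3. Traits that declare a method with this name, and their impls.
    for trait, trait_methods in traits.items():
        if name in trait_methods:
            deps.add("trait " + trait)
            deps.update("impl " + t + "[" + ty + "]"
                        for t, ty in impls if t == trait)
    return deps

functions = ["f", "g"]
methods = [("Vec", "f"), ("Str", "len")]
traits = {"Show": ["f"], "Eq": ["eq"]}
impls = [("Show", "Foo"), ("Eq", "Foo")]

deps_of_f = method_call_dependencies("f", functions, methods, traits, impls)</code></pre>
<p>For a call to <code>f</code>, the result contains the function <code>f</code>, the method <code>Vec.f</code>, the trait <code>Show</code> and its impl for <code>Foo</code>, but nothing related to <code>Eq</code>.</p>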
<h2 id="interaction-with-the-trait-environment">Interaction with the trait environment</h2>
<p>Macros can generate traits and impls, but they don’t have access to the trait environment and can’t introspect into traits and impls:</p>
<ul>
<li><p>The trait environment is per-module and it depends on the imports. For example, if I have a two-parameter trait <code>MyTrait</code> and implementation of <code>MyTrait[Foo, Bar]</code>, the trait environment changes when I import the trait and the two types, even if I don’t use the trait (or its methods) explicitly. If we give macros access to the trait environment they could potentially generate different code based on imports, which goes against the principle we’ve been following with the explicit macro dependencies and deterministic outputs.</p></li>
<li><p>Impls are not named, so they can’t be explicitly passed.</p></li>
<li><p>Traits are named, so they can be explicitly passed, but it’s a bit unclear to me how useful that would be. I couldn’t come up with a use case where a macro would want to look into a trait definition and generate code based on that.</p></li>
</ul>
<h2 id="deterministic-execution-of-macros">Deterministic execution of macros</h2>
<p>This is not enforced in the current prototype, but it will be in the final version.</p>
<p>Once the effect system is ready, we can require that a function needs to have no effects to be usable as a macro.</p>
<p>However in any kind of statically checked system there will always be escape hatches (for the system to be practically useful), so just compile-time/type-level enforcement won’t be enough, and we’ll need to sandbox the macro programs regardless of how/what we check in compile time.</p>
<p>One easy way here would be compiling them to Wasm and then making host calls for IO (and other things we don’t allow in macros) fail. This is easy to implement, but it requires a Wasm engine to be embedded within the language front-end, and execution will be slower than a native executable for two reasons: (1) Wasm will need to be interpreted or JIT compiled, and (2) ASTs will need to be serialized and deserialized as they’re passed to macros and as the generated ASTs are returned to the language front-end. A native library, in contrast, could be loaded dynamically and share the same address space, so we could share immutable references to type-checked ASTs with the macros.</p>
<p>The details here are to be determined.</p>
<h2 id="the-macro-api">The macro API</h2>
<p>In the prototype, macros are a part of the implementation and they use the internal data structures of the compiler.</p>
<p>One of the other goals with Fir <a href="https://osa1.net/posts/2025-09-04-fir-getting-useful.html">since the early days</a> is to have the language front-end available to users as libraries. To avoid creating yet another library/API when we already have the language front-end available, the macros will probably use the language’s official AST library.<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a></p>
<p>To avoid passing large ASTs to macros when a macro only needs the main type being passed (without the dependencies), we allow back-and-forth between a macro and the language front-end. A macro will be able to request ASTs of dependencies of the main AST being passed, it won’t get the whole thing in one call.</p>
<p>Macros will only have access to the definitions they’re explicitly passed (with the <code>@&lt;identifier&gt;</code> syntax) and won’t be provided anything other than the passed definitions and their dependencies, even if it so happens that at the time of expansion we type checked more. This is a part of the determinism requirements: given the same inputs, macros should always generate the same outputs. Location of the macro call or type checker internals (or order) should not matter and should not change macro expansion.</p>
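<p>One way to picture the combination of on-demand requests and restricted access is the Python sketch below. The <code>MacroContext</code> class and its <code>request</code> method are invented for illustration, since the actual macro API is not designed yet, and definitions are represented as strings.</p>
<pre><code>class MacroContext:
    """Serves definitions on request, limited to what was passed with `@`."""

    def __init__(self, definitions, roots):
        self._definitions = definitions  # name -> (body, names of dependencies)
        # Only the passed definitions and their transitive dependencies
        # are visible, no matter what else has been type checked.
        self._allowed = set()
        stack = list(roots)
        while stack:
            name = stack.pop()
            if name not in self._allowed:
                self._allowed.add(name)
                stack.extend(definitions[name][1])
        self.served = []

    def request(self, name):
        if name not in self._allowed:
            raise PermissionError(name + " is not reachable from the macro arguments")
        self.served.append(name)
        return self._definitions[name][0]

definitions = {
    "Foo": ("type Foo(x: Type1, y: Type2)", ["Type1", "Type2"]),
    "Type1": ("type Type1(...)", []),
    "Type2": ("type Type2(...)", []),
    "Unrelated": ("type Unrelated(...)", []),
}

ctx = MacroContext(definitions, roots=["Foo"])
foo_body = ctx.request("Foo")
type1_body = ctx.request("Type1")  # fetched only because the macro asked for it
try:
    ctx.request("Unrelated")
    leaked = True
except PermissionError:
    leaked = False</code></pre>
<p>In this model <code>Type2</code> is never transferred because the macro never asks for it, and <code>Unrelated</code> is refused even though the front-end knows about it.</p>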
<p>Quotation in macros will be implemented using macros, e.g. when generating an expression instead of generating the ASTs manually:</p>
<pre><code>Expr.BinOp(
    left = Expr.Var(...),
    op = Binop.Add,
    right = Expr.Call(...),
)</code></pre>
<p>We implement (and distribute as a part of the language) quotation macros and instead have:</p>
<pre><code>$expr(var + f(...))</code></pre>
<p><code>$expr</code> here is a macro that parses its arguments (token trees) and converts them to Fir AST expressions.</p>
<p>(This is the same idea as Rust’s <a href="https://crates.io/crates/quote"><code>quote</code></a> package.)</p>
<p>We can pass type-checked ASTs and token trees, but not parsed ASTs (not type checked). I’m not sure how useful this would be, but if it becomes useful we can easily extend the system to allow passing parsed ASTs to macros. E.g. maybe we use <code>@@expr[...]</code> for parsing an inlined expression and passing it to the macro as an expression AST.</p>
<p>In the meantime, macros can just parse the token trees as whatever they want using the language’s libraries. (instead of expecting parsed inputs)</p>
<p>Macro functions are ordinary Fir functions with a particular signature, but their signature allows passing a varying number of arguments of different kinds (token trees, type identifiers, function identifiers). The idea is that the same macro function can handle multiple call patterns, as in the <code>deriveEq</code> example above:</p>
<ul>
<li><code>$deriveEq(@Foo)</code> generates a top-level <code>Eq</code> trait implementation for the type <code>Foo</code>.</li>
<li><code>$deriveEq(@Foo, name = eqWithoutLocs, skip = [@Loc])</code> generates a function with name <code>eqWithoutLocs</code> and also comparison functions for the fields of <code>Foo</code>. The generated functions all skip the fields with type <code>Loc</code>.</li>
</ul>
<p>The function signature for this macro looks something like:</p>
<pre><code>deriveEq(inputs: Vec[TokenTree]) Ast: ...</code></pre>
<p>Where <code>TokenTree</code> is a sum type with actual token trees but also type and function identifiers, and <code>Ast</code> is a sum type that has constructors for top-level items, expressions, statements, and anything else that we allow macros to generate.</p>
<p>The reasons for this design are:</p>
<ul>
<li>Some of the macros (like <code>deriveEq</code>) will take a lot of customization parameters, and Fir doesn’t support optional arguments.</li>
<li>If we let the macro function specify the number of arguments passed, some of the macros will need bracketed arguments just to be able to pass a variable number of things. E.g. if we have a regex macro that takes a number of regexes and generates a matcher function, it’d need to be called as <code>$regex([re1, re2, ...])</code> instead of <code>$regex(re1, re2, ...)</code>.</li>
</ul>
<p>By allowing an arbitrary sequence of (potentially comma-separated) token trees in the argument lists, we allow this flexibility and let the macro do sanity checking for the arguments.</p>
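<p>To show what handling multiple call patterns in one function could look like, here is a Python sketch of a <code>deriveEq</code>-like macro. The <code>Tok</code>, <code>TypeRef</code> and <code>Group</code> classes are a guess at the shape of the <code>TokenTree</code> sum type, and the function returns a description of what it would generate instead of an AST.</p>
<pre><code>from dataclasses import dataclass

@dataclass(frozen=True)
class Tok:      # a plain token
    text: str

@dataclass(frozen=True)
class TypeRef:  # `@Type`: reference to a type-checked type definition
    name: str

@dataclass(frozen=True)
class Group:    # bracketed token trees, e.g. `[@Loc]`
    items: tuple

def derive_eq(inputs):
    # Split the flat input on top-level commas.
    args, current = [], []
    for tree in inputs:
        if tree == Tok(","):
            args.append(current)
            current = []
        else:
            current.append(tree)
    args.append(current)

    if len(args[0]) != 1 or not isinstance(args[0][0], TypeRef):
        raise ValueError("expected a type as the first argument")
    target = args[0][0].name
    name, skip = None, []
    for arg in args[1:]:
        if len(arg) != 3 or not isinstance(arg[0], Tok) or arg[1] != Tok("="):
            raise ValueError("expected `key = value`")
        key, value = arg[0].text, arg[2]
        if key == "name" and isinstance(value, Tok):
            name = value.text
        elif key == "skip" and isinstance(value, Group):
            skip = [t.name for t in value.items if isinstance(t, TypeRef)]
        else:
            raise ValueError("unexpected option: " + key)
    if name is None:
        return ("impl", "Eq", target)
    return ("function", name, target, skip)

# $deriveEq(@Foo)
plain = derive_eq([TypeRef("Foo")])
# $deriveEq(@Foo, name = eqWithoutLocs, skip = [@Loc])
custom = derive_eq([
    TypeRef("Foo"), Tok(","),
    Tok("name"), Tok("="), Tok("eqWithoutLocs"), Tok(","),
    Tok("skip"), Tok("="), Group((TypeRef("Loc"),)),
])</code></pre>
<p>All argument validation lives in the macro itself, which is the trade-off described above: the call syntax stays flexible, and malformed calls are rejected by the macro rather than by a signature check.</p>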
<h2 id="conditional-compilation">Conditional compilation</h2>
<p>As mentioned in the intro, it’s a deliberate design goal with macros that they’re fully deterministic and they generate the same code for the same inputs regardless of the compilation settings (host or target platforms, optimization parameters, etc.).</p>
<p>Conditional compilation in Fir will be done by dedicated language features (that don’t exist today). Macros will be able to generate code that uses those conditional compilation features, but they won’t be doing conditional compilation themselves.</p>
<p>For example, if we have a syntax for checking target architecture pointer size, macros won’t be able to use it but they will be able to generate code that uses the syntax for checking the target architecture pointer size.</p>
<p>This doesn’t complicate the macro system implementation any more than it already is: as mentioned, we need to sandbox macros anyway (or somehow make sure in compile time that they don’t have access to certain APIs). We just prevent access to conditional compilation features in similar ways.</p>
<h2 id="hygiene">Hygiene</h2>
<p>Macro-generated code is name resolved and type checked in the using module’s environment, and so it can refer to names available at the macro call site.</p>
<p>To avoid issues when a call site imports e.g. the standard library with a prefix and the macro generates references to the standard library types, macros should generate fully qualified names. E.g. <code>Fir/Vec/Vec[U32]</code> instead of just <code>Vec[U32]</code>. However this is not enforced.</p>
<p>For the cases when a macro generates type or term ids that shouldn’t shadow definitions at the call site (either in the macro-generated code, or the code around the macro expansion), we provide a <code>gensym</code> function in the standard library. This function is only accessible by macros.</p>
<p>Types and terms passed to the macros (with the <code>@&lt;identifier&gt;</code> syntax) already have name-resolved ASTs, and using parts of those ASTs in the outputs generates qualified names that can’t be shadowed at the call sites. This is not done via a magic AST node that can only be created by the language front-end: identifiers in the ASTs can have qualifications or prefixes and that’s how macros should generate qualified names whenever possible. When we pass a <code>@MyType</code> to a macro and <code>MyType</code> uses <code>Vec</code>, the <code>Vec</code> references in the AST of <code>MyType</code> will have the fully qualified path to <code>Vec</code>. So copying that into the output also gives us a fully qualified <code>Vec</code> reference that we could also write by hand.</p>
<p>Being able to write the fully qualified path <code>Foo/Bar/Baz</code> or macro-generate it doesn’t mean we can avoid importing <code>Foo/Bar</code>. It’s an explicit goal of Fir modules that the dependencies are always fully specified in the imports, and while we can access a definition in more than one way (with fully qualified paths, directly using the imported name, we can also import the same definition under different prefixes or with different names), there’s no way to access a definition without importing a module that exports it.</p>
<p>Macros don’t change this fact. They should always generate fully qualified paths to avoid shadowing and depending on modules being imported in a particular way, but the references in the generated code should still be explicitly imported by the calling module. This may mean that in some cases a call site of a macro may need to add imports that look unused, because the imported things are used in macro expansions.</p>
<p>The principle here is that macros can’t generate code that you can’t write by hand.</p>
<h2 id="final-thoughts-and-current-status">Final thoughts and current status</h2>
<p>Unlike the other blog posts about Fir, features here are not fully implemented. The parts until the deterministic execution section above are currently implemented in a prototype and working.</p>
<p>The type checker requires quite a lot of refactoring for the proper implementation, which I’m slowly working on.</p>
<p>I think this is the final feature Fir needs to be considered a proper language, ready to tackle real problems. Once done, we’ll focus on bootstrapping the language.</p>
<h2 id="updates">Updates</h2>
<ul>
<li>15/05/2026: updated with details on checking the generated code and implicit dependencies</li>
<li>12/05/2026: post published</li>
</ul>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>Fir allows recursive module imports, so it’d actually be more accurate to say “dependency DAG of SCCs of modules”. To keep things simple in this discussion we can assume modules can’t be recursive.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>There’s an edge case here that we don’t deal with and let things fail to type check: when a macro generates e.g. <code>foo</code> but there’s also an imported <code>foo</code>, definitions in the module that use <code>foo</code> can use either the imported <code>foo</code> (if they’re scheduled before the macro expansion) or the macro-generated <code>foo</code> (if they’re scheduled after the macro expansion, because local definitions shadow imported ones). This case should be extremely rare and it’s not worth complicating the design or implementation more to deal with this.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p>The library will probably provide different ASTs for parsed and type checked programs, which is easy to do in Fir thanks to <a href="https://osa1.net/posts/2026-04-15-fir-devlog.html">extensible named types</a>.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>Languages should have opinionated interop features</title>
    <link href="http://osa1.net/posts/2026-05-10-interop-features.html" />
    <id>http://osa1.net/posts/2026-05-10-interop-features.html</id>
    <published>2026-05-10T00:00:00Z</published>
    <updated>2026-05-10T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>A lesson we can derive from Rust’s error handling and async libraries, and Haskell’s effect system libraries, is that languages need to have opinionated (and efficient, flexible) interop features. If not and the language is flexible enough (with an expressive type system, and maybe also with metaprogramming features), users create their own solutions and the ecosystem gets fragmented.</p>
<p>Consider error reporting. Without a convenient way of error handling built into the language, a library for SQLite access, another one for HTTP requests, and another one for TCP connections will potentially use different error reporting libraries and cause friction. A similar thing commonly happens today with Rust async libraries and Haskell effect system libraries, which create a whole ecosystem of libraries that just do the same things (e.g. SQLite access) but using different async, error handling, or effect system libraries.</p>
<p>The worst outcome here is entire sets of libraries that can’t be used together. The best case is you need N<sup>2</sup> adapters to convert one to the other. None of these are ideal.</p>
<p>So one of the goals with Fir is to have built-in high-level features, like <a href="https://osa1.net/posts/2025-01-18-fir-error-handling.html">checked exceptions</a> and an effect system (work in progress). Here by “high-level” I mean features that make libraries that can be composed easily and work well together. If a library returns some types of error values, and another returns other types of error values, I should be able to call them in a third library with no effort (no adapters) and without losing safety or testability.</p>
<p>Interestingly, a simple/limited solution in a simple language seems to create a better outcome here compared to a flexible language + unsatisfying solution, as it doesn’t create a fragmented ecosystem. To my knowledge, there aren’t any error handling libraries in Dart and Go that fragment the ecosystem, despite the fact that their error handling features are not perfect. On the other hand, Rust programmers can’t stop inventing async executors and Haskell programmers can’t stop inventing effect systems.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<p>Success here looks like: a large ecosystem with composable and testable libraries that all use the built-in high-level features.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>Note that I’m not claiming that all those libraries do the same things and do them the same way. I understand that the Rust async executor design space is large and there are different use cases where different designs make sense. I’m only saying that (1) these libraries create a fragmented ecosystem, and (2) the fragmentation can be avoided by having a built-in way of doing it. Different use cases can sometimes be accommodated with different compiler backends, CLI flags, or even entire language implementations. E.g. Rust has different async executor libraries for embedded use, Go has TinyGo.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>A text editor I worked on in 2021-2023</title>
    <link href="http://osa1.net/posts/2026-05-07-my-text-editor.html" />
    <id>http://osa1.net/posts/2026-05-07-my-text-editor.html</id>
    <published>2026-05-07T00:00:00Z</published>
    <updated>2026-05-07T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>I was just going through old files and saw a cool video of an old project that I thought I should share.</p>
<p>In January 2021 I started working on a new text editor. Zed didn’t exist publicly at the time (it must’ve been under development) and two projects were getting a lot of hype:</p>
<ul>
<li>Alacritty because of its renderer</li>
<li>Tree-sitter because IIRC Neovim was about to ship built-in support for tree-sitter grammars for syntax highlighting</li>
</ul>
<p>Inspired by these two developments, I started implementing my own text editor. The renderer was similar to Alacritty: IIRC I even copied and modified its texture atlas generator that lazily generated glyphs. The renderer must’ve been similar too, though I’d written it myself. It only used OpenGL.</p>
<p>For language support I had different ideas. I used compiled lexers (instead of runtime-interpreted regexes like most editors and IDEs) for syntax highlighting. The editor didn’t care about the actual lexer generator used, but for Rust I used my own <a href="https://github.com/osa1/lexgen">lexgen</a>-based <a href="https://github.com/osa1/lexgen_rust/blob/ac9df1a6fbad94447beba1045ca1a7a08b74748b/crates/lexgen_rust/src/lib.rs">Rust lexer</a>.</p>
<p>I had a few use cases for these lexers, but one of the cool ones was limiting search scope to lexical elements. In the video below I search for a term, then choose whether to match uses in string literals and comments:</p>
<video controls muted playsinline preload="metadata" width="798" style="display: block; margin: 0 auto;">
<source src="/files/editor.mp4" type="video/mp4">
</video>
<p>I must’ve hated the existing regex-based language-agnostic search tools so much that a year before this I started another project for syntax-aware search: <a href="https://github.com/osa1/sg">sg</a>. <code>sg</code> uses tree-sitter grammars so adding new language support is trivial, and in principle we could even make it load tree-sitter grammars at runtime.</p>]]></summary>
</entry>
<entry>
    <title>Fir now compiles to C (+ extensible named types, associated types, modules, and more)</title>
    <link href="http://osa1.net/posts/2026-04-15-fir-devlog.html" />
    <id>http://osa1.net/posts/2026-04-15-fir-devlog.html</id>
    <published>2026-04-15T00:00:00Z</published>
    <updated>2026-04-15T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>One of my original goals with Fir was to bootstrap it as early as possible. I was so determined that I committed the first code for the self-hosted compiler in the <a href="https://github.com/fir-lang/fir/commit/a69e3cefcb42c1ad63e303e70dbd9e66d5aa512f">322nd commit</a>, on 11 April 2025, after less than a year of development in the open<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>, when it was barely usable. To understand how early this was, we’re currently on commit 1,052, and in my opinion it only recently became somewhat usable.</p>
<p>Unsurprisingly, this turned out to be a challenge, and I had to accept the fact that a compiler for a non-trivial language is a lot of work. You really need a lot of language features + a good implementation (generating fast code) for it. It’s not that it cannot be done otherwise, but the process becomes extremely slow, tedious, and boring.</p>
<p>Something else that became evident as I worked on the self-hosted compiler was that, even if I finish it with just the features we have, I’ll have to refactor it quite significantly as we implement the planned features, to the point where it could feel more like a rewrite than a refactoring.</p>
<p>Finally, I thought (perhaps mistakenly) that, with some of the recent developments in programming tooling and software development methods (you know what I’m talking about) and with a working reference implementation + tests, the bootstrapping effort could be largely automated.</p>
<p>So I started implementing features that I consider essential for Fir 1.0, in the reference implementation. In this post we’ll look at some of these features that were recently implemented.</p>
<h2 id="fir-now-compiles-to-c">Fir now compiles to C</h2>
<p>The Fir reference implementation now compiles to C. The motivation for this development was that running the self-hosted compiler with the interpreter to compile itself was taking 8.8s, even though it only parsed and resolved names. That was already too long, and it was going to get much worse as we implemented type checking, monomorphisation, and code generation.</p>
<p>I made a few attempts at optimizing the interpreter, but it became clear that with very little effort, compared to designing and implementing a bytecode interpreter, I could compile Fir to C<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>. Because we already had a monomorphiser, compilation to C was mostly very straightforward, and we immediately got a roughly 12x speedup: the self-hosted compiler started checking itself in 0.7s instead of 8.8s.</p>
<p>When working on the compiler, the full cycle (compiling the compiler to C, compiling that C to native code with clang, then running the executable on the compiler itself) currently takes 1.7s.</p>
<p>This also improved the workflow in other areas: formatting the whole code base and compiling PEG files take an instant now, instead of many seconds.</p>
<p>This also allowed other things that made this even better in terms of return-on-investment: we got free garbage collection with the <a href="https://github.com/bdwgc/bdwgc">Boehm-Demers-Weiser conservative GC</a>, and value types became trivial to implement. This will also make it easier to add C FFI in the future. (more on this below)</p>
<p>The interpreter still exists, mainly to keep <a href="https://fir-lang.github.io/">the online interpreter</a> running.</p>
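<p>To make the footnote about statement expressions concrete, here is a minimal C sketch of how a nested conditional expression can be emitted in place, without first flattening it into temporaries in a separate IR. This is not the code Fir actually generates: the function names and the source expression are hypothetical. It needs GCC or clang, since <code>({ ... })</code> is a GNU extension rather than standard C.</p>

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical callee standing in for any one-argument function. */
static uint32_t twice(uint32_t x) { return x * 2; }

/* Hypothetical translation of a nested source expression along the lines of
 *     twice(if c then a + 1 else b * 2)     (pseudocode, not Fir syntax)
 * The conditional stays in the argument position. */
static uint32_t nested(int c, uint32_t a, uint32_t b) {
    return twice(({
        uint32_t tmp;          /* local to the statement expression */
        if (c) {
            tmp = a + 1;
        } else {
            tmp = b * 2;
        }
        tmp;                   /* the last expression is the block's value */
    }));
}

/* Small usage check. */
static void demo_statement_exprs(void) {
    assert(nested(1, 3, 5) == 8);   /* twice(3 + 1) */
    assert(nested(0, 3, 5) == 20);  /* twice(5 * 2) */
}
```

<p>Without the extension, a code generator would typically have to hoist <code>tmp</code> and the <code>if</code> out of the call, which is the flattening step the footnote refers to.</p>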
<h2 id="value-types">Value types</h2>
<p>Fir literally started as a “high-level language with value types”, but it wasn’t entirely trivial to implement them until we had the C backend.</p>
<p>With the C backend, it became a matter of making it generate typed code (instead of treating all values as <code>uint64_t*</code> or similar), and then not <code>malloc</code>ing value types.</p>
<p>Here’s an example value type, from the standard library:</p>
<pre><code># Immutable, UTF-8 encoded strings.
value type Str(
    # UTF-8 encoding of the string.
    _bytes: Array[U8],
)</code></pre>
<p>Relevant struct definitions in generated C:</p>
<pre><code>typedef struct Array_U8 {
    U8* data_ptr;
    uint64_t len;
} Array_U8;

typedef struct Str {
    Array_U8 _0;
} Str;</code></pre>
<p>This type is then used directly (instead of as a pointer). Here’s a forward-declaration of a function from the self-hosted compiler:</p>
<pre><code>// Compiler/ParseUtils.fir:32:1 parseCharLit[U32]
static Char _fun_8(Str _p0);</code></pre>
<p>(The comment line here is generated by the compiler to make it easier to read the generated code.)</p>
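<p>As a rough illustration of what “used directly” means at the C level, the sketch below reuses the two struct definitions above and adds two hypothetical helpers. <code>str_from_bytes</code> and <code>str_len_bytes</code> are not from the Fir code base, and <code>U8</code> is assumed to be <code>uint8_t</code>. A <code>Str</code> is built, copied, and passed by value, so there is no <code>malloc</code> call for the <code>Str</code> itself.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint8_t U8;  /* assumption: U8 maps to uint8_t */

typedef struct Array_U8 {
    U8* data_ptr;
    uint64_t len;
} Array_U8;

typedef struct Str {
    Array_U8 _0;
} Str;

/* Hypothetical helper: wraps an existing byte buffer in a Str value. */
static Str str_from_bytes(U8* bytes, uint64_t len) {
    assert(bytes != NULL || len == 0);
    Str s = { { bytes, len } };
    return s;  /* returned by value, not as a pointer */
}

/* Hypothetical helper: takes the Str by value, like _fun_8 above. */
static uint64_t str_len_bytes(Str s) {
    return s._0.len;
}

/* Small usage check. */
static void demo_value_types(void) {
    U8 bytes[] = { 'h', 'i' };
    Str s = str_from_bytes(bytes, 2);  /* the Str lives on the stack */
    Str copy = s;                      /* plain struct copy, same byte buffer */
    assert(str_len_bytes(copy) == 2);
    assert(copy._0.data_ptr == bytes);
}
```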
<h2 id="associated-types">Associated types</h2>
<p>This was a feature that I delayed implementing for way longer than I should’ve, mostly because I didn’t know how to implement it and it took a while to figure out.</p>
<p>Associated types in Fir are the same feature as associated types in Rust. The most common use case for them is the <code>Iterator</code> trait. Before associated types, <code>Iterator</code> in Fir looked like this: (omitting extra methods with default implementations)</p>
<pre><code>trait Iterator[iter, item, exn]:
    next(self: iter) Option[item] / exn</code></pre>
<p>Here’s what the <code>Iterator</code> implementation of <code>CharIter</code> (which iterates over the characters of a string) looked like:</p>
<pre><code>impl Iterator[CharIter, Char, exn]:
    next(self: CharIter) Option[Char] / exn:</code></pre>
<p>This trait definition has a problem. The type of <code>Iterator.next</code> is this:</p>
<pre><code>[Iterator[iter, item, exn]] Fn(self: iter) Option[item] / exn</code></pre>
<p>Based on this type, in a call site like <code>charIter.next()</code> (where <code>charIter : CharIter</code>), we generate the predicate <code>Iterator[CharIter, item, exn]</code> and the type of the call expression becomes <code>Option[item]</code>. (where <code>item</code> and <code>exn</code> are fresh unification variables)</p>
<p>If the expected type of the call expression is not precise enough to unify that <code>item</code> type with a concrete type, the predicate never becomes <code>Iterator[CharIter, Char, exn]</code>, and we can’t solve it, because there isn’t an <code>impl</code> for <code>Iterator[CharIter, item, exn]</code> (note: with generic <code>item</code>). We only have <code>Iterator[CharIter, Char, exn]</code>.</p>
<p>This resulted in lots of type annotations in the code that uses the <code>Iterator</code> trait. Most importantly, it required type annotations in <code>for</code> loops as <code>for</code> loops used <code>Iterator</code> under the hood. For example:</p>
<pre><code>for char: Char in charIter:
    print(char)</code></pre>
<p>Here <code>print</code> is a generic function that works on any <code>ToStr</code> type, so without the type annotation the predicate became too generic and couldn’t be solved.</p>
<p>With associated types, the trait now looks like this:</p>
<pre><code>trait Iterator[iter, exn]:
    type Item
    next(self: iter) Option[Item] / exn

impl Iterator[CharIter, exn]:
    type Item = Char
    next(self: CharIter) Option[Char] / exn:</code></pre>
<p>With this definition, the predicate for the same call becomes <code>Iterator[CharIter, exn]</code> (where <code>exn</code> is a fresh unification variable), and that’s immediately resolved using this <code>impl</code>. The <code>for</code> loop example above now works without a type annotation.</p>
<p>Associated types also allowed the next feature.</p>
<h2 id="its-now-possible-to-implement-traits-for-record-types">It’s now possible to implement traits for record types</h2>
<p>This was a small development in terms of code, but an important one for the language. Until this feature, we could pass records around and access fields in polymorphic contexts, but if we wanted to take a polymorphic record (with a row extension) and e.g. print it, there was no way to do it.</p>
<p>This wasn’t too important until recently, as the main use case for records was returning multiple values. You’d then destruct/pattern match on the return values directly and use them individually. For example:</p>
<pre><code>divRem(x: U32, y: U32) (div: U32, rem: U32): ...

# Users just match on the fields instead of passing the return value around
# as a record.
let (div, rem) = divRem(a, b)</code></pre>
<p>However with the other developments listed below, records became much more useful, and not being able to implement traits on them became a problem.</p>
<p>The solution was porting PureScript’s <a href="https://pursuit.purescript.org/builtins/docs/Prim.RowList"><code>RowToList</code></a> typeclass to Fir. The idea is that we define a “magic” trait that converts record rows into heterogeneous lists:</p>
<pre><code>trait RecRowToList[recRow]:
    type List
    rowToList(rec: (..recRow)) Option[List]</code></pre>
<p>Here <code>recRow</code> is a record-row-kinded type parameter. This trait is resolved by the compiler for any valid (correctly kinded) type argument, and depending on the type argument the <code>List</code> type is also generated as a heterogeneous list. The heterogeneous list type is defined like this, in the standard library:</p>
<pre><code>value type List[head, tail](
    head: head,
    tail: Option[tail],
)</code></pre>
<p>In the generated <code>List</code> types for record rows, the <code>head</code> type is always a <code>RecordField</code>:</p>
<pre><code>value type RecordField[t](
    label: Str,
    value_: t,
)</code></pre>
<p>So for example, <code>RecRowToList[row(x: U32, msg: Str)]</code> is resolved by the type checker, and the <code>List</code> type is also resolved as <code>List[RecordField[Str], List[RecordField[U32], []]]</code><a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> <a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a>.</p>
<p>Here’s how to implement <code>ToStr</code> on records using this machinery:</p>
<pre><code>impl[ToStr[RecRowToList[r].List]] ToStr[(..r)]:
    toStr(self: (..r)) Str:
        match RecRowToList[r].rowToList(self):
            Option.None: &quot;()&quot;
            Option.Some(list): &quot;(`list`)&quot;


impl[ToStr[t]] ToStr[RecordField[t]]:
    toStr(self: RecordField[t]) Str:
        &quot;`self.label` = `self.value_`&quot;


impl[ToStr[head], ToStr[tail]] ToStr[List[head, tail]]:
    toStr(self: List[head, tail]) Str:
        match self.tail:
            Option.None: &quot;`self.head`&quot;
            Option.Some(t): &quot;`self.head`, `t`&quot;

impl ToStr[[]]:
    toStr(self: []) Str:
        panic(&quot;unreachable&quot;)</code></pre>
<p>Note that the <code>List</code> and <code>RecordField</code> types are value types, so <code>rowToList</code> does not allocate. It just generates a different representation of the record on the stack that we can recurse on.</p>
<h2 id="matching-a-bunch-of-fields-at-once-as-a-record">Matching a bunch of fields at once, as a record</h2>
<p>This was one of the very simple features that made records so much more useful.</p>
<p>When pattern matching fields, we can now use <code>..var</code> syntax to assign unmatched fields to a variable, as a record. Here’s a simple example:</p>
<pre><code>type Test(
    x: U32,
    y: U32,
    z: U32,
    msg: Str
)


main():
    let x = Test(x = 1, y = 2, z = 3, msg = &quot;hi&quot;)
    let Test(y, ..rest) = x
    print(rest)</code></pre>
<p>In the pattern, <code>y</code> matches the field <code>y</code>, and <code>rest</code> matches the rest of the fields, as <code>(x: U32, z: U32, msg: Str)</code>. Then, using the <code>ToStr</code> implementation of records as shown above, this prints <code>(msg = "hi", x = 1, z = 3)</code>.</p>
<p>This is not the main use case for this feature, but just as a note, when combined with traits on records, this allows easily implementing traits by reusing records’ implementations of the traits. For example, <code>ToStr</code> for <code>Test</code> here can be implemented as:</p>
<pre><code>impl ToStr[Test]:
    toStr(self: Test) Str:
        let Test(..fields) = self
        &quot;Test`fields`&quot;</code></pre>
<p>With this implementation, the value <code>x</code> above now prints as <code>Test(msg = "hi", x = 1, y = 2, z = 3)</code>. This is the same output as the derived <code>ToStr</code> for this type, just with a different field order. (The derived <code>impl</code> would print fields in the source code order, so: <code>x</code>, <code>y</code>, <code>z</code>, <code>msg</code>.)</p>
<h2 id="splicing-records-and-named-arguments">Splicing records and named arguments</h2>
<p>We can now pass records as named arguments. The feature above copies field values to records; this one copies records to named arguments for fields.</p>
<p>This is also straightforward and I think a simple example should suffice, using the same types as above:</p>
<pre><code>main():
    let x = Test(x = 1, y = 2, z = 3, msg = &quot;hi&quot;)
    print(x)

    let Test(y, ..rest) = x     # rest: (x: U32, z: U32, msg: Str)
    let y = Test(y = 0, ..rest)
    print(y)

# output:
# Test(msg = hi, x = 1, y = 2, z = 3)
# Test(msg = hi, x = 1, y = 0, z = 3)</code></pre>
<p>Reminder: records (and variants) are value types. They’re not heap allocated. So the code above does not allocate for the <code>rest</code> record.</p>
<p>We can also make larger records from smaller ones with this feature:</p>
<pre><code>main():
    let x = (x = u32(123), y = u32(456))
    let y = (msg = &quot;hi&quot;, ..x)
    print(y)

# output: (msg = &quot;hi&quot;, x = 123, y = 456)</code></pre>
<p>Splicing two records together is currently not possible: there can be at most one <code>..expr</code> in a record expression.</p>
<h2 id="extensible-named-types">Extensible named types</h2>
<p>This is a big one that I talked about <a href="https://osa1.net/posts/2026-03-07-extensible-named-types-fir.html">in a previous post</a>. It only became usable after the record features above, associated types, and type synonyms.</p>
<p>For a running example, I added <a href="https://fir-lang.github.io/?file=NamedTypeExtensions.fir">a full program</a> to the online interpreter, showing a solution to the extensible AST types problem described in the blog post. It’s extensively documented, explaining all the interesting bits, so I recommend just checking it out.</p>
<p>In short, we allow extending named types using record rows. Pattern matching, allocation, and everything else works the same way as records. Here’s an example:</p>
<pre><code>type Foo[r](
    x: U32,
    y: U32,
    ..r
)


impl[r: Row[Rec], ToStr[RecRowToList[r].List]] ToStr[Foo[r]]:
    toStr(self: Foo[r]) Str:
        let Foo(..fields) = self
        &quot;Foo`fields`&quot;


main():
    let x = Foo(x = 1, y = 2, msg = &quot;hi&quot;)
    let y = Foo(b = Bool.True, y = 10, blah = Option.Some(u32(0)), x = 11)
    print(x)
    print(y)


# output:
# Foo(msg = hi, x = 1, y = 2)
# Foo(b = Bool.True, blah = Option.Some(0), x = 11, y = 10)</code></pre>
<p><code>Foo</code> here is an extensible type. In the allocation sites, we allocate it with different extra fields. The inferred types here are:</p>
<ul>
<li><code>x : Foo[row(msg: Str)]</code></li>
<li><code>y : Foo[row(b: Bool, blah: Option[U32])]</code></li>
</ul>
<p>The <code>ToStr</code> implementation uses the record field matching features explained above, but it can also be derived.</p>
<p>This feature is used in the self-hosted compiler and the tools. The code is a bit long, but we basically use the same idea demonstrated in the online demo linked above, to add different fields to the AST nodes used by different tools. For example, here’s the AST node type for variant expressions, when compiled to C, as a part of the self-hosted compiler:</p>
<pre><code>typedef struct VariantExpr_CompilerAstExts {
    Expr_CompilerAstExts* _0;
    Option_Ty _1;
} VariantExpr_CompilerAstExts;</code></pre>
<p>And here’s the exact same type, but in the formatter’s compiled C code:</p>
<pre><code>typedef struct VariantExpr_DefaultAstExts {
    Expr_DefaultAstExts* _0;
} VariantExpr_DefaultAstExts;</code></pre>
<p>This is smaller because the formatter doesn’t have the extra field the compiler adds to the type.</p>
<p>Both are generated from this Fir type:</p>
<pre><code>type VariantExpr[exts](
    expr: Expr[exts],
    ..AstExts[exts].InferredTyExts
)</code></pre>
<p>You can see the full generic AST definitions used by the compiler and other tools <a href="https://github.com/fir-lang/fir/blob/96429adb83b2242ff806fe624dfe65be45e42b82/Compiler/Ast.fir">here</a>.</p>
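<p>The size difference between the two instantiations can be checked with a small C sketch. The two <code>VariantExpr_*</code> struct definitions are copied from above. The <code>Option_Ty</code> layout and the opaque <code>Ty</code> and <code>Expr_*</code> types are assumptions made up for this example, since the generated definitions for them are not shown in this post.</p>

```c
#include <assert.h>

/* Assumptions for illustration only: opaque types and a made-up
 * tag + payload layout for Option_Ty. */
typedef struct Ty Ty;
typedef struct Expr_CompilerAstExts Expr_CompilerAstExts;
typedef struct Expr_DefaultAstExts Expr_DefaultAstExts;

typedef struct Option_Ty {
    int is_some;
    Ty* value;
} Option_Ty;

/* Compiler instantiation: carries the extra inferred-type field. */
typedef struct VariantExpr_CompilerAstExts {
    Expr_CompilerAstExts* _0;
    Option_Ty _1;
} VariantExpr_CompilerAstExts;

/* Formatter instantiation: no extension fields. */
typedef struct VariantExpr_DefaultAstExts {
    Expr_DefaultAstExts* _0;
} VariantExpr_DefaultAstExts;

/* Small usage check: the extension field makes the compiler's node
 * strictly larger than the formatter's node. */
static void demo_ast_ext_sizes(void) {
    assert(sizeof(VariantExpr_CompilerAstExts) >
           sizeof(VariantExpr_DefaultAstExts));
}
```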
<p>Because we can implement traits on records and record rows now, deriving traits also works on extensible types. In the example above, I can just add <code>#[derive(ToDoc)]</code> to <code>Foo</code> and then print it like this:</p>
<pre><code>#[derive(ToDoc)]
type Foo[r](
    x: U32,
    y: U32,
    ..r
)


main():
    let x = Foo(x = 1, y = 2, msg = &quot;hi&quot;)
    let y = Foo(b = Bool.True, y = 10, blah = Option.Some(u32(0)), x = 11)
    print(x.toDoc().render(80))
    print(y.toDoc().render(80))


# output:
# Foo(x = 1, y = 2, (msg = &quot;hi&quot;))
# Foo(x = 11, y = 10, (b = Bool.True, blah = Option.Some(0)))</code></pre>
<p>The AST types in the compiler all derive traits this way.</p>
<h2 id="modules">Modules</h2>
<p>Until recently, importing a module in Fir just parsed the module and copied the parsed code to the current module.</p>
<p>In other words, there was just one module. There were no name spaces, private definitions, selective imports, or importing with renaming.</p>
<p>It took quite a while to design and implement a proper module system and I actually found it quite difficult to design this, even though in the end the design was quite simple. There were two problems that made this difficult for me:</p>
<p>First, I wasn’t sure whether we want just namespacing (plus the usual features for selective imports, renaming, etc.) or something fancier, like first-class modules.</p>
<p>To figure this out I <a href="https://github.com/osa1/a-modular-module-system">studied OCaml’s module system</a> (and also <a href="https://osa1.net/posts/2026-03-10-containing-contagious-types.html">blogged about it</a>) and <a href="https://github.com/osa1/oneml">1ML</a> in a bit more detail, and decided that I want the modules to be type checking units (to be checked in parallel) and namespaces, instead of first-class values.</p>
<p>This significantly simplified the design, but the design space was still huge and there were just two constraints:</p>
<ul>
<li>They shouldn’t require separate files for interfaces and implementations.</li>
<li>Recursive imports should be allowed.</li>
</ul>
<p>So the second problem was that these requirements did not constrain the design space enough to give me a small number of options, with obvious and significant tradeoffs between them. I could probably come up with a dozen designs that would all be good enough.</p>
<p>In the end I had to make somewhat arbitrary decisions, based on what I needed (and didn’t need) in the past from the other module systems that I used, and on preference and taste. I updated one thing as I implemented it, and settled on this:</p>
<ul>
<li><p>Recursive imports are allowed, and there are no interface files. Each module is implemented as one file.</p></li>
<li><p>Module paths follow the directory structure on the file system. E.g. an import of <code>Foo/Bar/Baz</code> requires the module to be in <code>Foo/Bar/Baz.fir</code> in the package root.</p></li>
<li><p>A module exports every non-underscored symbol that it has direct access to. This includes names that it imports. There’s no explicit exporting.</p></li>
<li><p>Underscored symbols are only accessible with explicit module paths. There’s nothing that’s truly private. If you really want you can access all private names. This keeps the design simple by avoiding fine-grained access control with things like <code>pub(crate)</code> or <code>pub(foo::bar::baz)</code> in Rust, and conditional compilation for exposing things for testing.</p></li>
<li><p>The usual renaming features are possible: modules can be imported with different names, individual definitions can be imported with different names.</p></li>
<li><p>Module path syntax is different than associated member access syntax: module paths use <code>/</code> as separator, associated members use <code>.</code>. For example:</p>
<ul>
<li><code>A/B</code> in expression context means “constructor B in module A”</li>
<li><code>B.C</code> in expression context means “constructor C of type B”</li>
<li><code>A/B.C</code> in expression context means “constructor C in type B in module A”</li>
</ul></li>
<li><p>This is currently not implemented: when a module exports something (type with constructors, function, …), everything referenced by the signature of the exported thing should also be exported.</p>
<p>This is to avoid the common issues in some languages where you export a function, but not the types that it uses, and the user either has to get it from another package or can’t use your function. Or even if the function is usable (for example, the private type is in the return type position and you just call the function but don’t use the return value), users can’t add type annotations to your function.</p>
<p>The principle here is that I should be able to take any expression in the program and give it a type annotation in a <code>let</code> statement. For trait methods, I should be able to explicitly call the methods with the type arguments. E.g. instead of <code>foo.toStr()</code> I should be able to do <code>ToStr.toStr[&lt;type of foo&gt;](foo)</code> so that means the trait type and all of the type arguments of the trait should be in scope and accessible.</p></li>
</ul>
<p>Here’s how relevant syntax looks currently:</p>
<pre><code># Each module can have at most one `import`. Documentation comments added to
# `import` lines become the documentation comment of the module. When a module
# doesn&#39;t import anything, an empty `import []` can be added to document the
# module.

## This is the module documentation.

import [
    # Import everything from `Fir/Prelude`, to use directly (without module
    # prefix).
    # This is implicitly added to every module already, so not needed. Listed
    # here for demonstration purposes.
    Fir/Prelude,

    # This allows using symbols imported from the module with the given prefix.
    # E.g. instead of `Option.Some(123)` we do `P/Option.Some(123)`.
    Fir/Prelude as P,

    # Only imports listed things.
    Fir/Prelude/[Option, Result, min, max],

    # Only imports listed things, but with renaming.
    Fir/Prelude/[Option, Result, min as _min, max as _max],
]


main():
    # Some random combination of imported things, used in different ways.
    print(Option.Some(_min(P/max(P/u32(0), u32(1)), u32(2))))


# output: Option.Some(1)</code></pre>
<p>Some other notes and clarifications on this design:</p>
<ul>
<li><p>Re-exporting imported things can be avoided by adding an underscore prefix to the imported names. E.g. in the code example above, <code>_min</code> and <code>_max</code> are not exported from this module, but other non-underscored imports are.</p>
<p>This is not a special case for imports: underscored things are never exported. If you import something with an underscored name, it’s also not exported just like defined things.</p></li>
<li><p>Modules are only imported explicitly. There’s no re-exporting a module. So if the module above is <code>Foo/Bar</code>, you don’t get <code>Foo/Bar/P</code> when you import it.</p></li>
</ul>
<p>So far I’m happy with how it looks (syntax) and how it works, but as with most things in this language, it’s open to improvements, refinements, and even backwards incompatible changes.</p>
<h2 id="smaller-features-kind-annotations-and-type-synonyms">Smaller features: kind annotations and type synonyms</h2>
<p>These don’t need much introduction, but I want to document why they were needed and implemented.</p>
<p>Type synonyms came in handy in two places:</p>
<ul>
<li><p>With associated types, we want to refer to the associated types directly in the <code>trait</code> and <code>impl</code> bodies. For example, in the <code>Iterator</code> trait:</p>
<pre><code>trait Iterator[iter, exn]:
    type Item
    next(self: iter) Option[Item] / exn</code></pre>
<p>Normally the way you refer to <code>Item</code> here is with <code>Iterator[iter, exn].Item</code>. But within the <code>trait</code> body (and also in <code>impl</code>s), we want to refer to them as <code>Item</code> directly.</p></li>
<li><p>With extensible named types, we want to be able to define generic (extensible) types in a shared library, and then in the using libraries we want to override them (shadow the original definitions) with instantiated types. For example, the AST library defines <code>type VarExpr[exts](...)</code>. The formatter overrides it with the extension type it needs: <code>type VarExpr = Ast/VarExpr[FormatterExts]</code>.</p></li>
</ul>
<p>The second one is obviously a type synonym, but the first one also uses the same underlying code. We just make type synonyms scoped, and create new synonyms in <code>trait</code> and <code>impl</code> bodies, for the associated types.</p>
<p>Kind annotations became necessary as we started using row-kinded type parameters more, for the extensible named types. Currently kind inference is very simple: it only looks at the current definition. If a type parameter is used in a row extension position (i.e. <code>..var</code>), its kind is inferred as <code>Row[Rec]</code> or <code>Row[Var]</code> depending on whether the extension is in a record (or fields) or variant (or constructors).</p>
<p>That means that in the extensible named type example above:</p>
<pre><code>type Foo[r](
    x: U32,
    y: U32,
    ..r
)</code></pre>
<p>Here <code>r</code>’s kind is inferred as <code>Row[Rec]</code>. But if we had another type that passed a generic <code>r</code> to it:</p>
<pre><code>type Bar[r](foo: Foo[r])</code></pre>
<p>This <code>r</code>’s kind was inferred as <code>*</code>, which is incorrect.</p>
<p>I don’t want to introduce module-level kind inference for various reasons, so I had to add kind annotations here. The correct definition with kind annotations is:</p>
<pre><code>type Bar[r: Row[Rec]](foo: Foo[r])</code></pre>
<p>Kinds follow the same syntax as types. <code>*</code>-kinded type parameters are just listed, without any annotations. This is useful to avoid reordering type parameters just to specify kinds of some of the types. E.g. if I have</p>
<pre><code>foo(x: t, y: Bar[r]): ...</code></pre>
<p>Here the inferred type parameters are <code>[t: *, r: *]</code>, generated from the signature by left-to-right scan. When calling we can explicitly pass them as <code>foo[type1, type2](...)</code>.</p>
<p>This passes a type of the wrong kind to <code>Bar</code>. To fix this, we have to specify the kind of <code>r</code>:</p>
<pre><code>foo[r: Row[Rec]](x: t, y: Bar[r]): ...</code></pre>
<p>But this also reorders type parameters as <code>[r: Row[Rec], t: *]</code> now, as the type parameter lists are generated by a left-to-right scan of the signature.</p>
<p>To fix this, we also have to list the type parameter <code>t</code> explicitly, just without a kind:</p>
<pre><code>foo[t, r: Row[Rec]](x: t, y: Bar[r]): ...</code></pre>
<p>This gives us the original order of the type parameters, but with the correct kinds: <code>[t: *, r: Row[Rec]]</code>.</p>
<h2 id="next-up-c-header-imports-c-ffi">Next up: C header imports (C FFI)</h2>
<p>This post is already too long so I want to keep this part short for now. Given (1) the resources that I have, (2) the things I want to do with this language, and (3) what we have currently (the current implementation), the shortest path to success (some kind of adoption) that I can see is making C interop absolutely effortless.</p>
<p>By “effortless” I really mean it: I should be able to import a C header file directly in Fir and just use the definitions, link the generated C with object files implementing the prototypes, and provide implementations for symbols used by other compiled C code.</p>
<p>Similar to the module system, this is an area I don’t have a lot of experience in. Depending on things that are out of my control (i.e. life, responsibilities), and whether I’ll encounter fundamental issues, I suspect this will take 6-12 months to fully implement. Once done, Fir will be useful for many use cases.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>I started working on it earlier in 2024. Open sourced in June 2024.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>This was partly thanks to the GCC extension <a href="https://gcc.gnu.org/onlinedocs/gcc/Statement-Exprs.html">statement expressions</a>, which allowed me to compile nested expressions directly to C without having to flatten them in an A-normal form IR or similar. The extension is also supported by clang so it didn’t make the generated C less portable.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p><code>[]</code> is the empty variant type, which doesn’t have any values.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4" role="doc-endnote"><p>The generated list fields are sorted on field names, so <code>msg</code> comes before <code>x</code> here.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>Exceptions as shared secrets, demonstrated</title>
    <link href="http://osa1.net/posts/2026-03-13-exceptions-as-shared-secrets.html" />
    <id>http://osa1.net/posts/2026-03-13-exceptions-as-shared-secrets.html</id>
    <published>2026-03-13T00:00:00Z</published>
    <updated>2026-03-13T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Robert Harper’s <a href="https://existentialtype.wordpress.com/2012/12/03/exceptions-are-shared-secrets/">“Exceptions Are Shared Secrets”</a> is an intriguing blog post, but it may come across as a bit abstract unless you’re already familiar with the idea of accidental exception (or more generally, effect) handling, as the post has no code.</p>
<p>In this post I want to give an example of the problems mentioned in the original post, and say a few words on how we might go about working around or fixing these issues.</p>
<p>The original post makes three assumptions about what an exception is and how it should be used:</p>
<ol type="1">
<li>An exception is just a way of passing a value from a “raiser” to a “handler”.</li>
<li>The raiser wants to limit who can intercept and handle the value (also called a “message”) being passed.</li>
<li>Who can intercept and handle an exception/message needs to be agreed upon via “dynamic classification”.</li>
</ol>
<p>My understanding of “dynamic classification” is that the cooperation between a raiser and a handler doesn’t happen via static types (or any other static mechanism), but by agreeing, at runtime, upon some dynamic feature of the values being passed (e.g. the identity of the object being raised).</p>
<p>I found it very difficult to come up with a real-world example of accidental exception handling causing a real bug, and I’m not that interested in hypothetical issues. So for a long time I thought the issue was not that “real”. It was only by coincidence that I came across an example in a discussion on <a href="https://github.com/WebAssembly/stack-switching/discussions/27">stack switching</a> in WebAssembly. Here’s my Python rewrite of the original example demonstrating the issue (full code in a few languages is at the end of the post).</p>
<p>We’re implementing sequences that call a callback with the elements in the sequence:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="co">## The base class for sequences.</span></span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a><span class="kw">class</span> Sequence:</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true"></a>        <span class="cf">raise</span> <span class="pp">NotImplementedError</span></span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true"></a></span>
<span id="cb1-6"><a href="#cb1-6" aria-hidden="true"></a></span>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true"></a><span class="co">## Counts from a given integer up. Does not stop.</span></span>
<span id="cb1-8"><a href="#cb1-8" aria-hidden="true"></a><span class="kw">class</span> CountFrom(Sequence):</span>
<span id="cb1-9"><a href="#cb1-9" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, start: <span class="bu">int</span>):</span>
<span id="cb1-10"><a href="#cb1-10" aria-hidden="true"></a>        <span class="va">self</span>.start <span class="op">=</span> start</span>
<span id="cb1-11"><a href="#cb1-11" aria-hidden="true"></a></span>
<span id="cb1-12"><a href="#cb1-12" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb1-13"><a href="#cb1-13" aria-hidden="true"></a>        i <span class="op">=</span> <span class="va">self</span>.start</span>
<span id="cb1-14"><a href="#cb1-14" aria-hidden="true"></a>        <span class="cf">while</span> <span class="va">True</span>:</span>
<span id="cb1-15"><a href="#cb1-15" aria-hidden="true"></a>            consumer(i)</span>
<span id="cb1-16"><a href="#cb1-16" aria-hidden="true"></a>            i <span class="op">+=</span> <span class="dv">1</span></span>
<span id="cb1-17"><a href="#cb1-17" aria-hidden="true"></a></span>
<span id="cb1-18"><a href="#cb1-18" aria-hidden="true"></a></span>
<span id="cb1-19"><a href="#cb1-19" aria-hidden="true"></a><span class="co">## An empty sequence: does not call the callback.</span></span>
<span id="cb1-20"><a href="#cb1-20" aria-hidden="true"></a><span class="kw">class</span> Empty(Sequence):</span>
<span id="cb1-21"><a href="#cb1-21" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb1-22"><a href="#cb1-22" aria-hidden="true"></a>        <span class="cf">pass</span></span></code></pre></div>
<p>We want to implement a sequence that takes two sequences and an amount as arguments. It passes the given number of elements from the first sequence to the callback, and then runs the second sequence in full.</p>
<p>A problem here is that sequences don’t support stopping after a while: they always run until completion (or forever, as in <code>CountFrom</code>). So how do we stop the first sequence after the given number of elements?</p>
<p>We throw an exception in the first sequence’s callback and catch it at the call site that runs the first sequence. Here’s the full <code>AppendAfter</code> that implements this idea:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a><span class="co">## The exception used to signal that the first sequence should be stopped, in</span></span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a><span class="co">## `AppendAfter`.</span></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a><span class="kw">class</span> AppendAfterException(<span class="pp">Exception</span>):</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>    <span class="cf">pass</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true"></a></span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true"></a></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true"></a><span class="co">## Runs the first sequence `amount` times, then runs the second sequence.</span></span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true"></a><span class="kw">class</span> AppendAfter(Sequence):</span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, first: Sequence, amount: <span class="bu">int</span>, second: Sequence):</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true"></a>        <span class="va">self</span>.first <span class="op">=</span> first</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true"></a>        <span class="va">self</span>.amount <span class="op">=</span> amount</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true"></a>        <span class="va">self</span>.second <span class="op">=</span> second</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true"></a></span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true"></a>        count <span class="op">=</span> <span class="va">self</span>.amount</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true"></a></span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true"></a>        <span class="co"># The callback for the first sequence. Throws an exception after being</span></span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true"></a>        <span class="co"># called `amount` times to stop iterating the first sequence.</span></span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true"></a>        <span class="kw">def</span> limited_consumer(element):</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true"></a>            <span class="kw">nonlocal</span> count</span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true"></a></span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true"></a>            <span class="co"># Note: weird `count` update below is intentional.</span></span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true"></a>            current <span class="op">=</span> count</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true"></a>            count <span class="op">-=</span> <span class="dv">1</span></span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true"></a>            <span class="cf">if</span> current <span class="op">==</span> <span class="dv">0</span>:</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true"></a>                <span class="cf">raise</span> AppendAfterException()</span>
<span id="cb2-27"><a href="#cb2-27" aria-hidden="true"></a>            consumer(element)</span>
<span id="cb2-28"><a href="#cb2-28" aria-hidden="true"></a></span>
<span id="cb2-29"><a href="#cb2-29" aria-hidden="true"></a>        <span class="co"># Run the first sequence until the callback throws, signalling to stop</span></span>
<span id="cb2-30"><a href="#cb2-30" aria-hidden="true"></a>        <span class="co"># the first sequence.</span></span>
<span id="cb2-31"><a href="#cb2-31" aria-hidden="true"></a>        <span class="cf">try</span>:</span>
<span id="cb2-32"><a href="#cb2-32" aria-hidden="true"></a>            <span class="va">self</span>.first.for_each(limited_consumer)</span>
<span id="cb2-33"><a href="#cb2-33" aria-hidden="true"></a>        <span class="cf">except</span> AppendAfterException:</span>
<span id="cb2-34"><a href="#cb2-34" aria-hidden="true"></a>            <span class="cf">pass</span></span>
<span id="cb2-35"><a href="#cb2-35" aria-hidden="true"></a></span>
<span id="cb2-36"><a href="#cb2-36" aria-hidden="true"></a>        <span class="va">self</span>.second.for_each(consumer)</span></code></pre></div>
<p>Here’s an example of how this works:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a>AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">5</span>, Empty()).for_each(<span class="bu">print</span>)</span></code></pre></div>
<p>This prints 0, 1, 2, 3, 4 (each on a new line).</p>
<p>But the code also has a bug. Here’s another use of it that doesn’t work as expected:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a>AppendAfter(</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>    AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">10</span>, CountFrom(<span class="dv">20</span>)),</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a>    <span class="dv">5</span>,</span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>    Empty()</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true"></a>).for_each(<span class="bu">print</span>)</span></code></pre></div>
<p>This counts from 0 to 4, then jumps to 20, and then keeps counting forever, instead of stopping after 4.</p>
<p>Here’s the problem: the outer <code>AppendAfter</code> counts to 5 in the callback it passes to the inner <code>AppendAfter</code> and then throws an exception to stop iteration. The inner <code>AppendAfter</code> passes the same callback to its first sequence, while also counting. When the outer <code>AppendAfter</code>’s callback throws after 5 iterations, the exception is handled by the inner <code>AppendAfter</code>’s exception handler. So the outer <code>AppendAfter</code> never sees this exception, and it keeps running its first sequence.</p>
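<p>To see which handler actually catches the exception, here is a small instrumented sketch of the buggy version. The <code>name</code> and <code>log</code> parameters, the <code>StopDemo</code> exception, and <code>bounded_consumer</code> are hypothetical additions for tracing and are not part of the original code; <code>StopDemo</code> only exists to cut the otherwise endless run short after 8 elements.</p>

```python
class AppendAfterException(Exception):
    pass


class StopDemo(Exception):
    """Hypothetical helper: ends the demo, because the buggy run never stops."""


class Sequence:
    def for_each(self, consumer):
        raise NotImplementedError


class CountFrom(Sequence):
    def __init__(self, start):
        self.start = start

    def for_each(self, consumer):
        i = self.start
        while True:
            consumer(i)
            i += 1


class Empty(Sequence):
    def for_each(self, consumer):
        pass


class AppendAfter(Sequence):
    # `name` and `log` are only here to record the trace.
    def __init__(self, name, first, amount, second, log):
        self.name = name
        self.first = first
        self.amount = amount
        self.second = second
        self.log = log

    def for_each(self, consumer):
        count = self.amount

        def limited_consumer(element):
            nonlocal count
            current = count
            count -= 1
            if current == 0:
                self.log.append(f"{self.name} raises")
                raise AppendAfterException()
            consumer(element)

        try:
            self.first.for_each(limited_consumer)
        except AppendAfterException:
            self.log.append(f"{self.name} catches")

        self.second.for_each(consumer)


log = []
seen = []


def bounded_consumer(element):
    # Give up after 8 elements so that the demo terminates.
    if len(seen) == 8:
        raise StopDemo()
    seen.append(element)


inner = AppendAfter("inner", CountFrom(0), 10, CountFrom(20), log)
outer = AppendAfter("outer", inner, 5, Empty(), log)

try:
    outer.for_each(bounded_consumer)
except StopDemo:
    pass

print(seen)  # [0, 1, 2, 3, 4, 20, 21, 22]
print(log)   # ['outer raises', 'inner catches']
```

<p>The log records the outer callback raising and the inner handler catching. The outer handler never runs.</p>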
<p>The outer sequence never throws an exception again, because of the way we update the <code>count</code> local: we update it first and then check for its previous value. This looks strange in Python, but in a language with pre/post increments/decrements it looks more plausible:</p>
<pre><code>if (count-- == 0) {
  throw AppendAfterException();
}</code></pre>
<p>Once this exception is caught by the wrong handler, the outer <code>count</code> is already negative and never becomes 0 again, so the iteration never stops.</p>
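<p>For contrast, here is a sketch of the other update order, where <code>count</code> is checked first and only decremented when not raising. The <code>name</code> and <code>caught_by</code> parameters are hypothetical additions used to record which handler ran. With this order the nested example happens to produce the right output, but only because the outer callback raises a second time; the first exception is still intercepted by the inner handler.</p>

```python
class AppendAfterException(Exception):
    pass


class Sequence:
    def for_each(self, consumer):
        raise NotImplementedError


class CountFrom(Sequence):
    def __init__(self, start):
        self.start = start

    def for_each(self, consumer):
        i = self.start
        while True:
            consumer(i)
            i += 1


class Empty(Sequence):
    def for_each(self, consumer):
        pass


class AppendAfter(Sequence):
    # `name` and `caught_by` are only here to record which handler ran.
    def __init__(self, name, first, amount, second, caught_by):
        self.name = name
        self.first = first
        self.amount = amount
        self.second = second
        self.caught_by = caught_by

    def for_each(self, consumer):
        count = self.amount

        def limited_consumer(element):
            nonlocal count
            # Check first, and only decrement when not raising. `count` stays
            # at 0 after the first raise, so every later call raises again.
            if count == 0:
                raise AppendAfterException()
            count -= 1
            consumer(element)

        try:
            self.first.for_each(limited_consumer)
        except AppendAfterException:
            self.caught_by.append(self.name)

        self.second.for_each(consumer)


caught_by = []
seen = []

inner = AppendAfter("inner", CountFrom(0), 10, CountFrom(20), caught_by)
outer = AppendAfter("outer", inner, 5, Empty(), caught_by)
outer.for_each(seen.append)

print(seen)       # [0, 1, 2, 3, 4]
print(caught_by)  # ['inner', 'outer']
```

<p>The inner handler still swallows an exception that was not meant for it, and starts its second sequence as a result. The second raise is what reaches the outer handler.</p>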
<p>According to the original post, an exception should be a “shared secret” between a raiser and a handler, meaning no other handler (other than the intended one) should be able to intercept and decipher it.</p>
<p>I’m not aware of any language that allows this kind of exception<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a>. To fix this in a way that somewhat resembles the exceptions explained in the original post, we need something unique shared between a raiser and a handler, so that the handler only catches the right exceptions and propagates the rest. In our demo, this is just a matter of creating the exception value ahead of time, in a scope shared between the raiser and handler, and then handling based on object identity. Here’s the fixed <code>AppendAfter</code>:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="kw">class</span> AppendAfter(Sequence):</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>    ...</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true"></a>        count <span class="op">=</span> <span class="va">self</span>.amount</span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true"></a></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true"></a>        <span class="co"># We create the exception value ahead of time. Both the raiser and</span></span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true"></a>        <span class="co"># handler have access to it.</span></span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true"></a>        sentinel <span class="op">=</span> AppendAfterException()</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true"></a></span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true"></a>        <span class="kw">def</span> limited_consumer(element):</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true"></a>            <span class="kw">nonlocal</span> count</span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true"></a>            current <span class="op">=</span> count</span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true"></a>            count <span class="op">-=</span> <span class="dv">1</span></span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true"></a>            <span class="cf">if</span> current <span class="op">==</span> <span class="dv">0</span>:</span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true"></a>                <span class="cf">raise</span> sentinel</span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true"></a>            consumer(element)</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true"></a></span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true"></a>        <span class="cf">try</span>:</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true"></a>            <span class="va">self</span>.first.for_each(limited_consumer)</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true"></a>        <span class="cf">except</span> AppendAfterException <span class="im">as</span> e:</span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true"></a>            <span class="cf">if</span> e <span class="kw">is</span> <span class="kw">not</span> sentinel:</span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true"></a>                <span class="cf">raise</span></span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true"></a></span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true"></a>        <span class="va">self</span>.second.for_each(consumer)</span></code></pre></div>
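<p>A variation on the same idea, sketched here as an assumption rather than something taken from the original example: instead of comparing object identities in the handler, each <code>for_each</code> call can define its own exception class, so the <code>except</code> clause only matches exceptions raised by that call’s callback. The name <code>Stop</code> is hypothetical.</p>

```python
class Sequence:
    def for_each(self, consumer):
        raise NotImplementedError


class CountFrom(Sequence):
    def __init__(self, start):
        self.start = start

    def for_each(self, consumer):
        i = self.start
        while True:
            consumer(i)
            i += 1


class Empty(Sequence):
    def for_each(self, consumer):
        pass


class AppendAfter(Sequence):
    def __init__(self, first, amount, second):
        self.first = first
        self.amount = amount
        self.second = second

    def for_each(self, consumer):
        count = self.amount

        # A fresh class is created on every call. Exceptions raised by another
        # call's callback are instances of a different class, so the handler
        # below does not match them and they propagate.
        class Stop(Exception):
            pass

        def limited_consumer(element):
            nonlocal count
            current = count
            count -= 1
            if current == 0:
                raise Stop()
            consumer(element)

        try:
            self.first.for_each(limited_consumer)
        except Stop:
            pass

        self.second.for_each(consumer)


seen = []
AppendAfter(
    AppendAfter(CountFrom(0), 10, CountFrom(20)),
    5,
    Empty()
).for_each(seen.append)

print(seen)  # [0, 1, 2, 3, 4]
```

<p>This trades the explicit re-raise for a class definition on every call, so it is probably only worth it when the handler would otherwise need several identity checks.</p>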
<p>Full code:</p>
<details>
<p><summary>Python implementation with the bug</summary></p>
<div class="sourceCode" id="cb7"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="im">from</span> collections.abc <span class="im">import</span> Callable</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true"></a></span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true"></a><span class="kw">class</span> AppendAfterException(<span class="pp">Exception</span>):</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true"></a>    <span class="cf">pass</span></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true"></a></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true"></a></span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true"></a><span class="kw">class</span> Sequence:</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb7-10"><a href="#cb7-10" aria-hidden="true"></a>        <span class="cf">raise</span> <span class="pp">NotImplementedError</span></span>
<span id="cb7-11"><a href="#cb7-11" aria-hidden="true"></a></span>
<span id="cb7-12"><a href="#cb7-12" aria-hidden="true"></a></span>
<span id="cb7-13"><a href="#cb7-13" aria-hidden="true"></a><span class="kw">class</span> CountFrom(Sequence):</span>
<span id="cb7-14"><a href="#cb7-14" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, start: <span class="bu">int</span>):</span>
<span id="cb7-15"><a href="#cb7-15" aria-hidden="true"></a>        <span class="va">self</span>.start <span class="op">=</span> start</span>
<span id="cb7-16"><a href="#cb7-16" aria-hidden="true"></a></span>
<span id="cb7-17"><a href="#cb7-17" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb7-18"><a href="#cb7-18" aria-hidden="true"></a>        i <span class="op">=</span> <span class="va">self</span>.start</span>
<span id="cb7-19"><a href="#cb7-19" aria-hidden="true"></a>        <span class="cf">while</span> <span class="va">True</span>:</span>
<span id="cb7-20"><a href="#cb7-20" aria-hidden="true"></a>            consumer(i)</span>
<span id="cb7-21"><a href="#cb7-21" aria-hidden="true"></a>            i <span class="op">+=</span> <span class="dv">1</span></span>
<span id="cb7-22"><a href="#cb7-22" aria-hidden="true"></a></span>
<span id="cb7-23"><a href="#cb7-23" aria-hidden="true"></a></span>
<span id="cb7-24"><a href="#cb7-24" aria-hidden="true"></a><span class="kw">class</span> Empty(Sequence):</span>
<span id="cb7-25"><a href="#cb7-25" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb7-26"><a href="#cb7-26" aria-hidden="true"></a>        <span class="cf">pass</span></span>
<span id="cb7-27"><a href="#cb7-27" aria-hidden="true"></a></span>
<span id="cb7-28"><a href="#cb7-28" aria-hidden="true"></a></span>
<span id="cb7-29"><a href="#cb7-29" aria-hidden="true"></a><span class="kw">class</span> AppendAfter(Sequence):</span>
<span id="cb7-30"><a href="#cb7-30" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, first: Sequence, amount: <span class="bu">int</span>, second: Sequence):</span>
<span id="cb7-31"><a href="#cb7-31" aria-hidden="true"></a>        <span class="va">self</span>.first <span class="op">=</span> first</span>
<span id="cb7-32"><a href="#cb7-32" aria-hidden="true"></a>        <span class="va">self</span>.amount <span class="op">=</span> amount</span>
<span id="cb7-33"><a href="#cb7-33" aria-hidden="true"></a>        <span class="va">self</span>.second <span class="op">=</span> second</span>
<span id="cb7-34"><a href="#cb7-34" aria-hidden="true"></a></span>
<span id="cb7-35"><a href="#cb7-35" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb7-36"><a href="#cb7-36" aria-hidden="true"></a>        count <span class="op">=</span> <span class="va">self</span>.amount</span>
<span id="cb7-37"><a href="#cb7-37" aria-hidden="true"></a></span>
<span id="cb7-38"><a href="#cb7-38" aria-hidden="true"></a>        <span class="kw">def</span> limited_consumer(element):</span>
<span id="cb7-39"><a href="#cb7-39" aria-hidden="true"></a>            <span class="kw">nonlocal</span> count</span>
<span id="cb7-40"><a href="#cb7-40" aria-hidden="true"></a>            <span class="co"># Note: if you change this to only decrement count when not</span></span>
<span id="cb7-41"><a href="#cb7-41" aria-hidden="true"></a>            <span class="co"># throwing, this works as expected.</span></span>
<span id="cb7-42"><a href="#cb7-42" aria-hidden="true"></a>            <span class="co">#</span></span>
<span id="cb7-43"><a href="#cb7-43" aria-hidden="true"></a>            <span class="co"># The point is, outer AppendAfter&#39;s exception is caught by the</span></span>
<span id="cb7-44"><a href="#cb7-44" aria-hidden="true"></a><span class="co">            # inner AppendAfter, which then leaves outer AppendAfter in an</span></span>
<span id="cb7-45"><a href="#cb7-45" aria-hidden="true"></a>            <span class="co"># invalid state where count is negative.</span></span>
<span id="cb7-46"><a href="#cb7-46" aria-hidden="true"></a>            current <span class="op">=</span> count</span>
<span id="cb7-47"><a href="#cb7-47" aria-hidden="true"></a>            count <span class="op">-=</span> <span class="dv">1</span></span>
<span id="cb7-48"><a href="#cb7-48" aria-hidden="true"></a>            <span class="cf">if</span> current <span class="op">==</span> <span class="dv">0</span>:</span>
<span id="cb7-49"><a href="#cb7-49" aria-hidden="true"></a>                <span class="cf">raise</span> AppendAfterException()</span>
<span id="cb7-50"><a href="#cb7-50" aria-hidden="true"></a>            consumer(element)</span>
<span id="cb7-51"><a href="#cb7-51" aria-hidden="true"></a></span>
<span id="cb7-52"><a href="#cb7-52" aria-hidden="true"></a>        <span class="cf">try</span>:</span>
<span id="cb7-53"><a href="#cb7-53" aria-hidden="true"></a>            <span class="va">self</span>.first.for_each(limited_consumer)</span>
<span id="cb7-54"><a href="#cb7-54" aria-hidden="true"></a>        <span class="cf">except</span> AppendAfterException:</span>
<span id="cb7-55"><a href="#cb7-55" aria-hidden="true"></a>            <span class="cf">pass</span></span>
<span id="cb7-56"><a href="#cb7-56" aria-hidden="true"></a></span>
<span id="cb7-57"><a href="#cb7-57" aria-hidden="true"></a>        <span class="va">self</span>.second.for_each(consumer)</span>
<span id="cb7-58"><a href="#cb7-58" aria-hidden="true"></a></span>
<span id="cb7-59"><a href="#cb7-59" aria-hidden="true"></a></span>
<span id="cb7-60"><a href="#cb7-60" aria-hidden="true"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">&quot;__main__&quot;</span>:</span>
<span id="cb7-61"><a href="#cb7-61" aria-hidden="true"></a>    <span class="co"># Works:</span></span>
<span id="cb7-62"><a href="#cb7-62" aria-hidden="true"></a>    AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">5</span>, Empty()).for_each(<span class="bu">print</span>)</span>
<span id="cb7-63"><a href="#cb7-63" aria-hidden="true"></a></span>
<span id="cb7-64"><a href="#cb7-64" aria-hidden="true"></a>    <span class="co"># Loops:</span></span>
<span id="cb7-65"><a href="#cb7-65" aria-hidden="true"></a>    AppendAfter(</span>
<span id="cb7-66"><a href="#cb7-66" aria-hidden="true"></a>        AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">10</span>, CountFrom(<span class="dv">20</span>)),</span>
<span id="cb7-67"><a href="#cb7-67" aria-hidden="true"></a>        <span class="dv">5</span>,</span>
<span id="cb7-68"><a href="#cb7-68" aria-hidden="true"></a>        Empty()</span>
<span id="cb7-69"><a href="#cb7-69" aria-hidden="true"></a>    ).for_each(<span class="bu">print</span>)</span></code></pre></div>
</details>
<details>
<p><summary>Python implementation with the bug fixed</summary></p>
<div class="sourceCode" id="cb8"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a><span class="im">from</span> collections.abc <span class="im">import</span> Callable</span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a></span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true"></a></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true"></a><span class="kw">class</span> AppendAfterException(<span class="pp">Exception</span>):</span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true"></a>    <span class="cf">pass</span></span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true"></a></span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true"></a></span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true"></a><span class="kw">class</span> Sequence:</span>
<span id="cb8-9"><a href="#cb8-9" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb8-10"><a href="#cb8-10" aria-hidden="true"></a>        <span class="cf">raise</span> <span class="pp">NotImplementedError</span></span>
<span id="cb8-11"><a href="#cb8-11" aria-hidden="true"></a></span>
<span id="cb8-12"><a href="#cb8-12" aria-hidden="true"></a></span>
<span id="cb8-13"><a href="#cb8-13" aria-hidden="true"></a><span class="kw">class</span> CountFrom(Sequence):</span>
<span id="cb8-14"><a href="#cb8-14" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, start: <span class="bu">int</span>):</span>
<span id="cb8-15"><a href="#cb8-15" aria-hidden="true"></a>        <span class="va">self</span>.start <span class="op">=</span> start</span>
<span id="cb8-16"><a href="#cb8-16" aria-hidden="true"></a></span>
<span id="cb8-17"><a href="#cb8-17" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb8-18"><a href="#cb8-18" aria-hidden="true"></a>        i <span class="op">=</span> <span class="va">self</span>.start</span>
<span id="cb8-19"><a href="#cb8-19" aria-hidden="true"></a>        <span class="cf">while</span> <span class="va">True</span>:</span>
<span id="cb8-20"><a href="#cb8-20" aria-hidden="true"></a>            consumer(i)</span>
<span id="cb8-21"><a href="#cb8-21" aria-hidden="true"></a>            i <span class="op">+=</span> <span class="dv">1</span></span>
<span id="cb8-22"><a href="#cb8-22" aria-hidden="true"></a></span>
<span id="cb8-23"><a href="#cb8-23" aria-hidden="true"></a></span>
<span id="cb8-24"><a href="#cb8-24" aria-hidden="true"></a><span class="kw">class</span> Empty(Sequence):</span>
<span id="cb8-25"><a href="#cb8-25" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb8-26"><a href="#cb8-26" aria-hidden="true"></a>        <span class="cf">pass</span></span>
<span id="cb8-27"><a href="#cb8-27" aria-hidden="true"></a></span>
<span id="cb8-28"><a href="#cb8-28" aria-hidden="true"></a></span>
<span id="cb8-29"><a href="#cb8-29" aria-hidden="true"></a><span class="kw">class</span> AppendAfter(Sequence):</span>
<span id="cb8-30"><a href="#cb8-30" aria-hidden="true"></a>    <span class="kw">def</span> <span class="fu">__init__</span>(<span class="va">self</span>, first: Sequence, amount: <span class="bu">int</span>, second: Sequence):</span>
<span id="cb8-31"><a href="#cb8-31" aria-hidden="true"></a>        <span class="va">self</span>.first <span class="op">=</span> first</span>
<span id="cb8-32"><a href="#cb8-32" aria-hidden="true"></a>        <span class="va">self</span>.amount <span class="op">=</span> amount</span>
<span id="cb8-33"><a href="#cb8-33" aria-hidden="true"></a>        <span class="va">self</span>.second <span class="op">=</span> second</span>
<span id="cb8-34"><a href="#cb8-34" aria-hidden="true"></a></span>
<span id="cb8-35"><a href="#cb8-35" aria-hidden="true"></a>    <span class="kw">def</span> for_each(<span class="va">self</span>, consumer: Callable) <span class="op">-&gt;</span> <span class="va">None</span>:</span>
<span id="cb8-36"><a href="#cb8-36" aria-hidden="true"></a>        count <span class="op">=</span> <span class="va">self</span>.amount</span>
<span id="cb8-37"><a href="#cb8-37" aria-hidden="true"></a>        sentinel <span class="op">=</span> AppendAfterException()</span>
<span id="cb8-38"><a href="#cb8-38" aria-hidden="true"></a></span>
<span id="cb8-39"><a href="#cb8-39" aria-hidden="true"></a>        <span class="kw">def</span> limited_consumer(element):</span>
<span id="cb8-40"><a href="#cb8-40" aria-hidden="true"></a>            <span class="kw">nonlocal</span> count</span>
<span id="cb8-41"><a href="#cb8-41" aria-hidden="true"></a>            current <span class="op">=</span> count</span>
<span id="cb8-42"><a href="#cb8-42" aria-hidden="true"></a>            count <span class="op">-=</span> <span class="dv">1</span></span>
<span id="cb8-43"><a href="#cb8-43" aria-hidden="true"></a>            <span class="cf">if</span> current <span class="op">==</span> <span class="dv">0</span>:</span>
<span id="cb8-44"><a href="#cb8-44" aria-hidden="true"></a>                <span class="cf">raise</span> sentinel</span>
<span id="cb8-45"><a href="#cb8-45" aria-hidden="true"></a>            consumer(element)</span>
<span id="cb8-46"><a href="#cb8-46" aria-hidden="true"></a></span>
<span id="cb8-47"><a href="#cb8-47" aria-hidden="true"></a>        <span class="cf">try</span>:</span>
<span id="cb8-48"><a href="#cb8-48" aria-hidden="true"></a>            <span class="va">self</span>.first.for_each(limited_consumer)</span>
<span id="cb8-49"><a href="#cb8-49" aria-hidden="true"></a>        <span class="cf">except</span> AppendAfterException <span class="im">as</span> e:</span>
<span id="cb8-50"><a href="#cb8-50" aria-hidden="true"></a>            <span class="cf">if</span> e <span class="kw">is</span> <span class="kw">not</span> sentinel:</span>
<span id="cb8-51"><a href="#cb8-51" aria-hidden="true"></a>                <span class="cf">raise</span></span>
<span id="cb8-52"><a href="#cb8-52" aria-hidden="true"></a></span>
<span id="cb8-53"><a href="#cb8-53" aria-hidden="true"></a>        <span class="va">self</span>.second.for_each(consumer)</span>
<span id="cb8-54"><a href="#cb8-54" aria-hidden="true"></a></span>
<span id="cb8-55"><a href="#cb8-55" aria-hidden="true"></a></span>
<span id="cb8-56"><a href="#cb8-56" aria-hidden="true"></a><span class="cf">if</span> <span class="va">__name__</span> <span class="op">==</span> <span class="st">&quot;__main__&quot;</span>:</span>
<span id="cb8-57"><a href="#cb8-57" aria-hidden="true"></a>    <span class="co"># Works:</span></span>
<span id="cb8-58"><a href="#cb8-58" aria-hidden="true"></a>    AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">5</span>, Empty()).for_each(<span class="bu">print</span>)</span>
<span id="cb8-59"><a href="#cb8-59" aria-hidden="true"></a></span>
<span id="cb8-60"><a href="#cb8-60" aria-hidden="true"></a>    <span class="co"># Also works now:</span></span>
<span id="cb8-61"><a href="#cb8-61" aria-hidden="true"></a>    AppendAfter(</span>
<span id="cb8-62"><a href="#cb8-62" aria-hidden="true"></a>        AppendAfter(CountFrom(<span class="dv">0</span>), <span class="dv">10</span>, CountFrom(<span class="dv">20</span>)),</span>
<span id="cb8-63"><a href="#cb8-63" aria-hidden="true"></a>        <span class="dv">5</span>,</span>
<span id="cb8-64"><a href="#cb8-64" aria-hidden="true"></a>        Empty()</span>
<span id="cb8-65"><a href="#cb8-65" aria-hidden="true"></a>    ).for_each(<span class="bu">print</span>)</span></code></pre></div>
</details>
<p>If you want to experiment with this in other languages:</p>
<details>
<p><summary>Dart implementation</summary></p>
<pre class="dart"><code>abstract class Sequence&lt;Element&gt; {
  void forEach(void Function(Element) consumer);
}

class CountFrom implements Sequence&lt;int&gt; {
  final int from;

  CountFrom(this.from);

  @override
  void forEach(void Function(int) consumer) {
    for (int i = from; ; i += 1) {
      consumer(i);
    }
  }
}

class Empty implements Sequence&lt;int&gt; {
  @override
  void forEach(void Function(int) consumer) {}
}

class AppendAfter&lt;Element&gt; implements Sequence&lt;Element&gt; {
  final Sequence&lt;Element&gt; first;
  final Sequence&lt;Element&gt; second;
  final int amount;

  AppendAfter(this.first, this.amount, this.second);

  @override
  void forEach(void Function(Element) consumer) {
    try {
      int count = amount;
      first.forEach((element) {
        if (count-- == 0) {
          throw AppendAfterException();
        }
        consumer(element);
      });
    } on AppendAfterException {}
    second.forEach(consumer);
  }
}

class AppendAfterException {}

void main() {
  // final simple = AppendAfter(CountFrom(0), 5, Empty());
  // simple.forEach((i) =&gt; print(i));

  final complex = AppendAfter(AppendAfter(CountFrom(0), 10, CountFrom(20)), 5, Empty());
  complex.forEach((i) =&gt; print(i));
}</code></pre>
</details>
<details>
<p><summary>Fir implementation</summary></p>
<pre><code>trait Sequence[seq, t, exn]:
    forEach(self: seq, consumer: Fn(t) / exn) / exn

# ------------------------------------------------------------------------------

type CountFrom(from: U32)

impl Sequence[CountFrom, U32, exn]:
    forEach(self: CountFrom, consumer: Fn(U32) / exn) / exn:
        let i = self.from
        loop:
            consumer(i)
            i += 1

# ------------------------------------------------------------------------------

type AppendAfter[s1, s2](
    seq1: s1,
    seq2: s2,
    amt: U32,
)

type AppendAfterStop:
    AppendAfterStop

impl[Sequence[s1, t, [AppendAfterStop, ..exn]], Sequence[s2, t, [AppendAfterStop, ..exn]]]
        Sequence[AppendAfter[s1, s2], t, [AppendAfterStop, ..exn]]:
    forEach(
            self: AppendAfter[s1, s2],
            consumer: Fn(t) / [AppendAfterStop, ..exn]
        ) / [AppendAfterStop, ..exn]:
        match try(\():
            self.seq1.forEach(\(i: t) / [AppendAfterStop, ..exn]:
                let current = self.amt
                self.amt -= 1
                if current == 0:
                    throw(~AppendAfterStop.AppendAfterStop)
                consumer(i))):
            Result.Ok(()) | Result.Err(~AppendAfterStop.AppendAfterStop):
                self.seq2.forEach(consumer)

# ------------------------------------------------------------------------------

type EmptySeq:
    EmptySeq

impl Sequence[EmptySeq, t, exn]:
    forEach(self: EmptySeq, consumer: Fn(t) / exn) / exn:
        ()

# ------------------------------------------------------------------------------

main():
    let seq =
        AppendAfter(
            seq1 = AppendAfter(seq1 = CountFrom(from = 0), seq2 = CountFrom(from = 10), amt = 5),
            seq2 = EmptySeq.EmptySeq,
            amt = 5,
        )

    try[(), [AppendAfterStop], []](
        \(): seq.forEach(\(i: U32): print(i)))

    ()</code></pre>
</details>
<p>The Fir implementation demonstrates that the issue is not a typing issue: it happens even with checked exceptions.</p>
<p>Note that in debug builds this Fir program will crash because of an underflow: the counter goes below 0 as explained above, but it’s not allowed to, as the counter type is unsigned. If you want it to loop, run in release mode.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>I’ve briefly looked into how exceptions work in SML as the original post mentions it a few times. In SML you can catch all exceptions, so you can intercept anything and it doesn’t fully implement Robert’s ideal exception semantics.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>Containing contagious types with OCaml modules</title>
    <link href="http://osa1.net/posts/2026-03-10-containing-contagious-types.html" />
    <id>http://osa1.net/posts/2026-03-10-containing-contagious-types.html</id>
    <published>2026-03-10T00:00:00Z</published>
    <updated>2026-03-10T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>In the <a href="https://osa1.net/posts/2026-03-07-extensible-named-types-fir.html">previous post</a> we looked at a way to extend product types with new fields and sum types with new constructors, using row types, in Fir.</p>
<p>A problem with the approach was that it required adding type parameters to the type being extended. In the cases where the extended type is a sum type and different constructors are extended with different fields, we may even need more than one type parameter. Those type parameters can then be propagated to the use sites, and their use sites, and their use sites…</p>
<p>I call these kinds of type parameters “contagious”, and it’s difficult to completely avoid them in Fir. In Fir, most function types are polymorphic in the exceptions they throw. This allows things like: calling a function that doesn’t throw in throwing contexts, or calling a function that throws <code>Error1</code> and another that throws <code>Error2</code> from the same function, and inferring the calling function’s exception type as <code>[Error1, Error2, ..exn]</code>. The way we achieve this polymorphism<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> is by having a type parameter representing the exceptions the function can throw<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>So I thought, maybe instead of avoiding type parameters, we should think about how we might contain, or hide them, and I started to look at existing features in other languages.</p>
<p>In this post we’re going to look at how OCaml modules might be used for avoiding multiple type parameters (one for each extension). It turns out OCaml modules provide a solution that’s <strong>almost</strong> right.</p>
<p>(Full OCaml code is at the end of this post.)</p>
<h1 id="the-setup">The setup</h1>
<p>We have lots of AST types for expressions, statements, declarations, … and we want to make them extensible with new fields and new constructors. Different AST types will be extended with different fields or constructors, and even in the same AST type (e.g. <code>Expr</code> in the original post) we may need different types of extensions for different constructors of the type.</p>
<p>To keep things simple, in this post we’ll only add new fields.</p>
<p>As the language, we’ll use the lambda calculus, with <code>let</code>s. Here’s what the AST could look like in OCaml:</p>
<div class="sourceCode" id="cb1"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true"></a><span class="kw">type</span> expr = Var <span class="kw">of</span> var | App <span class="kw">of</span> app | Abs <span class="kw">of</span> <span class="dt">abs</span> | Let <span class="kw">of</span> let_</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true"></a><span class="kw">and</span> var = { name : <span class="dt">string</span> }</span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true"></a><span class="kw">and</span> app = { fn : expr; arg : expr }</span>
<span id="cb1-4"><a href="#cb1-4" aria-hidden="true"></a><span class="kw">and</span> <span class="dt">abs</span> = { param : <span class="dt">string</span>; body : expr }</span>
<span id="cb1-5"><a href="#cb1-5" aria-hidden="true"></a><span class="kw">and</span> let_ = { bound : <span class="dt">string</span>; rhs : expr; body : expr }</span></code></pre></div>
<p>With extensions:</p>
<div class="sourceCode" id="cb2"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true"></a><span class="kw">type</span> (&#39;v, &#39;a, &#39;b, &#39;l) expr =</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true"></a>  | Var <span class="kw">of</span> &#39;v var</span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true"></a>  | App <span class="kw">of</span> (&#39;v, &#39;a, &#39;b, &#39;l) app</span>
<span id="cb2-4"><a href="#cb2-4" aria-hidden="true"></a>  | Abs <span class="kw">of</span> (&#39;v, &#39;a, &#39;b, &#39;l) <span class="dt">abs</span></span>
<span id="cb2-5"><a href="#cb2-5" aria-hidden="true"></a>  | Let <span class="kw">of</span> (&#39;v, &#39;a, &#39;b, &#39;l) let_</span>
<span id="cb2-6"><a href="#cb2-6" aria-hidden="true"></a></span>
<span id="cb2-7"><a href="#cb2-7" aria-hidden="true"></a><span class="kw">and</span> &#39;v var = { name : <span class="dt">string</span>; var_ext : &#39;v }</span>
<span id="cb2-8"><a href="#cb2-8" aria-hidden="true"></a></span>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true"></a><span class="kw">and</span> (&#39;v, &#39;a, &#39;b, &#39;l) app = {</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true"></a>  fn : (&#39;v, &#39;a, &#39;b, &#39;l) expr;</span>
<span id="cb2-11"><a href="#cb2-11" aria-hidden="true"></a>  arg : (&#39;v, &#39;a, &#39;b, &#39;l) expr;</span>
<span id="cb2-12"><a href="#cb2-12" aria-hidden="true"></a>  app_ext : &#39;a;</span>
<span id="cb2-13"><a href="#cb2-13" aria-hidden="true"></a>}</span>
<span id="cb2-14"><a href="#cb2-14" aria-hidden="true"></a></span>
<span id="cb2-15"><a href="#cb2-15" aria-hidden="true"></a><span class="kw">and</span> (&#39;v, &#39;a, &#39;b, &#39;l) <span class="dt">abs</span> = {</span>
<span id="cb2-16"><a href="#cb2-16" aria-hidden="true"></a>  param : <span class="dt">string</span>;</span>
<span id="cb2-17"><a href="#cb2-17" aria-hidden="true"></a>  body : (&#39;v, &#39;a, &#39;b, &#39;l) expr;</span>
<span id="cb2-18"><a href="#cb2-18" aria-hidden="true"></a>  abs_ext : &#39;b;</span>
<span id="cb2-19"><a href="#cb2-19" aria-hidden="true"></a>}</span>
<span id="cb2-20"><a href="#cb2-20" aria-hidden="true"></a></span>
<span id="cb2-21"><a href="#cb2-21" aria-hidden="true"></a><span class="kw">and</span> (&#39;v, &#39;a, &#39;b, &#39;l) let_ = {</span>
<span id="cb2-22"><a href="#cb2-22" aria-hidden="true"></a>  bound : <span class="dt">string</span>;</span>
<span id="cb2-23"><a href="#cb2-23" aria-hidden="true"></a>  rhs : (&#39;v, &#39;a, &#39;b, &#39;l) expr;</span>
<span id="cb2-24"><a href="#cb2-24" aria-hidden="true"></a>  body : (&#39;v, &#39;a, &#39;b, &#39;l) expr;</span>
<span id="cb2-25"><a href="#cb2-25" aria-hidden="true"></a>  let_ext : &#39;l;</span>
<span id="cb2-26"><a href="#cb2-26" aria-hidden="true"></a>}</span></code></pre></div>
<p>This is obviously unusable and it won’t scale with more types and constructors.</p>
<p>With modules, we can have a module signature with the AST types and abstract extension types, and implement it with different concrete types for the extension types.</p>
<p>We first define a module signature with the AST extensions:</p>
<div class="sourceCode" id="cb3"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true"></a><span class="kw">module</span> <span class="kw">type</span> AST_EXTENSIONS = <span class="kw">sig</span></span>
<span id="cb3-2"><a href="#cb3-2" aria-hidden="true"></a>  <span class="kw">type</span> var_ext</span>
<span id="cb3-3"><a href="#cb3-3" aria-hidden="true"></a>  <span class="kw">type</span> app_ext</span>
<span id="cb3-4"><a href="#cb3-4" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext</span>
<span id="cb3-5"><a href="#cb3-5" aria-hidden="true"></a>  <span class="kw">type</span> let_ext</span>
<span id="cb3-6"><a href="#cb3-6" aria-hidden="true"></a></span>
<span id="cb3-7"><a href="#cb3-7" aria-hidden="true"></a>  <span class="kw">val</span> default_var_ext : var_ext</span>
<span id="cb3-8"><a href="#cb3-8" aria-hidden="true"></a>  <span class="kw">val</span> default_app_ext : app_ext</span>
<span id="cb3-9"><a href="#cb3-9" aria-hidden="true"></a>  <span class="kw">val</span> default_abs_ext : abs_ext</span>
<span id="cb3-10"><a href="#cb3-10" aria-hidden="true"></a>  <span class="kw">val</span> default_let_ext : let_ext</span>
<span id="cb3-11"><a href="#cb3-11" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
<p>The AST module signature then uses the extension types:</p>
<div class="sourceCode" id="cb4"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true"></a><span class="kw">module</span> <span class="kw">type</span> AST = <span class="kw">sig</span></span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true"></a>  <span class="kw">include</span> AST_EXTENSIONS</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true"></a></span>
<span id="cb4-4"><a href="#cb4-4" aria-hidden="true"></a>  <span class="kw">type</span> expr = Var <span class="kw">of</span> var | App <span class="kw">of</span> app | Abs <span class="kw">of</span> <span class="dt">abs</span> | Let <span class="kw">of</span> let_</span>
<span id="cb4-5"><a href="#cb4-5" aria-hidden="true"></a>  <span class="kw">and</span> var = { name : <span class="dt">string</span>; var_ext : var_ext }</span>
<span id="cb4-6"><a href="#cb4-6" aria-hidden="true"></a>  <span class="kw">and</span> app = { fn : expr; arg : expr; app_ext : app_ext }</span>
<span id="cb4-7"><a href="#cb4-7" aria-hidden="true"></a>  <span class="kw">and</span> <span class="dt">abs</span> = { param : <span class="dt">string</span>; body : expr; abs_ext : abs_ext }</span>
<span id="cb4-8"><a href="#cb4-8" aria-hidden="true"></a>  <span class="kw">and</span> let_ = { bound : <span class="dt">string</span>; rhs : expr; body : expr; let_ext : let_ext }</span>
<span id="cb4-9"><a href="#cb4-9" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
<p>We then use a functor to create new <code>AST</code> modules, with a given extension module:</p>
<div class="sourceCode" id="cb5"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true"></a><span class="kw">module</span> MakeAst (Ext : AST_EXTENSIONS) :</span>
<span id="cb5-2"><a href="#cb5-2" aria-hidden="true"></a>  AST</span>
<span id="cb5-3"><a href="#cb5-3" aria-hidden="true"></a>    <span class="kw">with</span> <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb5-4"><a href="#cb5-4" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb5-5"><a href="#cb5-5" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb5-6"><a href="#cb5-6" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> let_ext = Ext.let_ext = <span class="kw">struct</span></span>
<span id="cb5-7"><a href="#cb5-7" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb5-8"><a href="#cb5-8" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb5-9"><a href="#cb5-9" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb5-10"><a href="#cb5-10" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = Ext.let_ext</span>
<span id="cb5-11"><a href="#cb5-11" aria-hidden="true"></a></span>
<span id="cb5-12"><a href="#cb5-12" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = Ext.default_var_ext</span>
<span id="cb5-13"><a href="#cb5-13" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = Ext.default_app_ext</span>
<span id="cb5-14"><a href="#cb5-14" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = Ext.default_abs_ext</span>
<span id="cb5-15"><a href="#cb5-15" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = Ext.default_let_ext</span>
<span id="cb5-16"><a href="#cb5-16" aria-hidden="true"></a></span>
<span id="cb5-17"><a href="#cb5-17" aria-hidden="true"></a>  <span class="kw">type</span> expr = Var <span class="kw">of</span> var | App <span class="kw">of</span> app | Abs <span class="kw">of</span> <span class="dt">abs</span> | Let <span class="kw">of</span> let_</span>
<span id="cb5-18"><a href="#cb5-18" aria-hidden="true"></a>  <span class="kw">and</span> var = { name : <span class="dt">string</span>; var_ext : Ext.var_ext }</span>
<span id="cb5-19"><a href="#cb5-19" aria-hidden="true"></a>  <span class="kw">and</span> app = { fn : expr; arg : expr; app_ext : app_ext }</span>
<span id="cb5-20"><a href="#cb5-20" aria-hidden="true"></a>  <span class="kw">and</span> <span class="dt">abs</span> = { param : <span class="dt">string</span>; body : expr; abs_ext : abs_ext }</span>
<span id="cb5-21"><a href="#cb5-21" aria-hidden="true"></a>  <span class="kw">and</span> let_ = { bound : <span class="dt">string</span>; rhs : expr; body : expr; let_ext : let_ext }</span>
<span id="cb5-22"><a href="#cb5-22" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
<p>In the first post we had two examples: a formatter that doesn’t need any extensions, and a type checker that needs to annotate AST nodes with inferred types. Here are the formatter’s and type checker’s AST modules:</p>
<div class="sourceCode" id="cb6"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true"></a><span class="kw">module</span> FmtAst = MakeAst (<span class="kw">struct</span></span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = <span class="dt">unit</span></span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = <span class="dt">unit</span></span>
<span id="cb6-4"><a href="#cb6-4" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = <span class="dt">unit</span></span>
<span id="cb6-5"><a href="#cb6-5" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = <span class="dt">unit</span></span>
<span id="cb6-6"><a href="#cb6-6" aria-hidden="true"></a></span>
<span id="cb6-7"><a href="#cb6-7" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = ()</span>
<span id="cb6-8"><a href="#cb6-8" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = ()</span>
<span id="cb6-9"><a href="#cb6-9" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = ()</span>
<span id="cb6-10"><a href="#cb6-10" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = ()</span>
<span id="cb6-11"><a href="#cb6-11" aria-hidden="true"></a><span class="kw">end</span>)</span>
<span id="cb6-12"><a href="#cb6-12" aria-hidden="true"></a></span>
<span id="cb6-13"><a href="#cb6-13" aria-hidden="true"></a><span class="co">(* The details of this type do not matter; it is just a placeholder. *)</span></span>
<span id="cb6-14"><a href="#cb6-14" aria-hidden="true"></a><span class="kw">type</span> ty = TyVar <span class="kw">of</span> <span class="dt">string</span> | TyArrow <span class="kw">of</span> ty * ty</span>
<span id="cb6-15"><a href="#cb6-15" aria-hidden="true"></a></span>
<span id="cb6-16"><a href="#cb6-16" aria-hidden="true"></a><span class="co">(* Type-checking AST extensions. *)</span></span>
<span id="cb6-17"><a href="#cb6-17" aria-hidden="true"></a><span class="kw">type</span> tc_var_ext = { inferred_type : ty <span class="dt">option</span> }</span>
<span id="cb6-18"><a href="#cb6-18" aria-hidden="true"></a><span class="kw">type</span> tc_app_ext = { result_type : ty <span class="dt">option</span> }</span>
<span id="cb6-19"><a href="#cb6-19" aria-hidden="true"></a><span class="kw">type</span> tc_abs_ext = { param_type : ty <span class="dt">option</span> }</span>
<span id="cb6-20"><a href="#cb6-20" aria-hidden="true"></a><span class="kw">type</span> tc_let_ext = { bound_type : ty <span class="dt">option</span> }</span>
<span id="cb6-21"><a href="#cb6-21" aria-hidden="true"></a></span>
<span id="cb6-22"><a href="#cb6-22" aria-hidden="true"></a><span class="kw">module</span> TcAst = MakeAst (<span class="kw">struct</span></span>
<span id="cb6-23"><a href="#cb6-23" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = tc_var_ext</span>
<span id="cb6-24"><a href="#cb6-24" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = tc_app_ext</span>
<span id="cb6-25"><a href="#cb6-25" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = tc_abs_ext</span>
<span id="cb6-26"><a href="#cb6-26" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = tc_let_ext</span>
<span id="cb6-27"><a href="#cb6-27" aria-hidden="true"></a></span>
<span id="cb6-28"><a href="#cb6-28" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = { inferred_type = <span class="dt">None</span> }</span>
<span id="cb6-29"><a href="#cb6-29" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = { result_type = <span class="dt">None</span> }</span>
<span id="cb6-30"><a href="#cb6-30" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = { param_type = <span class="dt">None</span> }</span>
<span id="cb6-31"><a href="#cb6-31" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = { bound_type = <span class="dt">None</span> }</span>
<span id="cb6-32"><a href="#cb6-32" aria-hidden="true"></a><span class="kw">end</span>)</span></code></pre></div>
<p>Now, the parser needs to be able to allocate different ASTs at different use sites, and so that’s where we need one type parameter (actually, a module parameter). As far as I understand, we can’t have functions parametric over modules, so we need a functor for generating a given module’s AST in the parser, using the <code>default_..._ext</code> values in the AST module:</p>
<div class="sourceCode" id="cb7"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true"></a><span class="kw">module</span> Parse (A : AST) = <span class="kw">struct</span></span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true"></a>  <span class="co">(* Parsing entry point: tokenizes and parses. *)</span></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true"></a>  <span class="kw">let</span> parse (<span class="dt">input</span> : <span class="dt">string</span>) : A.expr =</span>
<span id="cb7-4"><a href="#cb7-4" aria-hidden="true"></a>    ...</span>
<span id="cb7-5"><a href="#cb7-5" aria-hidden="true"></a></span>
<span id="cb7-6"><a href="#cb7-6" aria-hidden="true"></a>  <span class="co">(* Parse a single expression from tokens. *)</span></span>
<span id="cb7-7"><a href="#cb7-7" aria-hidden="true"></a>  <span class="kw">let</span> <span class="kw">rec</span> parse_expr (toks : tokens) : A.expr * tokens =</span>
<span id="cb7-8"><a href="#cb7-8" aria-hidden="true"></a>    ...</span>
<span id="cb7-9"><a href="#cb7-9" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
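<p>For concreteness, here is a minimal sketch of how the body of such a functor might use the defaults. It is an illustration, not the parser from this post: the AST is cut down to variables and applications, and <code>Build</code>, <code>mk_var</code>, <code>mk_app</code>, and <code>UnitAst</code> are made-up names. The toy parser only reads whitespace-separated names as a left-nested application.</p>
<pre class="ocaml"><code>module type AST_EXTENSIONS = sig
  type var_ext
  type app_ext

  val default_var_ext : var_ext
  val default_app_ext : app_ext
end

(* Cut-down AST: only variables and applications. *)
module type AST = sig
  include AST_EXTENSIONS

  type expr = Var of var | App of app
  and var = { name : string; var_ext : var_ext }
  and app = { fn : expr; arg : expr; app_ext : app_ext }
end

module Build (A : AST) = struct
  open A

  (* Every node gets the default extension of the AST module passed in. *)
  let mk_var (name : string) : expr = Var { name; var_ext = default_var_ext }

  let mk_app (fn : expr) (arg : expr) : expr =
    App { fn; arg; app_ext = default_app_ext }

  (* Toy parser: reads f x y as ((f x) y). Returns None on empty input. *)
  let parse (input : string) : expr option =
    match List.filter (fun s -&gt; s &lt;&gt; &quot;&quot;) (String.split_on_char &#39; &#39; input) with
    | [] -&gt; None
    | first :: rest -&gt;
        Some
          (List.fold_left
             (fun fn name -&gt; mk_app fn (mk_var name))
             (mk_var first) rest)
end

(* Hypothetical AST module with no extra data, written out by hand. *)
module UnitAst = struct
  type var_ext = unit
  type app_ext = unit

  let default_var_ext = ()
  let default_app_ext = ()

  type expr = Var of var | App of app
  and var = { name : string; var_ext : var_ext }
  and app = { fn : expr; arg : expr; app_ext : app_ext }
end

module UnitBuild = Build (UnitAst)

let () =
  let open UnitAst in
  match UnitBuild.parse &quot;f x&quot; with
  | Some (App { fn = Var { name; _ }; _ }) -&gt; print_endline (&quot;applied: &quot; ^ name)
  | _ -&gt; print_endline &quot;not an application&quot;</code></pre>
<p>The functor body never mentions a concrete extension type: it only needs <code>default_var_ext</code> and <code>default_app_ext</code> from whichever module it is applied to.</p>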
<p>Similarly, any other function that’s polymorphic over AST types needs to be a part of a functor that takes an AST module as argument. As another example, here’s a function that counts the number of AST nodes:</p>
<div class="sourceCode" id="cb8"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true"></a><span class="kw">module</span> CountNodes (A : AST) = <span class="kw">struct</span></span>
<span id="cb8-2"><a href="#cb8-2" aria-hidden="true"></a>  <span class="kw">let</span> <span class="kw">rec</span> count (e : A.expr) : <span class="dt">int</span> =</span>
<span id="cb8-3"><a href="#cb8-3" aria-hidden="true"></a>    <span class="kw">match</span> e <span class="kw">with</span></span>
<span id="cb8-4"><a href="#cb8-4" aria-hidden="true"></a>    | Var _ -&gt; <span class="dv">1</span></span>
<span id="cb8-5"><a href="#cb8-5" aria-hidden="true"></a>    | App { fn; arg; _ } -&gt; <span class="dv">1</span> + count fn + count arg</span>
<span id="cb8-6"><a href="#cb8-6" aria-hidden="true"></a>    | Abs { body; _ } -&gt; <span class="dv">1</span> + count body</span>
<span id="cb8-7"><a href="#cb8-7" aria-hidden="true"></a>    | Let { rhs; body; _ } -&gt; <span class="dv">1</span> + count rhs + count body</span>
<span id="cb8-8"><a href="#cb8-8" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
<p>The final part of the ceremony is to apply these functors to get modules that we can then use to parse, format, and count nodes:</p>
<div class="sourceCode" id="cb9"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true"></a><span class="co">(* Parser module for the formatter. *)</span></span>
<span id="cb9-2"><a href="#cb9-2" aria-hidden="true"></a><span class="kw">module</span> FmtParse = Parse (FmtAst)</span>
<span id="cb9-3"><a href="#cb9-3" aria-hidden="true"></a></span>
<span id="cb9-4"><a href="#cb9-4" aria-hidden="true"></a><span class="co">(* Parser module for the type checker. *)</span></span>
<span id="cb9-5"><a href="#cb9-5" aria-hidden="true"></a><span class="kw">module</span> TcParse = Parse (TcAst)</span>
<span id="cb9-6"><a href="#cb9-6" aria-hidden="true"></a></span>
<span id="cb9-7"><a href="#cb9-7" aria-hidden="true"></a><span class="co">(* Node counter on the formatter&#39;s AST. *)</span></span>
<span id="cb9-8"><a href="#cb9-8" aria-hidden="true"></a><span class="kw">module</span> CountFmt = CountNodes (FmtAst)</span>
<span id="cb9-9"><a href="#cb9-9" aria-hidden="true"></a></span>
<span id="cb9-10"><a href="#cb9-10" aria-hidden="true"></a><span class="co">(* Node counter on the type checker&#39;s AST. *)</span></span>
<span id="cb9-11"><a href="#cb9-11" aria-hidden="true"></a><span class="kw">module</span> CountTc = CountNodes (TcAst)</span></code></pre></div>
<p>The type checker and formatter then refer to these modules:</p>
<div class="sourceCode" id="cb10"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true"></a><span class="kw">let</span> <span class="kw">rec</span> check_expr (e : TcAst.expr) : ty = ...</span>
<span id="cb10-2"><a href="#cb10-2" aria-hidden="true"></a><span class="kw">let</span> <span class="kw">rec</span> format_expr (e : FmtAst.expr) : <span class="dt">string</span> = ...</span></code></pre></div>
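<p>As an illustration of what such a function could look like, here is a sketch of a formatter over a unit-extended AST. To keep it self-contained, <code>FmtAst</code> is written out by hand here instead of going through <code>MakeAst</code>, and the output syntax is an arbitrary choice made for the example.</p>
<pre class="ocaml"><code>(* Hand-written stand-in for the formatter AST: all extensions are unit. *)
module FmtAst = struct
  type expr = Var of var | App of app | Abs of abs | Let of let_
  and var = { name : string; var_ext : unit }
  and app = { fn : expr; arg : expr; app_ext : unit }
  and abs = { param : string; body : expr; abs_ext : unit }
  and let_ = { bound : string; rhs : expr; body : expr; let_ext : unit }
end

(* The signature mentions only FmtAst.expr: no extension type parameters. *)
let rec format_expr (e : FmtAst.expr) : string =
  let open FmtAst in
  match e with
  | Var { name; _ } -&gt; name
  | App { fn; arg; _ } -&gt; &quot;(&quot; ^ format_expr fn ^ &quot; &quot; ^ format_expr arg ^ &quot;)&quot;
  | Abs { param; body; _ } -&gt; &quot;(fun &quot; ^ param ^ &quot; -&gt; &quot; ^ format_expr body ^ &quot;)&quot;
  | Let { bound; rhs; body; _ } -&gt;
      &quot;let &quot; ^ bound ^ &quot; = &quot; ^ format_expr rhs ^ &quot; in &quot; ^ format_expr body

(* let id = (fun x -&gt; x) in (id y), built by hand. *)
let example : FmtAst.expr =
  let open FmtAst in
  let var name = Var { name; var_ext = () } in
  Let
    {
      bound = &quot;id&quot;;
      rhs = Abs { param = &quot;x&quot;; body = var &quot;x&quot;; abs_ext = () };
      body = App { fn = var &quot;id&quot;; arg = var &quot;y&quot;; app_ext = () };
      let_ext = ();
    }

let () = print_endline (format_expr example)</code></pre>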
<h1 id="the-good">The good</h1>
<p>I can easily add per-AST functions, constants, or types and my parser or type checker code doesn’t become any worse. They always refer to the AST-specific things directly, and type signatures within the parser and type checker modules don’t get more complicated as we add more extensions.</p>
<h1 id="the-bad">The bad</h1>
<p>All of the AST type definitions need to be duplicated in the <code>AST</code> signature and the <code>MakeAst</code> functor. This alone renders this feature useless for our purposes, as in any real programming language there will be a lot of AST types, and each type will be quite large too (with many fields and constructors).</p>
<p>There’s also a smaller-scale duplication in these lines:</p>
<div class="sourceCode" id="cb11"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true"></a><span class="kw">module</span> MakeAst (Ext : AST_EXTENSIONS) :</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true"></a>  AST</span>
<span id="cb11-3"><a href="#cb11-3" aria-hidden="true"></a>    <span class="kw">with</span> <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb11-4"><a href="#cb11-4" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb11-5"><a href="#cb11-5" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb11-6"><a href="#cb11-6" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> let_ext = Ext.let_ext = <span class="kw">struct</span></span>
<span id="cb11-7"><a href="#cb11-7" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb11-8"><a href="#cb11-8" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb11-9"><a href="#cb11-9" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb11-10"><a href="#cb11-10" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = Ext.let_ext</span>
<span id="cb11-11"><a href="#cb11-11" aria-hidden="true"></a>  ...</span>
<span id="cb11-12"><a href="#cb11-12" aria-hidden="true"></a><span class="kw">end</span></span></code></pre></div>
<p>My understanding is that the types in the <code>struct ... end</code> part are abstract, i.e. not visible outside of the module (similar to existentials), and the <code>: AST with type ...</code> part specifies the returned module signature, i.e. the public interface. They need to be in sync, but they also need to be specified separately.</p>
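<p>A cut-down experiment illustrates the difference. The names <code>EXT</code>, <code>MakeOpaque</code>, <code>MakeTransparent</code>, and <code>UnitExt</code> are made up for this sketch, and the AST is reduced to a single record. Sealing the functor result with plain <code>AST</code> leaves <code>var_ext</code> abstract, so callers can only obtain values of it through <code>default_var_ext</code>; adding <code>with type</code> exposes the equality with <code>Ext.var_ext</code>.</p>
<pre class="ocaml"><code>module type EXT = sig
  type var_ext

  val default_var_ext : var_ext
end

module type AST = sig
  include EXT

  type var = { name : string; var_ext : var_ext }
end

(* Sealed with plain AST: var_ext is abstract in the result. *)
module MakeOpaque (Ext : EXT) : AST = struct
  type var_ext = Ext.var_ext

  let default_var_ext = Ext.default_var_ext

  type var = { name : string; var_ext : var_ext }
end

(* Sealed with a sharing constraint: var_ext is known to be Ext.var_ext. *)
module MakeTransparent (Ext : EXT) : AST with type var_ext = Ext.var_ext = struct
  type var_ext = Ext.var_ext

  let default_var_ext = Ext.default_var_ext

  type var = { name : string; var_ext : var_ext }
end

module UnitExt = struct
  type var_ext = unit

  let default_var_ext = ()
end

module Opaque = MakeOpaque (UnitExt)
module Transparent = MakeTransparent (UnitExt)

(* Accepted: Transparent.var_ext is known to be unit. *)
let v : Transparent.var = { Transparent.name = &quot;x&quot;; var_ext = () }

(* With Opaque, writing var_ext = () is a type error, because Opaque.var_ext
   is abstract. The default value is the only way to fill the field. *)
let w : Opaque.var = { Opaque.name = &quot;x&quot;; var_ext = Opaque.default_var_ext }</code></pre>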
<p>The only solution I can think of to these duplications is generating code, but if I’m OK with generating code, that opens up a lot of possibilities, and I don’t need functors anymore. I could even generate the full modules with all the AST types and everything else directly, without using functors.</p>
<p>So in short, OCaml modules help quite a bit, but they’re held back by the issues with code duplication.</p>
<hr />
<details>
<p><summary>Full code (tested with OCaml 5.3.0)</summary></p>
<div class="sourceCode" id="cb12"><pre class="sourceCode ocaml"><code class="sourceCode ocaml"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true"></a><span class="co">(* Tested with OCaml 5.3.0. *)</span></span>
<span id="cb12-2"><a href="#cb12-2" aria-hidden="true"></a></span>
<span id="cb12-3"><a href="#cb12-3" aria-hidden="true"></a><span class="kw">module</span> <span class="kw">type</span> AST_EXTENSIONS = <span class="kw">sig</span></span>
<span id="cb12-4"><a href="#cb12-4" aria-hidden="true"></a>  <span class="kw">type</span> var_ext</span>
<span id="cb12-5"><a href="#cb12-5" aria-hidden="true"></a>  <span class="kw">type</span> app_ext</span>
<span id="cb12-6"><a href="#cb12-6" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext</span>
<span id="cb12-7"><a href="#cb12-7" aria-hidden="true"></a>  <span class="kw">type</span> let_ext</span>
<span id="cb12-8"><a href="#cb12-8" aria-hidden="true"></a></span>
<span id="cb12-9"><a href="#cb12-9" aria-hidden="true"></a>  <span class="kw">val</span> default_var_ext : var_ext</span>
<span id="cb12-10"><a href="#cb12-10" aria-hidden="true"></a>  <span class="kw">val</span> default_app_ext : app_ext</span>
<span id="cb12-11"><a href="#cb12-11" aria-hidden="true"></a>  <span class="kw">val</span> default_abs_ext : abs_ext</span>
<span id="cb12-12"><a href="#cb12-12" aria-hidden="true"></a>  <span class="kw">val</span> default_let_ext : let_ext</span>
<span id="cb12-13"><a href="#cb12-13" aria-hidden="true"></a><span class="kw">end</span></span>
<span id="cb12-14"><a href="#cb12-14" aria-hidden="true"></a></span>
<span id="cb12-15"><a href="#cb12-15" aria-hidden="true"></a><span class="kw">module</span> <span class="kw">type</span> AST = <span class="kw">sig</span></span>
<span id="cb12-16"><a href="#cb12-16" aria-hidden="true"></a>  <span class="kw">include</span> AST_EXTENSIONS</span>
<span id="cb12-17"><a href="#cb12-17" aria-hidden="true"></a></span>
<span id="cb12-18"><a href="#cb12-18" aria-hidden="true"></a>  <span class="kw">type</span> expr = Var <span class="kw">of</span> var | App <span class="kw">of</span> app | Abs <span class="kw">of</span> <span class="dt">abs</span> | Let <span class="kw">of</span> let_</span>
<span id="cb12-19"><a href="#cb12-19" aria-hidden="true"></a>  <span class="kw">and</span> var = { name : <span class="dt">string</span>; var_ext : var_ext }</span>
<span id="cb12-20"><a href="#cb12-20" aria-hidden="true"></a>  <span class="kw">and</span> app = { fn : expr; arg : expr; app_ext : app_ext }</span>
<span id="cb12-21"><a href="#cb12-21" aria-hidden="true"></a>  <span class="kw">and</span> <span class="dt">abs</span> = { param : <span class="dt">string</span>; body : expr; abs_ext : abs_ext }</span>
<span id="cb12-22"><a href="#cb12-22" aria-hidden="true"></a>  <span class="kw">and</span> let_ = { bound : <span class="dt">string</span>; rhs : expr; body : expr; let_ext : let_ext }</span>
<span id="cb12-23"><a href="#cb12-23" aria-hidden="true"></a><span class="kw">end</span></span>
<span id="cb12-24"><a href="#cb12-24" aria-hidden="true"></a></span>
<span id="cb12-25"><a href="#cb12-25" aria-hidden="true"></a><span class="kw">module</span> MakeAst (Ext : AST_EXTENSIONS) :</span>
<span id="cb12-26"><a href="#cb12-26" aria-hidden="true"></a>  AST</span>
<span id="cb12-27"><a href="#cb12-27" aria-hidden="true"></a>    <span class="kw">with</span> <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb12-28"><a href="#cb12-28" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb12-29"><a href="#cb12-29" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb12-30"><a href="#cb12-30" aria-hidden="true"></a>     <span class="kw">and</span> <span class="kw">type</span> let_ext = Ext.let_ext = <span class="kw">struct</span></span>
<span id="cb12-31"><a href="#cb12-31" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = Ext.var_ext</span>
<span id="cb12-32"><a href="#cb12-32" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = Ext.app_ext</span>
<span id="cb12-33"><a href="#cb12-33" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = Ext.abs_ext</span>
<span id="cb12-34"><a href="#cb12-34" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = Ext.let_ext</span>
<span id="cb12-35"><a href="#cb12-35" aria-hidden="true"></a></span>
<span id="cb12-36"><a href="#cb12-36" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = Ext.default_var_ext</span>
<span id="cb12-37"><a href="#cb12-37" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = Ext.default_app_ext</span>
<span id="cb12-38"><a href="#cb12-38" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = Ext.default_abs_ext</span>
<span id="cb12-39"><a href="#cb12-39" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = Ext.default_let_ext</span>
<span id="cb12-40"><a href="#cb12-40" aria-hidden="true"></a></span>
<span id="cb12-41"><a href="#cb12-41" aria-hidden="true"></a>  <span class="kw">type</span> expr = Var <span class="kw">of</span> var | App <span class="kw">of</span> app | Abs <span class="kw">of</span> <span class="dt">abs</span> | Let <span class="kw">of</span> let_</span>
<span id="cb12-42"><a href="#cb12-42" aria-hidden="true"></a>  <span class="kw">and</span> var = { name : <span class="dt">string</span>; var_ext : var_ext }</span>
<span id="cb12-43"><a href="#cb12-43" aria-hidden="true"></a>  <span class="kw">and</span> app = { fn : expr; arg : expr; app_ext : app_ext }</span>
<span id="cb12-44"><a href="#cb12-44" aria-hidden="true"></a>  <span class="kw">and</span> <span class="dt">abs</span> = { param : <span class="dt">string</span>; body : expr; abs_ext : abs_ext }</span>
<span id="cb12-45"><a href="#cb12-45" aria-hidden="true"></a>  <span class="kw">and</span> let_ = { bound : <span class="dt">string</span>; rhs : expr; body : expr; let_ext : let_ext }</span>
<span id="cb12-46"><a href="#cb12-46" aria-hidden="true"></a><span class="kw">end</span></span>
<span id="cb12-47"><a href="#cb12-47" aria-hidden="true"></a></span>
<span id="cb12-48"><a href="#cb12-48" aria-hidden="true"></a><span class="co">(* --------------------------------------------------------</span></span>
<span id="cb12-49"><a href="#cb12-49" aria-hidden="true"></a><span class="co">   A simple recursive-descent parser, generic over any AST.</span></span>
<span id="cb12-50"><a href="#cb12-50" aria-hidden="true"></a></span>
<span id="cb12-51"><a href="#cb12-51" aria-hidden="true"></a><span class="co">   Grammar:</span></span>
<span id="cb12-52"><a href="#cb12-52" aria-hidden="true"></a><span class="co">     expr   ::= &#39;let&#39; IDENT &#39;=&#39; expr &#39;in&#39; expr</span></span>
<span id="cb12-53"><a href="#cb12-53" aria-hidden="true"></a><span class="co">              | &#39;\&#39; IDENT &#39;.&#39; expr</span></span>
<span id="cb12-54"><a href="#cb12-54" aria-hidden="true"></a><span class="co">              | app</span></span>
<span id="cb12-55"><a href="#cb12-55" aria-hidden="true"></a><span class="co">     app    ::= atom+</span></span>
<span id="cb12-56"><a href="#cb12-56" aria-hidden="true"></a><span class="co">     atom   ::= IDENT | &#39;(&#39; expr &#39;)&#39;</span></span>
<span id="cb12-57"><a href="#cb12-57" aria-hidden="true"></a><span class="co">   -------------------------------------------------------- *)</span></span>
<span id="cb12-58"><a href="#cb12-58" aria-hidden="true"></a><span class="kw">module</span> Parse (A : AST) = <span class="kw">struct</span></span>
<span id="cb12-59"><a href="#cb12-59" aria-hidden="true"></a>  <span class="kw">type</span> tokens = <span class="dt">string</span> <span class="dt">list</span></span>
<span id="cb12-60"><a href="#cb12-60" aria-hidden="true"></a></span>
<span id="cb12-61"><a href="#cb12-61" aria-hidden="true"></a>  <span class="co">(* parse_expr: top-level, handles let/lambda/application.</span></span>
<span id="cb12-62"><a href="#cb12-62" aria-hidden="true"></a><span class="co">     Lambda and let bodies extend as far right as possible</span></span>
<span id="cb12-63"><a href="#cb12-63" aria-hidden="true"></a><span class="co">     (i.e. parse_expr), so nested constructs work without parens:</span></span>
<span id="cb12-64"><a href="#cb12-64" aria-hidden="true"></a><span class="co">       let f = \x. \y. x in ...</span></span>
<span id="cb12-65"><a href="#cb12-65" aria-hidden="true"></a><span class="co">       \x. \y. x y</span></span>
<span id="cb12-66"><a href="#cb12-66" aria-hidden="true"></a><span class="co">     parse_app_args stops at &#39;in&#39;, &#39;)&#39;, and non-atom tokens,</span></span>
<span id="cb12-67"><a href="#cb12-67" aria-hidden="true"></a><span class="co">     so &#39;in&#39; correctly terminates a let-RHS that is an application. *)</span></span>
<span id="cb12-68"><a href="#cb12-68" aria-hidden="true"></a>  <span class="kw">let</span> <span class="kw">rec</span> parse_expr (toks : tokens) : A.expr * tokens =</span>
<span id="cb12-69"><a href="#cb12-69" aria-hidden="true"></a>    <span class="kw">match</span> toks <span class="kw">with</span></span>
<span id="cb12-70"><a href="#cb12-70" aria-hidden="true"></a>    | <span class="st">&quot;let&quot;</span> :: name :: <span class="st">&quot;=&quot;</span> :: rest -&gt; (</span>
<span id="cb12-71"><a href="#cb12-71" aria-hidden="true"></a>        <span class="kw">let</span> rhs, rest = parse_expr rest <span class="kw">in</span></span>
<span id="cb12-72"><a href="#cb12-72" aria-hidden="true"></a>        <span class="kw">match</span> rest <span class="kw">with</span></span>
<span id="cb12-73"><a href="#cb12-73" aria-hidden="true"></a>        | <span class="st">&quot;in&quot;</span> :: rest -&gt;</span>
<span id="cb12-74"><a href="#cb12-74" aria-hidden="true"></a>            <span class="kw">let</span> body, rest = parse_expr rest <span class="kw">in</span></span>
<span id="cb12-75"><a href="#cb12-75" aria-hidden="true"></a>            ( A.Let { bound = name; rhs; body; let_ext = A.default_let_ext },</span>
<span id="cb12-76"><a href="#cb12-76" aria-hidden="true"></a>              rest )</span>
<span id="cb12-77"><a href="#cb12-77" aria-hidden="true"></a>        | _ -&gt; <span class="dt">failwith</span> <span class="st">&quot;expected &#39;in&#39;&quot;</span>)</span>
<span id="cb12-78"><a href="#cb12-78" aria-hidden="true"></a>    | <span class="st">&quot;</span><span class="ch">\\</span><span class="st">&quot;</span> :: param :: <span class="st">&quot;.&quot;</span> :: rest -&gt;</span>
<span id="cb12-79"><a href="#cb12-79" aria-hidden="true"></a>        <span class="kw">let</span> body, rest = parse_expr rest <span class="kw">in</span></span>
<span id="cb12-80"><a href="#cb12-80" aria-hidden="true"></a>        (A.Abs { param; body; abs_ext = A.default_abs_ext }, rest)</span>
<span id="cb12-81"><a href="#cb12-81" aria-hidden="true"></a>    | _ -&gt; parse_app toks</span>
<span id="cb12-82"><a href="#cb12-82" aria-hidden="true"></a></span>
<span id="cb12-83"><a href="#cb12-83" aria-hidden="true"></a>  <span class="kw">and</span> parse_app (toks : tokens) : A.expr * tokens =</span>
<span id="cb12-84"><a href="#cb12-84" aria-hidden="true"></a>    <span class="kw">let</span> head, rest = parse_atom toks <span class="kw">in</span></span>
<span id="cb12-85"><a href="#cb12-85" aria-hidden="true"></a>    parse_app_args head rest</span>
<span id="cb12-86"><a href="#cb12-86" aria-hidden="true"></a></span>
<span id="cb12-87"><a href="#cb12-87" aria-hidden="true"></a>  <span class="kw">and</span> parse_app_args (fn : A.expr) (toks : tokens) : A.expr * tokens =</span>
<span id="cb12-88"><a href="#cb12-88" aria-hidden="true"></a>    <span class="kw">match</span> toks <span class="kw">with</span></span>
<span id="cb12-89"><a href="#cb12-89" aria-hidden="true"></a>    | [] | <span class="st">&quot;)&quot;</span> :: _ | <span class="st">&quot;in&quot;</span> :: _ -&gt; (fn, toks)</span>
<span id="cb12-90"><a href="#cb12-90" aria-hidden="true"></a>    | _ -&gt; (</span>
<span id="cb12-91"><a href="#cb12-91" aria-hidden="true"></a>        <span class="kw">match</span> parse_atom_opt toks <span class="kw">with</span></span>
<span id="cb12-92"><a href="#cb12-92" aria-hidden="true"></a>        | <span class="dt">Some</span> (arg, rest) -&gt;</span>
<span id="cb12-93"><a href="#cb12-93" aria-hidden="true"></a>            <span class="kw">let</span> node = A.App { fn; arg; app_ext = A.default_app_ext } <span class="kw">in</span></span>
<span id="cb12-94"><a href="#cb12-94" aria-hidden="true"></a>            parse_app_args node rest</span>
<span id="cb12-95"><a href="#cb12-95" aria-hidden="true"></a>        | <span class="dt">None</span> -&gt; (fn, toks))</span>
<span id="cb12-96"><a href="#cb12-96" aria-hidden="true"></a></span>
<span id="cb12-97"><a href="#cb12-97" aria-hidden="true"></a>  <span class="kw">and</span> parse_atom (toks : tokens) : A.expr * tokens =</span>
<span id="cb12-98"><a href="#cb12-98" aria-hidden="true"></a>    <span class="kw">match</span> parse_atom_opt toks <span class="kw">with</span></span>
<span id="cb12-99"><a href="#cb12-99" aria-hidden="true"></a>    | <span class="dt">Some</span> r -&gt; r</span>
<span id="cb12-100"><a href="#cb12-100" aria-hidden="true"></a>    | <span class="dt">None</span> -&gt;</span>
<span id="cb12-101"><a href="#cb12-101" aria-hidden="true"></a>        <span class="kw">let</span> tok = <span class="kw">match</span> toks <span class="kw">with</span> t :: _ -&gt; t | [] -&gt; <span class="st">&quot;EOF&quot;</span> <span class="kw">in</span></span>
<span id="cb12-102"><a href="#cb12-102" aria-hidden="true"></a>        <span class="dt">failwith</span> (<span class="dt">Printf</span>.sprintf <span class="st">&quot;expected atom, got &#39;%s&#39;&quot;</span> tok)</span>
<span id="cb12-103"><a href="#cb12-103" aria-hidden="true"></a></span>
<span id="cb12-104"><a href="#cb12-104" aria-hidden="true"></a>  <span class="kw">and</span> parse_atom_opt (toks : tokens) : (A.expr * tokens) <span class="dt">option</span> =</span>
<span id="cb12-105"><a href="#cb12-105" aria-hidden="true"></a>    <span class="kw">match</span> toks <span class="kw">with</span></span>
<span id="cb12-106"><a href="#cb12-106" aria-hidden="true"></a>    | <span class="st">&quot;(&quot;</span> :: rest -&gt; (</span>
<span id="cb12-107"><a href="#cb12-107" aria-hidden="true"></a>        <span class="kw">let</span> e, rest = parse_expr rest <span class="kw">in</span></span>
<span id="cb12-108"><a href="#cb12-108" aria-hidden="true"></a>        <span class="kw">match</span> rest <span class="kw">with</span></span>
<span id="cb12-109"><a href="#cb12-109" aria-hidden="true"></a>        | <span class="st">&quot;)&quot;</span> :: rest -&gt; <span class="dt">Some</span> (e, rest)</span>
<span id="cb12-110"><a href="#cb12-110" aria-hidden="true"></a>        | _ -&gt; <span class="dt">failwith</span> <span class="st">&quot;expected &#39;)&#39;&quot;</span>)</span>
<span id="cb12-111"><a href="#cb12-111" aria-hidden="true"></a>    | tok :: rest</span>
<span id="cb12-112"><a href="#cb12-112" aria-hidden="true"></a>      <span class="kw">when</span> tok &lt;&gt; <span class="st">&quot;let&quot;</span> &amp;&amp; tok &lt;&gt; <span class="st">&quot;</span><span class="ch">\\</span><span class="st">&quot;</span> &amp;&amp; tok &lt;&gt; <span class="st">&quot;in&quot;</span> &amp;&amp; tok &lt;&gt; <span class="st">&quot;=&quot;</span></span>
<span id="cb12-113"><a href="#cb12-113" aria-hidden="true"></a>           &amp;&amp; tok &lt;&gt; <span class="st">&quot;.&quot;</span> &amp;&amp; tok &lt;&gt; <span class="st">&quot;(&quot;</span> &amp;&amp; tok &lt;&gt; <span class="st">&quot;)&quot;</span> -&gt;</span>
<span id="cb12-114"><a href="#cb12-114" aria-hidden="true"></a>        <span class="dt">Some</span> (A.Var { name = tok; var_ext = A.default_var_ext }, rest)</span>
<span id="cb12-115"><a href="#cb12-115" aria-hidden="true"></a>    | _ -&gt; <span class="dt">None</span></span>
<span id="cb12-116"><a href="#cb12-116" aria-hidden="true"></a></span>
<span id="cb12-117"><a href="#cb12-117" aria-hidden="true"></a>  <span class="kw">let</span> parse (<span class="dt">input</span> : <span class="dt">string</span>) : A.expr =</span>
<span id="cb12-118"><a href="#cb12-118" aria-hidden="true"></a>    <span class="co">(* Tokenize: pad parens, dots, and backslashes with spaces, then split on spaces *)</span></span>
<span id="cb12-119"><a href="#cb12-119" aria-hidden="true"></a>    <span class="kw">let</span> buf = <span class="dt">Buffer</span>.create (<span class="dt">String</span>.length <span class="dt">input</span>) <span class="kw">in</span></span>
<span id="cb12-120"><a href="#cb12-120" aria-hidden="true"></a>    <span class="dt">String</span>.iter</span>
<span id="cb12-121"><a href="#cb12-121" aria-hidden="true"></a>      (<span class="kw">fun</span> c -&gt;</span>
<span id="cb12-122"><a href="#cb12-122" aria-hidden="true"></a>        <span class="kw">match</span> c <span class="kw">with</span></span>
<span id="cb12-123"><a href="#cb12-123" aria-hidden="true"></a>        | <span class="ch">&#39;(&#39;</span> | <span class="ch">&#39;)&#39;</span> | <span class="ch">&#39;.&#39;</span> | <span class="ch">&#39;\\&#39;</span> -&gt;</span>
<span id="cb12-124"><a href="#cb12-124" aria-hidden="true"></a>            <span class="dt">Buffer</span>.add_char buf <span class="ch">&#39; &#39;</span>;</span>
<span id="cb12-125"><a href="#cb12-125" aria-hidden="true"></a>            <span class="dt">Buffer</span>.add_char buf c;</span>
<span id="cb12-126"><a href="#cb12-126" aria-hidden="true"></a>            <span class="dt">Buffer</span>.add_char buf <span class="ch">&#39; &#39;</span></span>
<span id="cb12-127"><a href="#cb12-127" aria-hidden="true"></a>        | _ -&gt; <span class="dt">Buffer</span>.add_char buf c)</span>
<span id="cb12-128"><a href="#cb12-128" aria-hidden="true"></a>      <span class="dt">input</span>;</span>
<span id="cb12-129"><a href="#cb12-129" aria-hidden="true"></a>    <span class="kw">let</span> s = <span class="dt">Buffer</span>.contents buf <span class="kw">in</span></span>
<span id="cb12-130"><a href="#cb12-130" aria-hidden="true"></a>    <span class="kw">let</span> toks = <span class="dt">String</span>.split_on_char <span class="ch">&#39; &#39;</span> s |&gt; <span class="dt">List</span>.filter (<span class="kw">fun</span> s -&gt; s &lt;&gt; <span class="st">&quot;&quot;</span>) <span class="kw">in</span></span>
<span id="cb12-131"><a href="#cb12-131" aria-hidden="true"></a>    <span class="kw">let</span> expr, rest = parse_expr toks <span class="kw">in</span></span>
<span id="cb12-132"><a href="#cb12-132" aria-hidden="true"></a>    <span class="kw">if</span> rest &lt;&gt; [] <span class="kw">then</span></span>
<span id="cb12-133"><a href="#cb12-133" aria-hidden="true"></a>      <span class="dt">failwith</span> (<span class="dt">Printf</span>.sprintf <span class="st">&quot;unexpected token &#39;%s&#39;&quot;</span> (<span class="dt">List</span>.hd rest));</span>
<span id="cb12-134"><a href="#cb12-134" aria-hidden="true"></a>    expr</span>
<span id="cb12-135"><a href="#cb12-135" aria-hidden="true"></a><span class="kw">end</span></span>
<span id="cb12-136"><a href="#cb12-136" aria-hidden="true"></a></span>
<span id="cb12-137"><a href="#cb12-137" aria-hidden="true"></a><span class="co">(* --------------------------------------------------------</span></span>
<span id="cb12-138"><a href="#cb12-138" aria-hidden="true"></a><span class="co">   Formatter — all extensions are unit.</span></span>
<span id="cb12-139"><a href="#cb12-139" aria-hidden="true"></a><span class="co">   -------------------------------------------------------- *)</span></span>
<span id="cb12-140"><a href="#cb12-140" aria-hidden="true"></a><span class="kw">module</span> FmtAst = MakeAst (<span class="kw">struct</span></span>
<span id="cb12-141"><a href="#cb12-141" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = <span class="dt">unit</span></span>
<span id="cb12-142"><a href="#cb12-142" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = <span class="dt">unit</span></span>
<span id="cb12-143"><a href="#cb12-143" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = <span class="dt">unit</span></span>
<span id="cb12-144"><a href="#cb12-144" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = <span class="dt">unit</span></span>
<span id="cb12-145"><a href="#cb12-145" aria-hidden="true"></a></span>
<span id="cb12-146"><a href="#cb12-146" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = ()</span>
<span id="cb12-147"><a href="#cb12-147" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = ()</span>
<span id="cb12-148"><a href="#cb12-148" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = ()</span>
<span id="cb12-149"><a href="#cb12-149" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = ()</span>
<span id="cb12-150"><a href="#cb12-150" aria-hidden="true"></a><span class="kw">end</span>)</span>
<span id="cb12-151"><a href="#cb12-151" aria-hidden="true"></a></span>
<span id="cb12-152"><a href="#cb12-152" aria-hidden="true"></a><span class="kw">module</span> FmtParse = Parse (FmtAst)</span>
<span id="cb12-153"><a href="#cb12-153" aria-hidden="true"></a></span>
<span id="cb12-154"><a href="#cb12-154" aria-hidden="true"></a><span class="kw">let</span> <span class="kw">rec</span> format_expr (e : FmtAst.expr) : <span class="dt">string</span> =</span>
<span id="cb12-155"><a href="#cb12-155" aria-hidden="true"></a>  <span class="kw">match</span> e <span class="kw">with</span></span>
<span id="cb12-156"><a href="#cb12-156" aria-hidden="true"></a>  | Var { name; _ } -&gt; name</span>
<span id="cb12-157"><a href="#cb12-157" aria-hidden="true"></a>  | App { fn; arg; _ } -&gt;</span>
<span id="cb12-158"><a href="#cb12-158" aria-hidden="true"></a>      <span class="dt">Printf</span>.sprintf <span class="st">&quot;(%s %s)&quot;</span> (format_expr fn) (format_arg arg)</span>
<span id="cb12-159"><a href="#cb12-159" aria-hidden="true"></a>  | Abs { param; body; _ } -&gt;</span>
<span id="cb12-160"><a href="#cb12-160" aria-hidden="true"></a>      <span class="dt">Printf</span>.sprintf <span class="st">&quot;(</span><span class="ch">\\</span><span class="st">%s. %s)&quot;</span> param (format_expr body)</span>
<span id="cb12-161"><a href="#cb12-161" aria-hidden="true"></a>  | Let { bound; rhs; body; _ } -&gt;</span>
<span id="cb12-162"><a href="#cb12-162" aria-hidden="true"></a>      <span class="dt">Printf</span>.sprintf <span class="st">&quot;(let %s = %s in %s)&quot;</span> bound (format_expr rhs)</span>
<span id="cb12-163"><a href="#cb12-163" aria-hidden="true"></a>        (format_expr body)</span>
<span id="cb12-164"><a href="#cb12-164" aria-hidden="true"></a></span>
<span id="cb12-165"><a href="#cb12-165" aria-hidden="true"></a><span class="kw">and</span> format_arg (e : FmtAst.expr) : <span class="dt">string</span> =</span>
<span id="cb12-166"><a href="#cb12-166" aria-hidden="true"></a>  <span class="kw">match</span> e <span class="kw">with</span></span>
<span id="cb12-167"><a href="#cb12-167" aria-hidden="true"></a>  | Var { name; _ } -&gt; name</span>
<span id="cb12-168"><a href="#cb12-168" aria-hidden="true"></a>  | _ -&gt; <span class="dt">Printf</span>.sprintf <span class="st">&quot;(%s)&quot;</span> (format_expr e)</span>
<span id="cb12-169"><a href="#cb12-169" aria-hidden="true"></a></span>
<span id="cb12-170"><a href="#cb12-170" aria-hidden="true"></a><span class="co">(* --------------------------------------------------------</span></span>
<span id="cb12-171"><a href="#cb12-171" aria-hidden="true"></a><span class="co">   Type checker — extensions carry inferred types.</span></span>
<span id="cb12-172"><a href="#cb12-172" aria-hidden="true"></a><span class="co">   -------------------------------------------------------- *)</span></span>
<span id="cb12-173"><a href="#cb12-173" aria-hidden="true"></a><span class="kw">type</span> ty = TyVar <span class="kw">of</span> <span class="dt">string</span> | TyArrow <span class="kw">of</span> ty * ty</span>
<span id="cb12-174"><a href="#cb12-174" aria-hidden="true"></a><span class="kw">type</span> tc_var_ext = { inferred_type : ty <span class="dt">option</span> }</span>
<span id="cb12-175"><a href="#cb12-175" aria-hidden="true"></a><span class="kw">type</span> tc_app_ext = { result_type : ty <span class="dt">option</span> }</span>
<span id="cb12-176"><a href="#cb12-176" aria-hidden="true"></a><span class="kw">type</span> tc_abs_ext = { param_type : ty <span class="dt">option</span> }</span>
<span id="cb12-177"><a href="#cb12-177" aria-hidden="true"></a><span class="kw">type</span> tc_let_ext = { bound_type : ty <span class="dt">option</span> }</span>
<span id="cb12-178"><a href="#cb12-178" aria-hidden="true"></a></span>
<span id="cb12-179"><a href="#cb12-179" aria-hidden="true"></a><span class="kw">module</span> TcAst = MakeAst (<span class="kw">struct</span></span>
<span id="cb12-180"><a href="#cb12-180" aria-hidden="true"></a>  <span class="kw">type</span> var_ext = tc_var_ext</span>
<span id="cb12-181"><a href="#cb12-181" aria-hidden="true"></a>  <span class="kw">type</span> app_ext = tc_app_ext</span>
<span id="cb12-182"><a href="#cb12-182" aria-hidden="true"></a>  <span class="kw">type</span> abs_ext = tc_abs_ext</span>
<span id="cb12-183"><a href="#cb12-183" aria-hidden="true"></a>  <span class="kw">type</span> let_ext = tc_let_ext</span>
<span id="cb12-184"><a href="#cb12-184" aria-hidden="true"></a></span>
<span id="cb12-185"><a href="#cb12-185" aria-hidden="true"></a>  <span class="kw">let</span> default_var_ext = { inferred_type = <span class="dt">None</span> }</span>
<span id="cb12-186"><a href="#cb12-186" aria-hidden="true"></a>  <span class="kw">let</span> default_app_ext = { result_type = <span class="dt">None</span> }</span>
<span id="cb12-187"><a href="#cb12-187" aria-hidden="true"></a>  <span class="kw">let</span> default_abs_ext = { param_type = <span class="dt">None</span> }</span>
<span id="cb12-188"><a href="#cb12-188" aria-hidden="true"></a>  <span class="kw">let</span> default_let_ext = { bound_type = <span class="dt">None</span> }</span>
<span id="cb12-189"><a href="#cb12-189" aria-hidden="true"></a><span class="kw">end</span>)</span>
<span id="cb12-190"><a href="#cb12-190" aria-hidden="true"></a></span>
<span id="cb12-191"><a href="#cb12-191" aria-hidden="true"></a><span class="kw">module</span> TcParse = Parse (TcAst)</span>
<span id="cb12-192"><a href="#cb12-192" aria-hidden="true"></a></span>
<span id="cb12-193"><a href="#cb12-193" aria-hidden="true"></a><span class="kw">let</span> <span class="kw">rec</span> format_ty (t : ty) : <span class="dt">string</span> =</span>
<span id="cb12-194"><a href="#cb12-194" aria-hidden="true"></a>  <span class="kw">match</span> t <span class="kw">with</span></span>
<span id="cb12-195"><a href="#cb12-195" aria-hidden="true"></a>  | TyVar s -&gt; s</span>
<span id="cb12-196"><a href="#cb12-196" aria-hidden="true"></a>  | TyArrow ((TyArrow _ <span class="kw">as</span> a), b) -&gt;</span>
<span id="cb12-197"><a href="#cb12-197" aria-hidden="true"></a>      <span class="dt">Printf</span>.sprintf <span class="st">&quot;(%s) -&gt; %s&quot;</span> (format_ty a) (format_ty b)</span>
<span id="cb12-198"><a href="#cb12-198" aria-hidden="true"></a>  | TyArrow (a, b) -&gt; <span class="dt">Printf</span>.sprintf <span class="st">&quot;%s -&gt; %s&quot;</span> (format_ty a) (format_ty b)</span>
<span id="cb12-199"><a href="#cb12-199" aria-hidden="true"></a></span>
<span id="cb12-200"><a href="#cb12-200" aria-hidden="true"></a><span class="co">(* Placeholder, not real inference: use the extension annotation if present, else guess from the node&#39;s shape. *)</span></span>
<span id="cb12-201"><a href="#cb12-201" aria-hidden="true"></a><span class="kw">let</span> <span class="kw">rec</span> check_expr (e : TcAst.expr) : ty =</span>
<span id="cb12-202"><a href="#cb12-202" aria-hidden="true"></a>  <span class="kw">match</span> e <span class="kw">with</span></span>
<span id="cb12-203"><a href="#cb12-203" aria-hidden="true"></a>  | Var { var_ext = { inferred_type = <span class="dt">Some</span> t }; _ } -&gt; t</span>
<span id="cb12-204"><a href="#cb12-204" aria-hidden="true"></a>  | Var { name; _ } -&gt; TyVar name</span>
<span id="cb12-205"><a href="#cb12-205" aria-hidden="true"></a>  | App { app_ext = { result_type = <span class="dt">Some</span> t }; _ } -&gt; t</span>
<span id="cb12-206"><a href="#cb12-206" aria-hidden="true"></a>  | App { fn; _ } -&gt; (</span>
<span id="cb12-207"><a href="#cb12-207" aria-hidden="true"></a>      <span class="kw">match</span> check_expr fn <span class="kw">with</span> TyArrow (_, ret) -&gt; ret | t -&gt; t)</span>
<span id="cb12-208"><a href="#cb12-208" aria-hidden="true"></a>  | Abs { param; body; abs_ext = { param_type }; _ } -&gt;</span>
<span id="cb12-209"><a href="#cb12-209" aria-hidden="true"></a>      <span class="kw">let</span> p = <span class="kw">match</span> param_type <span class="kw">with</span> <span class="dt">Some</span> t -&gt; t | <span class="dt">None</span> -&gt; TyVar param <span class="kw">in</span></span>
<span id="cb12-210"><a href="#cb12-210" aria-hidden="true"></a>      TyArrow (p, check_expr body)</span>
<span id="cb12-211"><a href="#cb12-211" aria-hidden="true"></a>  | Let { body; _ } -&gt; check_expr body</span>
<span id="cb12-212"><a href="#cb12-212" aria-hidden="true"></a></span>
<span id="cb12-213"><a href="#cb12-213" aria-hidden="true"></a><span class="co">(* --------------------------------------------------------</span></span>
<span id="cb12-214"><a href="#cb12-214" aria-hidden="true"></a><span class="co">   Generic node counter — works on any AST.</span></span>
<span id="cb12-215"><a href="#cb12-215" aria-hidden="true"></a><span class="co">   -------------------------------------------------------- *)</span></span>
<span id="cb12-216"><a href="#cb12-216" aria-hidden="true"></a><span class="kw">module</span> CountNodes (A : AST) = <span class="kw">struct</span></span>
<span id="cb12-217"><a href="#cb12-217" aria-hidden="true"></a>  <span class="kw">let</span> <span class="kw">rec</span> count (e : A.expr) : <span class="dt">int</span> =</span>
<span id="cb12-218"><a href="#cb12-218" aria-hidden="true"></a>    <span class="kw">match</span> e <span class="kw">with</span></span>
<span id="cb12-219"><a href="#cb12-219" aria-hidden="true"></a>    | Var _ -&gt; <span class="dv">1</span></span>
<span id="cb12-220"><a href="#cb12-220" aria-hidden="true"></a>    | App { fn; arg; _ } -&gt; <span class="dv">1</span> + count fn + count arg</span>
<span id="cb12-221"><a href="#cb12-221" aria-hidden="true"></a>    | Abs { body; _ } -&gt; <span class="dv">1</span> + count body</span>
<span id="cb12-222"><a href="#cb12-222" aria-hidden="true"></a>    | Let { rhs; body; _ } -&gt; <span class="dv">1</span> + count rhs + count body</span>
<span id="cb12-223"><a href="#cb12-223" aria-hidden="true"></a><span class="kw">end</span></span>
<span id="cb12-224"><a href="#cb12-224" aria-hidden="true"></a></span>
<span id="cb12-225"><a href="#cb12-225" aria-hidden="true"></a><span class="kw">module</span> CountFmt = CountNodes (FmtAst)</span>
<span id="cb12-226"><a href="#cb12-226" aria-hidden="true"></a><span class="kw">module</span> CountTc = CountNodes (TcAst)</span>
<span id="cb12-227"><a href="#cb12-227" aria-hidden="true"></a></span>
<span id="cb12-228"><a href="#cb12-228" aria-hidden="true"></a><span class="co">(* --------------------------------------------------------</span></span>
<span id="cb12-229"><a href="#cb12-229" aria-hidden="true"></a><span class="co">   Demo: parse the same source in both worlds.</span></span>
<span id="cb12-230"><a href="#cb12-230" aria-hidden="true"></a><span class="co">   -------------------------------------------------------- *)</span></span>
<span id="cb12-231"><a href="#cb12-231" aria-hidden="true"></a><span class="kw">let</span> source = {|<span class="kw">let</span> id = \x. x <span class="kw">in</span> id <span class="dv">42</span>|}</span>
<span id="cb12-232"><a href="#cb12-232" aria-hidden="true"></a></span>
<span id="cb12-233"><a href="#cb12-233" aria-hidden="true"></a><span class="kw">let</span> () =</span>
<span id="cb12-234"><a href="#cb12-234" aria-hidden="true"></a>  <span class="co">(* Formatter world — parse and pretty-print *)</span></span>
<span id="cb12-235"><a href="#cb12-235" aria-hidden="true"></a>  <span class="kw">let</span> prog = FmtParse.parse source <span class="kw">in</span></span>
<span id="cb12-236"><a href="#cb12-236" aria-hidden="true"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;formatted: %s</span><span class="ch">\n</span><span class="st">&quot;</span> (format_expr prog);</span>
<span id="cb12-237"><a href="#cb12-237" aria-hidden="true"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;node count: %d</span><span class="ch">\n</span><span class="st">&quot;</span> (CountFmt.count prog);</span>
<span id="cb12-238"><a href="#cb12-238" aria-hidden="true"></a></span>
<span id="cb12-239"><a href="#cb12-239" aria-hidden="true"></a>  <span class="co">(* Type checker world — parse (extensions default to None),</span></span>
<span id="cb12-240"><a href="#cb12-240" aria-hidden="true"></a><span class="co">     then check with the placeholder checker *)</span></span>
<span id="cb12-241"><a href="#cb12-241" aria-hidden="true"></a>  <span class="kw">let</span> tc_prog = TcParse.parse source <span class="kw">in</span></span>
<span id="cb12-242"><a href="#cb12-242" aria-hidden="true"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;inferred type: %s</span><span class="ch">\n</span><span class="st">&quot;</span> (format_ty (check_expr tc_prog));</span>
<span id="cb12-243"><a href="#cb12-243" aria-hidden="true"></a>  <span class="dt">Printf</span>.printf <span class="st">&quot;node count: %d</span><span class="ch">\n</span><span class="st">&quot;</span> (CountTc.count tc_prog)</span></code></pre></div>
</details>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>Actually, any kind of polymorphism. Fir currently doesn’t have trait objects and the only way to have polymorphism is by using type parameters, potentially with qualifications.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>This is a little bit simplified, see <a href="https://osa1.net/posts/2025-01-18-fir-error-handling.html">this post</a> for more details and examples.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>Extensible named types in Fir</title>
    <link href="http://osa1.net/posts/2026-03-07-extensible-named-types-fir.html" />
    <id>http://osa1.net/posts/2026-03-07-extensible-named-types-fir.html</id>
    <published>2026-03-07T00:00:00Z</published>
    <updated>2026-03-07T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>The front-end AST types are one of the most important types in a language implementation, and if we get them wrong nothing will be right in the rest of the implementation.</p>
<p>These types should be cheap to allocate and efficient to use, but also extensible, as different tools will use them differently. A type checker may want to add inferred types to expressions, but for a formatter, those inferred type fields would be a waste of memory.</p>
<p>One approach to this problem is to have a parser that generates parse events instead of an AST or CST, and let the tools have their own ASTs. I explored this in <a href="https://osa1.net/posts/2024-11-22-how-to-parse-1.html">a previous blog post</a>.</p>
<p>This approach works fine when the language is small, but for a programming language that’s never the case. Fir is currently quite simple, yet it has 28 types of expressions. Most production languages have many more.</p>
<p>So I’ve been thinking about making Fir’s AST types extensible with new fields in the self-hosted compiler. This AST will be used by many of the tools listed <a href="https://github.com/fir-lang/fir/issues/28">here</a>, and more. The parser and the AST types will be published as libraries.</p>
<p>There are a few common ways to add new fields to an existing type:</p>
<ul>
<li><p>With subtyping of nominal types (common in OOP languages), we can create a subtype with extra fields.</p></li>
<li><p>In languages where objects have identities (again, common in OOP languages), we can use an identity map to map objects to extra information.</p></li>
<li><p>If the objects don’t have identities, we can manually generate unique identities for objects that we want to attach extra information to, and then use a map, like in the previous option.</p></li>
</ul>
<p>The third option can be done in Fir, and it has a few advantages compared to extending existing types: <a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<ul>
<li><p>The maps we use to attach extra information to AST nodes can be deallocated separately from the AST nodes. So if we have a long computation where we need some information in some of the steps but not later, we can allocate the maps and then deallocate them while keeping the AST nodes alive.</p></li>
<li><p>We can create differently typed identities for different AST types, and generate the identities as consecutive numbers. Then use arrays instead of hash maps to map nodes to things.</p></li>
<li><p>Unlike built-in identities, we can choose the identity size (e.g. 32-bit numbers instead of 64-bit), and embed information about the values in the identities.</p></li>
</ul>
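<p>A minimal Python sketch of this idea, not Fir’s actual implementation: <code>ExprId</code>, <code>Ast</code>, and <code>SideTable</code> are hypothetical names, and nodes are plain strings to keep the example short. Identities are consecutive integers wrapped in a dedicated type, so the side table can be a plain array that lives and dies independently of the AST.</p>

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExprId:
    """Hypothetical typed identity for expression nodes: a consecutive integer."""
    index: int


@dataclass
class Ast:
    """Owns the nodes; a node's identity is its position in `exprs`."""
    exprs: list = field(default_factory=list)

    def add_expr(self, node):
        self.exprs.append(node)
        return ExprId(len(self.exprs) - 1)


class SideTable:
    """Extra per-node information stored outside the AST, indexed by ExprId."""

    def __init__(self, ast):
        # One slot per existing node; None means "nothing attached".
        self.slots = [None] * len(ast.exprs)

    def set(self, node_id, value):
        self.slots[node_id.index] = value

    def get(self, node_id):
        return self.slots[node_id.index]


ast = Ast()
x = ast.add_expr("x")
lit = ast.add_expr("42")

# The type checker attaches inferred types without touching the nodes.
inferred_types = SideTable(ast)
inferred_types.set(x, "U32")
inferred_types.set(lit, "U32")
assert inferred_types.get(x) == "U32"

# A formatter never allocates `inferred_types`; a type checker can drop the
# table when it is done while `ast` stays alive.
```

<p>Using a separate identity type per AST type (one for expressions, one for statements, and so on) keeps a statement identity from being used to index an expression table by accident.</p>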
<p>I think this is probably the way to go in Fir’s self-hosted compiler, at least in the short term.</p>
<p>However, while thinking about this, I also found another way to extend types with more information: row types.</p>
<h1 id="row-types-in-fir-today">Row types in Fir today</h1>
<p>Row types are mainly used for variants, which are the types that make exception handling in Fir <a href="https://osa1.net/posts/2025-01-18-fir-error-handling.html">safe, expressive, and convenient to use</a>.</p>
<p>A variant is just a set of types, e.g. <code>[U32, Str]</code> is a variant type with <code>U32</code> (32-bit unsigned integer) and <code>Str</code> (immutable, UTF-8 encoded unicode strings). Values of this type can be <code>U32</code>s or <code>Str</code>s.</p>
<p>The type <code>[U32, Str, ..r]</code> is the same as before, but it can have more types in it. When pattern matching a value of this type, we have to have a catch-all case handling the <code>..r</code> part, which represents extra types that the value may have.</p>
<p>To construct a variant value we just add a <code>~</code> prefix, e.g. <code>~123</code> gets the type <code>[U32, ..r]</code> (with a fresh <code>r</code>).</p>
<p>A crucial feature of variants in Fir is that they allow type refinement when pattern matching. If I have a variant value with type <code>[Bool, Str, ..r]</code>, and handle the <code>Bool</code>s in a pattern match and bind the rest to a variable, the variable gets a refined type:</p>
<pre><code>handleBools(arg: [Bool, Str, ..r]) [Str, ..r]:
    match arg:
        ~Bool.True: ~&quot;True&quot;
        ~Bool.False: ~&quot;False&quot;
        other: other</code></pre>
<p>Here the type of <code>other</code> is refined as <code>[Str, ..r]</code>, because the previous alternatives of the <code>match</code> handle the <code>Bool</code> values, so at this point we know that the value can’t be a <code>Bool</code>. <a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a></p>
<p>When variants are used as checked exceptions, this allows things like: catching some of the exceptions thrown by a function and propagating the rest. See the link at the beginning of this section for more examples.</p>
<p>Now, these row types that represent “extra stuff” can also be used in records, and Fir supports that too. For example, the function below can take any record that has at least <code>x: U32</code> and <code>y: U32</code> fields:</p>
<pre><code>printXY(record: (x: U32, y: U32, ..r)):
    print(&quot;x = `record.x`, y = `record.y`&quot;)

main():
    printXY((x = 1, y = 2))

    # Extra fields are OK:
    printXY((x = 3, y = 4, msg = &quot;hi&quot;))</code></pre>
<p>But I think this feature of records is not that useful. In Fir, records are also value types<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a>, and the main use case for records is returning multiple values. When returning multiple values, the “extra fields” part of the record type is not useful. This is because we can’t return a record with the extension part (<code>..r</code>), unless that record is passed as an argument. Consider:</p>
<pre><code>returnExtensibleRecord() (x: U32, y: U32, ..r):
    ???</code></pre>
<p>There’s no non-divergent expression in the body that will make this type check.</p>
<p>This is different from variants, where a variant construction like <code>~"Hi"</code> will have type <code>[Str, ..r]</code> (with fresh <code>r</code>). So we can have this:</p>
<pre><code>returnVariant() [Str, ..r]:
    ~&quot;Hi&quot;</code></pre>
<p>In other words, rows in variants allow us to assume that a value may have some extra values, and there are many use cases where we want to do that (again, see the blog post linked at the beginning of this section).</p>
<p>Rows in records are for ignoring extra fields, which is not that useful if we assume that the main use case is to return more than one value from functions.</p>
<p>The reason why I implemented row extensions in records is that, once I had the type checker and monomorphiser that can deal with rows, it was straightforward to apply it to records as well.</p>
<p>It also allowed me to experiment with extensible types a bit more, which led to…</p>
<h1 id="a-new-use-case-for-rows">A new use case for rows?</h1>
<p>We can use the variant rows for extending sum types with new constructors, and record rows for extending product types with new fields. Here’s an example that works in Fir today: <a href="#fn4" class="footnote-ref" id="fnref4" role="doc-noteref"><sup>4</sup></a></p>
<pre><code>type Foo[r](
    x: U32,
    y: U32,
    ..r
)

main():
    let foo = Foo(x = 123, y = 456, z = &quot;hi&quot;)
    print(foo.x)
    print(foo.y)
    print(foo.z)</code></pre>
<p><code>Foo</code> is a named type. The <code>r</code> is a type parameter with record row kind, representing extra fields. Type inference infers the type <code>Foo[row(z: Str)]</code> for <code>foo</code>. We can access the field <code>z</code> just like any other field.</p>
<p>(The only difference between a record construction syntax and a named type constructor syntax is the missing name: <code>(x = 123, y = 456)</code> is a record, <code>Foo(x = 123, y = 456)</code> is a named type value.)</p>
<p>This gives us a way to extend product types. For example, in our AST, the expression node for binary operators may look like this:</p>
<pre><code>type BinOpExpr[extras](
    left: Expr,
    right: Expr,
    op: BinOp,
    ..extras
)</code></pre>
<p>The formatter could then use this as <code>BinOpExpr[row()]</code>, and the type checker could add an extra field for the inferred type of the expression with <code>BinOpExpr[row(inferredTy: Ty)]</code>.</p>
<p>The idea applies to sum types the same way, however it’s currently not fully implemented in my prototype, because of syntax issues. Here’s what row extensions look like with sum types:</p>
<pre><code>type Expr[extras]:
    Var(VarExpr)
    BinOp(BinOpExpr)
    ..extras</code></pre>
<p>Now suppose I want to extend this type with the standard library <code>Bool</code> type:</p>
<pre><code>value type Bool:
    False
    True</code></pre>
<p>How should the extra values be constructed? The way we normally construct sum values is as <code>&lt;type&gt;.&lt;constructor&gt;(&lt;args&gt;)</code>, e.g. <code>Bool.True</code>, <code>Expr.BinOp(...)</code>.</p>
<p>But with a sum type extended with another sum type, I’m not sure what syntax to use for construction. I can see two options:</p>
<ul>
<li><code>Expr.Bool.True</code>: extended type, extension type, then constructor.</li>
<li><code>Expr.True</code>: extended type, then constructor.</li>
</ul>
<p>There’s also the issue of not all types having a constructor name. For example, with this syntax, we wouldn’t have a way of constructing a <code>Str</code> as <code>Expr[row[Str]]</code> as string literals are not constructed with the <code>&lt;constructor&gt;(&lt;args&gt;)</code> syntax.</p>
<p>In short, I couldn’t find a nice syntax for sum types with extensions, so they’re currently not implemented in my prototype.</p>
<h1 id="problems-and-features-needed">Problems and features needed</h1>
<p>This approach adds type parameters to types, and type parameters can be contagious: they propagate to the use sites, and their use sites, and theirs…<a href="#fn5" class="footnote-ref" id="fnref5" role="doc-noteref"><sup>5</sup></a></p>
<p>Consider the statement type in Fir’s AST:</p>
<pre><code>type Stmt:
    Let(LetStmt)
    Assign(AssignStmt)
    Expr(Expr)
    For(ForStmt)
    While(WhileStmt)
    Loop(LoopStmt)
    Break(BreakStmt)
    Continue(ContinueStmt)</code></pre>
<p>To extend this I’ll need one type parameter per extension. If I have to extend <code>let</code> statements and <code>for</code> statements with different fields, I need two:</p>
<pre><code>type Stmt[letExts, forExts]:
    Let(LetStmt[letExts])
    For(ForStmt[forExts])
    ...</code></pre>
<p>It’s clear that this will scale poorly.</p>
<p>To keep the number of type parameters in check we could use something like type families (type-level functions) to have one type per use case (e.g. type checking, formatting), and then map those to different extension types, but I’m not sure if adding type-level functions just to support this feature makes sense.</p>
<p>Another issue is with <code>deriving</code>: we will have some way of deriving trait implementations, similar to Rust<a href="#fn6" class="footnote-ref" id="fnref6" role="doc-noteref"><sup>6</sup></a>. With row extensions, we can’t use a macro with just the item AST as the input, as the macro will just see type parameters for the extensions. We have to iterate the extension fields somehow in the derived code generator, and regardless of how we iterate the row fields, the actual code generation needs to be done during monomorphisation, as that’s when we know the full type arguments.</p>
<p>To properly type check this, we also have to extend the constraint language. Consider this:</p>
<pre><code>type Foo[r](
    f1: U32,
    f2: Str,
    ..r
)</code></pre>
<p>Here the constructor <code>Foo</code> will have the type: <code>Fn(f1: U32, f2: Str, ..r) Foo[r]</code><a href="#fn7" class="footnote-ref" id="fnref7" role="doc-noteref"><sup>7</sup></a>, but not all rows will be valid for <code>r</code>: we can’t allow overriding existing fields with different types<a href="#fn8" class="footnote-ref" id="fnref8" role="doc-noteref"><sup>8</sup></a>.</p>
<p>It’s easy to check the example above, but in general, these “lacks” constraints (i.e. “record row type <code>r</code> lacks fields <code>f1</code>, <code>f2</code>”) need to be carried over to the use sites of the type parameter to be able to type check properly. In our <code>type Stmt[letExts, forExts]: ...</code> above, the constraints will be coming from the <code>LetStmt</code> and <code>ForStmt</code> types, not from <code>Stmt</code>, and they need to be carried over to the use sites of <code>Stmt</code>.</p>
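<p>As a rough illustration of what such a check has to decide, here is a Python sketch. It is not how Fir’s type checker or monomorphiser does it. <code>check_lacks</code> is a hypothetical helper, rows are modelled as plain dictionaries from field names to type names, and fields repeated with the same type are accepted, which is only an assumption based on the footnote about duplicated fields.</p>

```python
def check_lacks(declared_fields, extension_row):
    """Return the extension fields that clash with the declared ones.

    A field clashes when the named type already declares it with a different
    type. Repeating a field with the same type is tolerated (an assumption).
    """
    clashes = {}
    for name, ty in extension_row.items():
        if name in declared_fields and declared_fields[name] != ty:
            clashes[name] = (declared_fields[name], ty)
    return clashes


# Declared fields of `type Foo[r](f1: U32, f2: Str, ..r)`.
foo_fields = {"f1": "U32", "f2": "Str"}

ok_row = {"z": "Str"}               # Foo[row(z: Str)]
bad_row = {"f1": "Str", "z": "Str"}  # tries to override f1 with another type

assert check_lacks(foo_fields, ok_row) == {}
assert check_lacks(foo_fields, bad_row) == {"f1": ("U32", "Str")}
```

<p>The hard part described above is not this check itself but knowing <code>declared_fields</code> at every place a row variable is instantiated, including through intermediate types like <code>Stmt</code>.</p>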
<p>Currently not having these constraints on the type parameters doesn’t cause soundness issues as the monomorphiser catches these issues, but it’s not ideal because it means that these errors wouldn’t be caught in the language server (which won’t fully compile, just type check), or when running <code>fir --typecheck &lt;file&gt;</code>. Error reporting is also not as good as error reporting in the type checker.</p>
<p>(The lack of “lacks” constraints was not a problem before this feature because variants can always be extended with any type (duplicate types are OK), and it’s not possible to extend records. At least currently, row types in records are only for forgetting/ignoring extra fields.)</p>
<p>Finally, to avoid repeatedly typing the same row type arguments in the use sites in the parser, formatter, etc. we need type synonyms. Fir currently doesn’t have type synonyms because I don’t think they’re that useful when we have value types, and I hate to deal with them in the type checker.<a href="#fn9" class="footnote-ref" id="fnref9" role="doc-noteref"><sup>9</sup></a> In our <code>Stmt</code> example above, we’ll want to write:</p>
<pre><code># Extensions for type checking.
alias TcLetStmtExts = row(inferredBinderType: Option[Ty])
alias TcForStmtExts = row(inferredIteratorType: Option[Ty])
alias TcLetStmt = LetStmt[TcLetStmtExts]
alias TcForStmt = ForStmt[TcForStmtExts]
alias TcStmt = Stmt[TcLetStmtExts, TcForStmtExts]
...

# Extensions for formatting.
alias FmtLetStmtExts = row()
alias FmtForStmtExts = row()
alias FmtLetStmt = LetStmt[FmtLetStmtExts]
alias FmtForStmt = ForStmt[FmtForStmtExts]
alias FmtStmt = Stmt[FmtLetStmtExts, FmtForStmtExts]
...</code></pre>
<p>And then with a feature similar to type families, we can have one type for each use site (type checker, formatter, …) and map that one type to extension types for each of the rows, reducing the number of type parameters. (There will always be at least one type parameter in extended types.)</p>
<h1 id="final-thoughts">Final thoughts</h1>
<p>I’m not aware of any other languages that apply row extensions to named types, which is the reason why I wanted to write this post.</p>
<p>The main challenge for this feature to be useful is the <code>deriving</code> support. The macros will have to run during monomorphisation to make use of the extra fields and constructors. The generated code will then be type checked in a different language (monomorphic AST instead of the front-end AST), which can lead to things like: code that normally doesn’t type check, but does type check when generated in a macro, as macro expansion is type checked differently. While I can’t imagine how this could happen today, that doesn’t mean it won’t, and it’s best if we just don’t open the door to this kind of thing.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>See also <a href="https://osa1.net/posts/2020-02-21-knot-tying-why-how-opinions.html">my blog post from 2020</a> that touches some of the same points.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>Variants are value (unboxed) types, so they’re not heap allocated, and refinement just moves fields around. In general, pattern matching should never allocate, and this currently holds in Fir.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p>In short, all anonymous types are values in Fir. For named types the user decides whether to box or not.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn4" role="doc-endnote"><p>This only works in a prototype that currently lives in the <code>extensible_named_types</code> branch. The online interpreter does not have this feature yet.<a href="#fnref4" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn5" role="doc-endnote"><p>I know I failed to articulate it <a href="https://osa1.net/posts/2024-10-09-oop-good.html">at the time</a>, but I think polymorphism without requiring type parameters is the main advantage of subtyping compared to parametric polymorphism, and I think it’s the killer feature of OOP (as I define in the post). I want to get back to this point in a later blog post.<a href="#fnref5" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn6" role="doc-endnote"><p>We already support <code>#[derive(...)]</code>s today, but they’re a part of the self-hosted compiler (not libraries), and I’m not sure if we want to keep them or do it another way. I needed to derive implementations quickly and didn’t have time to consider alternatives too much.<a href="#fnref6" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn7" role="doc-endnote"><p>Yes, I also had to add row extensions to function types for this.<a href="#fnref7" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn8" role="doc-endnote"><p>I think duplicating fields should be OK.<a href="#fnref8" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn9" role="doc-endnote"><p>We don’t want to eagerly expand type synonyms to their RHSs because then error messages refer to the RHSs rather than synonyms, and keeping type synonyms around as we type check means we have to remember to look through them in many places. It’s a minor thing but considering how useful they are (very little, at least until this feature) it just seemed like they’re not worth it.<a href="#fnref9" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>How Fir formats comments</title>
    <link href="http://osa1.net/posts/2025-09-27-fir-formatter.html" />
    <id>http://osa1.net/posts/2025-09-27-fir-formatter.html</id>
    <published>2025-09-27T00:00:00Z</published>
    <updated>2025-09-27T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p><a href="https://github.com/fir-lang/fir">Fir</a> formats comments by assigning comment tokens to non-comment tokens (only conceptually, not in the implementation, see below), and generating comments when formatting the tokens that “own” them.</p>
<p>This keeps AST nodes small. The parser doesn’t know about comments at all, and code that doesn’t care about comments doesn’t allocate more or run more code for comments.</p>
<hr />
<p>Formatting source code with comments is tricky, and common suggestions like adding comments to AST nodes or generating lossless (or concrete) syntax trees (CSTs) are not feasible in real programming languages. Consider this simple Fir function:</p>
<pre><code>add(x: U32, y: U32) U32:
    ...</code></pre>
<p>This simple function, without the body, has 14 places where a comment can appear:</p>
<pre><code>#|1|#
#|2|# add #|3|# (
    #|4|# x #|5|# : #|6|# U32 #|7|# ,
    #|8|# y #|9|# : #|10|# U32 #|11|#
) #|12|# U32 #|13|# : #|14|#
    ...</code></pre>
<p>If I were to add comment tokens to AST nodes, about 6 of these would belong to the “function declaration” AST node:</p>
<pre><code>#|1|#
#|2|# add #|3|# ( #|4|# ... ) #|12|# ... : #|14|#
    ...</code></pre>
<p>Because each of these is in a different position in the declaration, they would need different fields in the AST node.</p>
<p>If you consider that a real programming language will have hundreds of different types of expression, statement, declaration, … nodes, it becomes clear that this approach is simply not feasible.</p>
<p>The CST approach is not too different, it just moves the inconvenience from the tree type definitions and tree allocations to the use sites of the trees.</p>
<p>What Fir does is much simpler: it requires no support from the parse trees. The parser doesn’t even know about comments, and the AST users that don’t care about comments also don’t need to deal with them and don’t pay any price for them (runtime or memory).</p>
<p>Conceptually, we assign every comment token to a non-comment token. In the example above, comments 1, 2, and 3 belong to the identifier <code>add</code>. Comment 4 belongs to the token <code>(</code>, and so on.</p>
<p>When formatting, we don’t generate text directly. Instead we format the source code token by token. In the example above, we’re formatting a function definition, so we know that there will be a left paren after the function name. But we don’t generate a “(” directly after the function name. Instead we find the token for the left paren, and format it. This formatting operation also generates comments that belong to the left paren.</p>
<p><strong>Assigning comment tokens to non-comment tokens:</strong> Conceptually, every token owns:</p>
<ul>
<li><p>Comment tokens before them that are not on the same line with another non-comment token.</p></li>
<li><p>Comment tokens after them that are on the same line with the token.</p></li>
</ul>
<p>In the example above, 1 and 2 belong to the identifier <code>add</code> because of the first rule, and 3 also belongs to the identifier because of the second rule.</p>
<p>This only leaves the trailing comments at the end of a file “unowned”, which we handle separately as their own thing.</p>
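<p>The two ownership rules can be sketched in Python like this. This is only an illustration of the rules as stated above, not the code in Fir’s formatter: <code>Token</code> and <code>assign_owners</code> are hypothetical, every token is assumed to sit on a single line, and when several non-comment tokens precede a comment on its line, the closest one is assumed to be the owner.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    text: str
    line: int
    is_comment: bool = False


def assign_owners(tokens):
    """Map each comment's index to the index of its owning token (or None).

    `tokens` must be in source order.
    """
    owners = {}
    for i, tok in enumerate(tokens):
        if not tok.is_comment:
            continue
        owner = None
        # Second rule: a comment after a non-comment token on the same line
        # belongs to that token (the closest one, if there are several).
        for j in range(i - 1, -1, -1):
            if tokens[j].line != tok.line:
                break
            if not tokens[j].is_comment:
                owner = j
                break
        # First rule: otherwise it belongs to the next non-comment token.
        if owner is None:
            for j in range(i + 1, len(tokens)):
                if not tokens[j].is_comment:
                    owner = j
                    break
        # Still None: a trailing comment at the end of the file.
        owners[i] = owner
    return owners


tokens = [
    Token("#|1|#", 1, True),
    Token("#|2|#", 2, True),
    Token("add", 2),
    Token("#|3|#", 2, True),
    Token("(", 2),
    Token("# trailing", 3, True),
]

owners = assign_owners(tokens)
# Comments 1, 2, and 3 all belong to `add` (index 2); the last one is unowned.
assert owners == {0: 2, 1: 2, 3: 2, 5: None}
```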
<p><strong>Finding tokens of AST nodes:</strong> The formatter still operates on AST nodes and AST nodes typically don’t need any extra fields for their tokens.</p>
<p>Instead of adding tokens to AST nodes, we represent identifiers as their tokens. Because many AST nodes have identifiers, we can start with those tokens and scan backwards and forwards to find the other tokens of the AST node, with the comments that they own.</p>
<p>When an AST node doesn’t have any identifiers, or finding the tokens of the node from the identifiers is difficult, we add a field for its first (or last) token, and scan forwards (or backwards) from those tokens to find the other tokens.</p>
<p>For example, in Fir, as of today, type declarations are represented like this: (<a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Compiler/Ast.fir#L66-L83">source</a>)</p>
<pre><code>## A type declaration: `type Vec[t]: ...`.
type TypeDecl(
    ## When the type is a primitive, the `prim` token.
    prim_: Option[TokenIdx],

    ## The type name. `Vec` in the example.
    name: Id,

    ## Type parameters of the type. `[t]` in the example.
    typeParams: Vec[Id],

    ## Kinds of `type_params`. Filled in by kind inference.
    typeParamKinds: Vec[Kind],

    ## Constructors of the type.
    rhs: Option[TypeDeclRhs],
)</code></pre>
<p>Note that this node doesn’t have a token for the <code>type</code> keyword. Instead we start from <code>name</code> and scan backwards. The first non-trivia token that we see will be the <code>type</code> token. (<a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L105-L106">source</a>)</p>
<p>(The <code>prim_</code> field could also be removed and we could scan backwards from the <code>type</code> token. If you’re interested in contributing, we have <a href="https://github.com/fir-lang/fir/issues/206">an issue</a> about cleaning up redundant token fields in AST nodes, which would be a good issue for getting started.)</p>
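<p>The backwards scan for the <code>type</code> keyword can be sketched in Python as follows. The function names mirror <code>prevNonTrivia</code> and <code>nextNonTrivia</code>, but the signatures, the <code>Token</code> shape, and the assumption that trivia means comments and whitespace are mine, not taken from Fir’s source.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    kind: str  # e.g. "keyword", "ident", "punct", "comment", "whitespace"
    text: str


# Assumption: these are the token kinds that count as trivia.
TRIVIA_KINDS = {"comment", "whitespace"}


def prev_non_trivia(tokens, idx):
    """Index of the closest non-trivia token before `idx`, or None."""
    for j in range(idx - 1, -1, -1):
        if tokens[j].kind not in TRIVIA_KINDS:
            return j
    return None


def next_non_trivia(tokens, idx):
    """Index of the closest non-trivia token after `idx`, or None."""
    for j in range(idx + 1, len(tokens)):
        if tokens[j].kind not in TRIVIA_KINDS:
            return j
    return None


# Tokens of `type #|c|# Vec[t]`. The AST node only stores the index of `Vec`.
tokens = [
    Token("keyword", "type"),
    Token("whitespace", " "),
    Token("comment", "#|c|#"),
    Token("whitespace", " "),
    Token("ident", "Vec"),
    Token("punct", "["),
    Token("ident", "t"),
    Token("punct", "]"),
]
name_idx = 4

# Scanning backwards from the name skips the comment and lands on `type`.
type_idx = prev_non_trivia(tokens, name_idx)
assert tokens[type_idx].text == "type"
assert tokens[next_non_trivia(tokens, name_idx)].text == "["
```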
<p><strong>Generating comments with tokens:</strong> I used the word “conceptually” a few times above, because in the implementation we don’t really assign comment tokens to non-comment tokens.</p>
<p>Instead, the function that formats a token scans backwards and forwards to find comment tokens as described by the rules above, and generates them with the token.</p>
<hr />
<p>Scanning backwards and forwards to find other tokens and collecting comment tokens that belong to a token being formatted are quite simple. Here is the relevant code:</p>
<ul>
<li><p><a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L1520-L1583"><code>formatToken</code></a> takes a non-comment token to be formatted and formats the token with the comments that belong to the token.</p></li>
<li><p><code>formatToken</code> calls <a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L1703-L1723"><code>findCommentBefore</code></a> to find the first comment before it that needs to be formatted with it.</p>
<p>Finding the comments after it is easier, so it’s done in <code>formatToken</code> directly.</p></li>
<li><p><a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L1761-L1773"><code>nextNonTrivia</code></a> and <a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L1776-L1788"><code>prevNonTrivia</code></a> scan forwards and backwards from a given token to find the tokens of an AST node, as mentioned in the type declaration example above.</p></li>
<li><p>The trailing comments at the end of the file are not owned by any token, so they’re not formatted by default. Instead they’re <a href="https://github.com/fir-lang/fir/blob/7732446fe42185778cf331350345b114087b01b9/Tool/Format/Format.fir#L71-L81">handled specially</a> by the module formatter.</p></li>
</ul>
<p>Not adding tokens to the AST nodes keeps the AST nodes small (cheaper to allocate), and parser and user code simple. Use sites that don’t care about comment nodes pay no price for larger AST nodes or extra parsing code handling comments.</p>
<p>(There are a few open issues about Fir’s formatter, but none that are caused by the ideas explained in this post.)</p>]]></summary>
</entry>
<entry>
    <title>Fir is getting useful</title>
    <link href="http://osa1.net/posts/2025-09-04-fir-getting-useful.html" />
    <id>http://osa1.net/posts/2025-09-04-fir-getting-useful.html</id>
    <published>2025-09-04T00:00:00Z</published>
    <updated>2025-09-04T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>A few months ago I implemented a <a href="https://github.com/fir-lang/fir/blob/55bf6bacf31d04f5a6b623aedbede5a02bcd31a8/tools/peg/Peg.fir">PEG parser generator</a> in Fir. It <a href="https://github.com/fir-lang/fir/blob/55bf6bacf31d04f5a6b623aedbede5a02bcd31a8/tools/peg/PegGrammar.peg">parses its own grammar</a> and it’s also used to <a href="https://github.com/fir-lang/fir/blob/55bf6bacf31d04f5a6b623aedbede5a02bcd31a8/compiler/Grammar.peg">parse Fir</a>.</p>
<p>This week I finished another sizable<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a> Fir project: a <a href="https://github.com/fir-lang/fir/blob/55bf6bacf31d04f5a6b623aedbede5a02bcd31a8/tools/format/Format.fir">code formatter for Fir</a>. It now <a href="https://github.com/fir-lang/fir/commit/222940029c1cc71da2cd35d4f3c90eab885c918e">formats most of the Fir code</a> in the repo<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a>.</p>
<p>Fir is being designed and implemented from day one with tooling, libraries, and backwards compatibility in mind. The compiler’s front-end is currently being reused by the formatter. Soon it’ll be reused by a syntax-aware search-and-replace tool (similar to <a href="https://github.com/osa1/sg">sg</a>), and by a tool that combines Fir packages into a single .fir file (for sharing repros and automated repro reduction), and much later, by the language server and other tools. You can see the list of tools I want to implement <a href="https://github.com/fir-lang/fir/issues/28">here</a>.</p>
<p>By implementing the tooling along with the first version of the compiler (all in Fir), I want to make sure we have the right SDK design to support all these tools, and more. I want to publish the Fir front-end as a reusable package. This front-end should support the last N<a href="#fn3" class="footnote-ref" id="fnref3" role="doc-noteref"><sup>3</sup></a> releases of Fir, so that you can parse (and analyze, modify, refactor, migrate, …) the last N versions of Fir with the latest version of Fir.</p>
<p>I still haven’t written a post explaining what kind of language I want Fir to be, because that’s still largely an open question. However there are a few things that are decided: a compiled, typed language with ADTs, with typeclasses (called traits) for compile-time polymorphism (monomorphised, with value types), and <a href="https://osa1.net/posts/2025-06-28-why-effects.html">effects</a>. I want Fir to be a high-level, but still efficient, language.</p>
<p>Even implementing just a compiler is a big task, and designing and implementing a whole language with all these tools can’t be done by one person. If this vision sounds interesting to you, and you clicked on a few links above and like what you see, please don’t hesitate to reach out. Each of these tools comes with its own issues and tasks, so it’s now a good time to start contributing to Fir. I already have a list of <a href="https://github.com/fir-lang/fir/issues?q=is%3Aissue%20state%3Aopen%20label%3Apeg">issues for the PEG generator</a> and the <a href="https://github.com/fir-lang/fir/issues?q=is%3Aissue%20state%3Aopen%20label%3Aformatter">formatter</a>. There are also all kinds of other things in the issue tracker. Depending on your experience, you can also keep yourself entertained in other ways: the interpreter is slow (a simple AST walker), the interpreter’s type checker is not in good shape etc. If you have the experience and opinions, you can also influence the language design.</p>
<p>My next task is implementing the search-and-replace tool mentioned above (I’m doing this now mainly because I need it when working on Fir), and in parallel, <a href="https://github.com/fir-lang/fir/issues/195">designing and implementing the module system</a>. The module system will need to be implemented in the interpreter too, because I’ll be using modules in the compiler and other tools. Depending on how much free time I’ll have, it should be at least a month of work.</p>
<p>I’m happy with how it’s coming along and I’m excited about Fir’s future.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>Formatter is currently 1,086 loc. PEG is 850 loc without the parser for parsing itself. Generated Fir for the parsing PEGs is 2,364 loc, generated from 178 loc PEG.</p>
<p>It’s a bit more difficult to precisely measure the compiler’s grammar size, because it includes semantic actions, but the grammar is 888 loc and generated parser for the grammar is 5,147 loc.</p>
<p>In total (including tests), we have 21,012 loc Fir today in the repo.</p>
<p>All numbers excluding comments and whitespace.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>We don’t format tests to avoid accidentally parsing only formatted code.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn3" role="doc-endnote"><p>I’m not sure what the exact number here should be yet.<a href="#fnref3" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>
<entry>
    <title>Why I'm excited about effect systems</title>
    <link href="http://osa1.net/posts/2025-06-28-why-effects.html" />
    <id>http://osa1.net/posts/2025-06-28-why-effects.html</id>
    <published>2025-06-28T00:00:00Z</published>
    <updated>2025-06-28T00:00:00Z</updated>
    <summary type="html"><![CDATA[<p>Imagine a programming language where you can have full control over whether and how functions, modules, or libraries interact with shared resources like the scheduler for threading, the file system and other OS-level resources like sockets and other file descriptors, timers for things like delaying the current thread for timed updates or scheduling timed callbacks, and so on.</p>
<p>In this language, a function (or module, library, …) needs to declare its interactions with the shared resources in its type.</p>
<p>When a function accesses e.g. the file system, the caller has full control over how it accesses the file system. All file system access functions can be specified (or overridden if they have a default) by the caller.</p>
<p>Furthermore, assume that this language can also suspend functions and resume them later, similar to <code>async</code> functions in many languages today, which are paused and resumed later when the value of e.g. a <code>Future</code> becomes available.</p>
<p>Such a language lends itself to systems that are composable, flexible, and testable by default, more so than anything we have today.</p>
<p>If you think about it, it’s really strange that today we find it acceptable that I can import a library, and the library can spawn threads, use the file system, block the current thread with things like <code>sleep</code> or with blocking IO operations, and I have no control over it.</p>
<p>Most of the time, this kind of thing will be at least documented, but if I use a library that fundamentally needs these things, unless the library accounts for my use case, I may not be able to use it in my application.</p>
<p>For example, maybe it spawns threads but I want it to use my own thread pool where, in addition to limiting the number of threads, I attach priorities to threads and schedule based on priorities.</p>
<p>Or, maybe I have a library that builds/compiles things by reading files, processing them, and generating files. If I have control over the file system API that the library uses, it takes no effort (e.g. no planning ahead of time) to test this library using an in-memory file system, in parallel, without worrying about races and IO bottlenecks. I don’t have to consider testing scenarios in the library and structure my code accordingly.</p>
<p>Or, maybe I have code that polls some resources, and maybe posts periodic updates. It creates a thread that does the periodic work, and <code>sleep</code>s. With control over threads, schedulers, and timers, I can fast-forward in time (to the next event) in my tests without actually waiting for <code>sleep</code>s and any other timed events, to test my code quickly.</p>
<p>These are some of the things I get to do with an effect system.</p>
<h2 id="whats-in-an-effect-system">What’s in an effect system?</h2>
<p>At a high level, an effect system has two components: (1) a type system, and (2) runtime features.</p>
<p>These two components are somewhat orthogonal: you can have one without the other, depending on what you want to make possible.</p>
<p>In the systems available today, (1) typically involves adding a type component to function types, for the effects a function can invoke.<a href="#fn1" class="footnote-ref" id="fnref1" role="doc-noteref"><sup>1</sup></a></p>
<p>For example, in <a href="https://koka-lang.github.io/">Koka</a>, if you define stdin/stdout operations in an effect named <code>console</code>, and have a function that uses the <code>console</code> effects, the function’s type signature looks like this:</p>
<pre><code>fun sayHi(): console ()
  print(&quot;hi&quot;)</code></pre>
<p>This type says <code>sayHi</code> returns unit (<code>()</code>) and uses the <code>console</code> effect.</p>
<p>(2) typically involves capturing the continuation of the effect invocation and passing it to a “handler”. Depending on the system, the handler can then do things (e.g. memory operations, invoking other effects) and “jump” to (or “tail call”) the continuation with the value returned by the invoked effect.</p>
<p>With the <code>console</code> effect above, a handler may just record the printed string in a data structure, which can then be used for testing. Another handler may actually write to <code>stdout</code>, which would then be used when you run the application.</p>
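<p>As a rough illustration of the handler idea (in Python rather than Koka, and only a sketch), a generator can stand in for effectful code: <code>yield</code> plays the role of invoking an effect, and a small driver loop passes each request to whichever handler the caller chose. All names here (<code>Print</code>, <code>say_hi</code>, <code>run</code>, <code>record</code>, <code>to_stdout</code>) are made up for this example. Unlike a real effect handler, these handlers never see the continuation; the driver always resumes exactly once.</p>
<pre><code>from dataclasses import dataclass

# Hypothetical effect request: "please print this text".
@dataclass
class Print:
    text: str

def say_hi():
    # Invoking an effect: yield a request and suspend until resumed.
    yield Print("hi")
    yield Print("bye")
    return "done"

def run(computation, handler):
    """Drive a generator, letting `handler` interpret each effect request."""
    gen = computation()
    try:
        request = next(gen)            # run until the first effect
        while True:
            value = handler(request)   # the handler decides what the effect does
            request = gen.send(value)  # resume the suspended code with the result
    except StopIteration as stop:
        return stop.value              # the computation's own return value

# Handler 1: record printed text in a list, e.g. for tests.
printed = []
def record(request):
    printed.append(request.text)

# Handler 2: actually write to stdout, e.g. for the real application.
def to_stdout(request):
    print(request.text)

result = run(say_hi, record)</code></pre>
<p>Calling <code>run(say_hi, to_stdout)</code> instead writes the same two lines to the terminal, without any change to <code>say_hi</code> itself.</p>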
<p>Depending on the exact (1) and (2) features, you get to do different things. The current effect systems in various languages support different (1) and (2) features, and there are some systems that omit one of (1) or (2) entirely.</p>
<p>For the purposes of this blog post, we won’t consider the full spectrum of features you can have, and what those features allow.</p>
<h2 id="example-a-simple-grep-implementation-in-koka">Example: a simple grep implementation in Koka</h2>
<p>There isn’t a language today that gives us everything we need for the use cases I described at the beginning.</p>
<p>However, among the languages that we have, Koka comes close, so we’ll use Koka for a simple example.</p>
<p>Imagine a simple “grep” command that takes a string and a list of file paths as arguments, and finds occurrences of the string in the file contents and reports them.</p>
<p>In Koka, the standard library definitions for these “effects” could look like this:</p>
<pre><code>effect fs
  ctl read-file(path: path): string

effect console
  ctl println(s: string): ()</code></pre>
<p>Using these effects, the code that reads the files and searches for the string is no different from how it would look in any other “functional”<a href="#fn2" class="footnote-ref" id="fnref2" role="doc-noteref"><sup>2</sup></a> language:</p>
<pre><code>fun search(pattern: string, files: list&lt;string&gt;): &lt;fs, console&gt;()
  val pattern-size = pattern.count()
  files.foreach fn(file)
    val contents = read-file(file.path)
    val parts = contents.split(pattern)
    report-matches(file, pattern-size, parts)

fun report-matches(file: string, pattern-size: int, parts: list&lt;string&gt;): &lt;console&gt;()
  // splitting leaves at most one part when the pattern does not occur
  if parts.length &lt;= 1 then
    return ()

  println(file)

  var line := 0
  var column := 0
  parts.init.foreach fn(part)
    part.vector.foreach fn(char)
      if char == &#39;\n&#39; then
        line := line + 1
        column := 0
      else
        column := column + 1

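    println((line + 1).show ++ &quot;:&quot; ++ (column + 1).show)
    // skip over the match itself so later columns on the same line stay correct
    column := column + pattern-size</code></pre>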
<p>When calling <code>search</code>, I have to provide handlers for <code>fs</code> and <code>console</code> effects.</p>
<p>In the executable that I generate for users, I can use handlers that do actual file system operations and print to <code>stdout</code>:</p>
<pre><code>val fs-io = handler
  ctl read-file(path: path)
    resume(read-text-file(path))

val console-terminal = handler
  ctl println(s: string)
    write-to-stdout(s)
    resume(())</code></pre>
<p>In the tests, I can use a <code>read-file</code> handler that reads from in-memory test files, and a <code>println</code> handler that adds printed lines to a list, to compare with the expected test outputs:</p>
<pre><code>struct test-case
  files: list&lt;test-file&gt;
  pattern: string
  expected-output: list&lt;string&gt;

struct test-file
  path: path
  contents: string

val test-cases: list&lt;test-case&gt; = [
  Test-case(
    files = [Test-file(&quot;file1&quot;.path, &quot;test\ntest&quot;), Test-file(&quot;file2&quot;.path, &quot;a\n test\nb&quot;)],
    pattern = &quot;test&quot;,
    expected-output = [&quot;file1&quot;, &quot;1:1&quot;, &quot;2:1&quot;, &quot;file2&quot;, &quot;2:2&quot;]
  ),
]

fun test(): &lt;exn&gt;()
  var printed-lines := Nil

  test-cases.foreach fn (test)
    with handler
      ctl read-file(path_: path)
        match test.files.find(fn (file) file.path.string == path_.string)
          Just(file) -&gt; resume(file.contents)
          Nothing -&gt; throw(&quot;file not found&quot;, ExnAssert)

    with handler
      ctl println(s: string)
        printed-lines := Cons(s, printed-lines)
        resume(())

    search(test.pattern, test.files.map(fn (file) file.path.string))

    if printed-lines.reverse != test.expected-output then
      throw(&quot;unexpected test output&quot;, ExnAssert)</code></pre>
<p>You can see the full example <a href="https://gist.github.com/osa1/a5e7fdfa30d69125970c0797c525ede2">here</a>.</p>
<h2 id="i-can-already-do-this-in-language-x-using-libraryframework-y">I can already do this in language X using library/framework Y?</h2>
<p>The point of effect systems is that you don’t get a composable and testable system only <em>when you design for it</em>; you get it <em>by default</em>.</p>
<p>If you implement a library that uses the file system, I can run it with an in-memory file system, or intercept file accesses to prevent certain things, or log certain things, and so on, regardless of whether you designed for it or not.</p>
<p>The Koka code above does not demonstrate this fully, and there’s no system available today that can. I’m just using whatever is available today.</p>
<p>In an ideal system, you would have to go out of your way to have access to the filesystem without using an effect, rather than the other way around.</p>
<p>When comparing languages we never talk about what’s possible: almost everything is possible in almost every general purpose programming language.</p>
<p>What we’re talking about is the idiomatic and performant way of doing things.</p>
<p>The language where what I talk about is idiomatic and performant does not exist today.</p>
<h2 id="how-do-we-know-that-this-ideal-system-is-possible">How do we know that this ideal system is possible?</h2>
<p>We mentioned that the two components of an effect system are somewhat orthogonal. In the design that I have in mind (more on this below), without the type system part of it you still get 90% of the benefits. So let’s focus on the runtime parts.</p>
<p>What you need for a flexible effect system is, <em>conceptually</em>, a way of suspending the stack when calling an effect, and passing the suspended stack (you may want to call it a “continuation”) to the handler for the invoked effect.</p>
<p>This kind of thing is already possible in many of the high-level languages today. If your language supports lightweight threads (green threads, fibers, etc.), coroutines, generators, or similar features where the code is suspended when it does something like <code>await</code> or <code>yield</code>, and then resumed later, you already have the runtime features for a flexible effect system.</p>
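<p>To make this concrete, here is a minimal sketch using Python generators, assuming a single hypothetical <code>Sleep</code> effect. The suspended generator plays the role of the continuation: the handler loop stores it in a timer queue and resumes it later, advancing a virtual clock instead of actually waiting. <code>Sleep</code>, <code>poller</code>, and <code>run_with_virtual_clock</code> are illustrative names, not part of any library.</p>
<pre><code>import heapq
from dataclasses import dataclass

# Hypothetical effect request: "suspend me for this many seconds".
@dataclass
class Sleep:
    seconds: float

def poller(name, interval, rounds, log):
    now = 0.0  # this sketch assumes every task starts at virtual time 0
    for _ in range(rounds):
        log.append((now, name))      # the "periodic work"
        now = yield Sleep(interval)  # suspended here; resumed with the new time

def run_with_virtual_clock(tasks):
    """Handle Sleep by jumping the clock to the next timer, never waiting."""
    now = 0.0
    timers = []  # heap of (wake_time, sequence_number, suspended generator)
    seq = 0      # tie-breaker so generators themselves are never compared

    def step(gen, value):
        nonlocal seq
        try:
            request = gen.send(value)  # send(None) starts a fresh generator
        except StopIteration:
            return                     # task finished, nothing to reschedule
        # Store the suspended generator (the continuation) until its wake time.
        heapq.heappush(timers, (now + request.seconds, seq, gen))
        seq += 1

    for gen in tasks:
        step(gen, None)
    while timers:
        now, _, gen = heapq.heappop(timers)  # fast-forward to the next event
        step(gen, now)
    return now

log = []
end = run_with_virtual_clock([poller("a", 10, 3, log), poller("b", 25, 2, log)])</code></pre>
<p>Nothing here really sleeps, so the virtual seconds pass instantly. A handler that calls a real sleep function before resuming could run the same <code>poller</code> code in production.</p>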
<h2 id="for-me-its-about-composable-and-testable-libraries">For me, it’s about composable and testable libraries</h2>
<p>I deliberately didn’t mention in this blog post so far that effect systems generalize features like async/await, iterators/generators, exceptions, and many other features.</p>
<p>The reason is that, as a user, I don’t care whether these features are implemented using an effect system under the hood, or in some other way. For example, Dart has all of these features, but it doesn’t use an effect system to implement them. As a user, it doesn’t matter to me as long as I have the features.</p>
<p>Instead, what I’m more interested in as a user is how an effect system influences library design, and what it allows me to do at a high level, in large code bases.</p>
<p>However, it would be a shame to not mention that, yes, effect systems generalize all these features, and more. The paper <a href="https://www.microsoft.com/en-us/research/wp-content/uploads/2017/05/asynceffects-msr-tr-2017-21.pdf">“Structured Asynchrony with Algebraic Effects”</a> shows how these features can be implemented in Koka.</p>
<h2 id="to-be-continued">To be continued</h2>
<p>Some of the recent discussions online about effect systems left me somewhat dissatisfied, because most posts seem to focus on small-scale benefits of effect systems, and I wanted to share my incomplete (but hopefully not incoherent!) perspective on effect systems.</p>
<p>In future posts, I’m hoping to cover some of the open problems when designing such a system.</p>
<hr />
<p>Thanks to <a href="https://github.com/TimWhiting/">Tim Whiting</a> for reviewing a draft of this blog post.</p>
<section class="footnotes" role="doc-endnotes">
<hr />
<ol>
<li id="fn1" role="doc-endnote"><p>This is a somewhat rough approximation of what these effect types in function types indicate. In practice it’s more complicated than “effects the function invokes”: if you read it as that, you fail to explain some of the type errors, or why some of the code type checks. More on this (hopefully) in a future post.<a href="#fnref1" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
<li id="fn2" role="doc-endnote"><p>“Functional” in quotes because I don’t think that word means much these days. Maybe more on this later.<a href="#fnref2" class="footnote-back" role="doc-backlink">↩︎</a></p></li>
</ol>
</section>]]></summary>
</entry>

</feed>
