osa1 github about atom

Fir now compiles to C (+ extensible named types, associated types, modules, and more)

April 15, 2026 - Tagged as: en, fir.

One of my original goals with Fir was to bootstrap it as early as possible. I was so determined, I committed the first code for the self-hosted compiler in the 322nd commit, on 11 April 2025, after less than a year of development in the open source1, when it was barely usable. To understand how early this is, we’re currently on commit 1,052, and in my opinion it only recently became somewhat usable.

Unsurprisingly, this turned out to be a challenge, and I had to accept the fact that a compiler for a non-trivial language is a lot of work. You really need a lot of language features + a good implementation (generating fast code) for it. It’s not that it cannot be done otherwise, but the process becomes extremely slow, tedious, and boring.

Something else that became evident as I worked on the self-hosted compiler was that, even if I finish it with just the features we have, I’ll have to refactor it quite significantly as we implement the planned features, to the point where it could feel more like a rewrite than a refactoring.

Finally, I thought (perhaps mistakenly), with some of the recent developments in programming tooling and software development methods (you know what I’m talking about), with a working reference implementation + tests, bootstrapping effort could be largely automated.

So I started implementing features that I consider essential for Fir 1.0, in the reference implementation. In this post we’ll look at some of these features that were recently implemented.

Fir now compiles to C

The Fir reference implementation now compiles to C. The motivation for this development was that running the self-hosted compiler with the interpreter to compile itself was taking 8.8s, despite just parsing + name resolving. That’s already too long, and it was going to get much worse as we implement type checking, monomorphisation, and code generation.

I made a few attempts at optimizing the interpreter, but it became clear that with very little effort, compared to designing and implementing a bytecode interpreter, I could compile it to C2. Because we already had a monomorphiser, compilation to C was mostly very straightforward, and we immediately got 12x speedup: the self-hosted compiler started checking itself in 0.7s instead of 8.8s.

When working on the compiler, compiling the compiler to C, then compiling that C to native with clang, then running the executable on the compiler itself is currently at 1.7s.

This also improved the workflow in other areas: formatting the whole code base and compiling PEG files take an instant now, instead of many seconds.

This also allowed other things that made this even better in terms of return-on-investment: we got free garbage collection with the Boehm-Demers-Weiser conservative GC, and value types became trivial to implement. This will also make it easier to add C FFI in the future. (more on this below)

The interpreter still exists, mainly to keep the online interpreter running.

Value types

Fir literally started as a “high-level language with value types”, but it wasn’t entirely trivial to implement them until we had the C backend.

With the C backend, it became a matter of making it generate typed code (instead of treating all values as uint64_t* or similar), and then not mallocing value types.

Here’s an example value type, from the standard library:

# Immutable, UTF-8 encoded strings.
value type Str(
    # UTF-8 encoding of the string.
    _bytes: Array[U8],
)

Relevant struct definitions in generated C:

typedef struct Array_U8 {
    U8* data_ptr;
    uint64_t len;
} Array_U8;

typedef struct Str {
    Array_U8 _0;
} Str;

This type is then used directly (instead of as a pointer). Here’s a forward-declaration of a function from the self-hosted compiler:

// Compiler/ParseUtils.fir:32:1 parseCharLit[U32]
static Char _fun_8(Str _p0);

(The comment line here is generated by the compiler to make it easier to read the generated code.)

Associated types

This was a feature that I delayed implementing for way too longer than I should’ve, mostly because I didn’t know how to implement them and it took a while to figure it out.

Associated types in Fir are the same feature as associated types in Rust. The most common use case for them is the Iterator trait. Before associated types, Iterator in Fir looked like this: (omitting extra methods with default implementations)

trait Iterator[iter, item, exn]:
    next(self: iter) Option[item] / exn

Here’s how the CharIter’s (iterates characters of a string) Iterator implementation looked like:

impl Iterator[CharIter, Char, exn]:
    next(self: CharIter) Option[Char] / exn:

This trait definition has a problem. The type of Iterator.next is this:

[Iterator[iter, item, exn]] Fn(self: iter) Option[item] / exn

Based on this type, in a call site like charIter.next() (where charIter : CharIter), we generate the predicate Iterator[CharIter, item, exn] and the type of the call expression becomes Option[item]. (where item and exn are fresh unification variables)

If the expected type of the call expressions is not precise enough to unify that item type with a concrete type, the predicate never becomes Iterator[CharIter, Char, exn], and we can’t solve it, because there isn’t an impl for Iterator[CharIter, item, exn] (note: with generic item). We only have Iterator[CharIter, Char, exn].

This resulted in lots of type annotations in the code that uses the Iterator trait. Most importantly, it required type annotations in for loops as for loops used Iterator under the hood. For example:

for char: Char in charIter:
    print(char)

Here print is a generic function that works on any ToStr type, so without the type annotation the predicate became too generic and couldn’t be solved.

With associated types, the trait now looks like this:

trait Iterator[iter, exn]:
    type Item
    next(self: iter) Option[Item] / exn

impl Iterator[CharIter, exn]:
    type Item = Char
    next(self: CharIter) Option[Char] / exn:

With this definition, the predicate for the same call becomes Iterator[CharIter, exn] (where exn is a fresh unification variable), and that’s immediately resolved using this impl. The for loop example above now works without a type annotation.

Associated types also allowed the next feature.

It’s now possible to implement traits for record types

This was a small development in terms of code, but an important one for the language. Until this feature, we could pass records around and access fields in polymorphic contexts, but if we want to take a polymorphic record (with a row extension) and e.g. print it, there was no way.

This wasn’t too important until recently, as the main use case for records was returning multiple values. You’d then destruct/pattern match on the return values directly and use them individually. For example:

divRem(x: U32, y: U32) (div: U32, rem: U32): ...

# Users just match on the fields instead of passing the return value around
# as a record.
let (div, rem) = divRem(a, b)

However with the other developments listed below, records became much more useful, and not being able to implement traits on them became a problem.

The solution was porting PureScript’s RowToList typeclass to Fir. The idea is that we define a “magic” trait that converts record rows into heterogeneous lists:

trait RecRowToList[recRow]:
    type List
    rowToList(rec: (..recRow)) Option[List]

Here recRow is a record-row-kinded type parameter. This trait is resolved by the compiler for any valid (with right kind) type argument, and depending on the type argument the List type is also generated as an heterogeneous list. The heterogeneous list type is defined as this, in the standard library:

value type List[head, tail](
    head: head,
    tail: Option[tail],
)

In the generated List types for record rows, the head type is always a RecordField:

value type RecordField[t](
    label: Str,
    value_: t,
)

So for example, RecRowToList[row(x: U32, msg: Str)] is resolved by the type checker, and the List type is also resolved as List[RecordField[Str], List[RecordField[U32], []]]3 4.

Here’s how to implement ToStr on records using this machinery:

impl[ToStr[RecRowToList[r].List]] ToStr[(..r)]:
    toStr(self: (..r)) Str:
        match RecRowToList[r].rowToList(self):
            Option.None: "()"
            Option.Some(list): "(`list`)"


impl[ToStr[t]] ToStr[RecordField[t]]:
    toStr(self: RecordField[t]) Str:
        "`self.label` = `self.value_`"


impl[ToStr[head], ToStr[tail]] ToStr[List[head, tail]]:
    toStr(self: List[head, tail]) Str:
        match self.tail:
            Option.None: "`self.head`"
            Option.Some(t): "`self.head`, `t`"

impl ToStr[[]]:
    toStr(self: []) Str:
        panic("unreachable")

Note that the List and RecordField types are value types, so rowToList does not allocate. It just generates a different representation of the record on stack that we can recurse on.

Matching a bunch of fields at once, as a record

This was one of the very simple features that made records so much more useful.

When pattern matching fields, we can now use ..var syntax to assign unmatched fields to a variable, as a record. Here’s a simple example:

type Test(
    x: U32,
    y: U32,
    z: U32,
    msg: Str
)


main():
    let x = Test(x = 1, y = 2, z = 3, msg = "hi")
    let Test(y, ..rest) = x
    print(rest)

In the pattern, y matches the field y, rest matches the rest of the fields, as (x: U32, z: U32, msg: Str). Then, using the ToStr implementation of records as shows above, this prints (msg = "hi", x = 1, z = 3).

This is not the main use case for this feature, but just as a note, when combined with traits on records, this allows easily implementing traits by reusing records’ implementations of the traits. For example, ToStr for Test here can be implemented as:

impl ToStr[Test]:
    toStr(self: Test) Str:
        let Test(..fields) = self
        "Test`fields`"

With this implementation, the value x above now prints as Test(msg = "hi", x = 1, y = 2, z = 3). This is the same output as the derived ToStr for this type, just with the different field order. (derived impl would print fields in the source code order, so: x, y, z, msg)

Splicing records and named arguments

We can now pass records as named arguments. The feature above copies field values to records, this one copies records to named arguments for fields.

This is also straightforward and I think a simple example should suffice, using the same types as above:

main():
    let x = Test(x = 1, y = 2, z = 3, msg = "hi")
    print(x)

    let Test(y, ..rest) = x     # rest: (x: U32, z: U32, msg: Str)
    let y = Test(y = 0, ..rest)
    print(y)

# output:
# Test(msg = hi, x = 1, y = 2, z = 3)
# Test(msg = hi, x = 1, y = 0, z = 3)

Reminder: records (and variants) are value types. They’re not heap allocated. So the code above does not allocate for the rest record.

We can also make larger records from smaller ones with this feature:

main():
    let x = (x = u32(123), y = u32(456))
    let y = (msg = "hi", ..x)
    print(y)

# output: (msg = "hi", x = 123, y = 456)

Splicing two records together is currently not possible: there can be at most one ..expr in a record expression.

Extensible named types

This is a big one that I talked about in a previous post. It only became usable after the record features above, associated types, and type synonyms.

For a running example, I added a full program to the online interpreter, showing a solution to the extensible AST types problem described in the blog post. It’s extensively documented, explaining all the interesting bits, so I recommend just checking it out.

In short, we allow extending named types using record rows. Pattern matching, allocation, and everything else works the same way as records. Here’s an example:

type Foo[r](
    x: U32,
    y: U32,
    ..r
)


impl[r: Row[Rec], ToStr[RecRowToList[r].List]] ToStr[Foo[r]]:
    toStr(self: Foo[r]) Str:
        let Foo(..fields) = self
        "Foo`fields`"


main():
    let x = Foo(x = 1, y = 2, msg = "hi")
    let y = Foo(b = Bool.True, y = 10, blah = Option.Some(u32(0)), x = 11)
    print(x)
    print(y)


# output:
# Foo(msg = hi, x = 1, y = 2)
# Foo(b = Bool.True, blah = Option.Some(0), x = 11, y = 10)

Foo here is an extensible type. In the allocation sites, we allocate it with different extra fields. The inferred types here are:

ToStr implementation is implemented using the record field matching features explained above, but it can also be derived.

This feature is used in the self-hosted compiler and the tools. The code is a bit long, but we basically use the same idea demonstrated in the online demo linked above, to add different fields to the AST nodes used by different tools. For example, here’s the AST node type for variant expressions, when compiled to C, as a part of the self-hosted compiler:

typedef struct VariantExpr_CompilerAstExts {
    Expr_CompilerAstExts* _0;
    Option_Ty _1;
} VariantExpr_CompilerAstExts;

And here’s the exact same type, but in the formatter’s compiled C code:

typedef struct VariantExpr_DefaultAstExts {
    Expr_DefaultAstExts* _0;
} VariantExpr_DefaultAstExts;

This is smaller because the formatter doesn’t have the extra field the compiler adds to the type.

Both are generated from this Fir type:

type VariantExpr[exts](
    expr: Expr[exts],
    ..AstExts[exts].InferredTyExts
)

You can see the full generic AST definitions used by the compiler and other tools here.

Because we can implement traits on records and record rows now, deriving traits also work on extensible types. In the example above, I can just add #[derive(ToDoc)] to Foo and then print it like this:

#[derive(ToDoc)]
type Foo[r](
    x: U32,
    y: U32,
    ..r
)


main():
    let x = Foo(x = 1, y = 2, msg = "hi")
    let y = Foo(b = Bool.True, y = 10, blah = Option.Some(u32(0)), x = 11)
    print(x.toDoc().render(80))
    print(y.toDoc().render(80))


# output:
# Foo(x = 1, y = 2, (msg = "hi"))
# Foo(x = 11, y = 10, (b = Bool.True, blah = Option.Some(0)))

The AST types in the compiler all derive traits this way.

Modules

Until recently, importing a module in Fir just parsed the module and copied the parsed code to the current module.

In other words, there was just one module. There were no name spaces, private definitions, selective imports, or importing with renaming.

It took quite a while to design and implement a proper module system and I actually found it quite difficult to design this, even though in the end the design was quite simple. There were two problems that made this difficult for me:

First, I wasn’t sure whether we want just namespacing (plus the usual features for selective imports, renaming, etc.) or something fancier, like first-class modules.

To figure this out I studied OCaml’s module system (and also blogged about it) and 1ML in a bit more detail, and decided that I want the modules to be type checking units (to be checked in parallel) and namespaces, instead of first-class values.

This significantly simplified the design, but the design space was still huge and there were just two constraints:

So the second problem was that these requirements did not constrain the design space enough to give me a small number of options, with obvious and significant tradeoffs between them. I could probably come up with a dozen designs that would all be good enough.

In the end I had to make somewhat arbitrary decisions, based on what I needed in the past, from the other module systems that I used, and what I didn’t, and preference and taste. I updated one thing as I implemented it, and settled on this:

Here’s how relevant syntax looks currently:

# Each module can have at most one `import`. Documentation comments added to
# `import` lines become documentation comment of the module. When a module
# doesn't import anything an empty `import []` can be added to document the
# module.

## This is the module documentation.

import [
    # Import everything from `Fir/Prelude`, to use directly (without module
    # prefix).
    # This is implicitly added to every module already, so not needed. On here
    # for demonstration purposes.
    Fir/Prelude,

    # This allows using symbols imported from the module with the given prefix.
    # E.g. instead of `Option.Some(123)` we do `P/Option.Some(123)`.
    Fir/Prelude as P,

    # Only imports listed things.
    Fir/Prelude/[Option, Result, min, max],

    # Only imports listed things, but with renaming.
    Fir/Prelude/[Option, Result, min as _min, max as _max],
]


main():
    # Some random combination of imported things, used in different ways.
    print(Option.Some(_min(P/max(P/u32(0), u32(1)), u32(2))))


# output: Option.Some(1)

Some other notes and clarifications on this design:

So far I’m happy with how it looks (syntax) and how it works, but as with most things in this language, it’s open to improvements, refinements, and even backwards incompatible changes.

Smaller features: kind annotations and type synonyms

These don’t need much introduction, but I want to document why they were needed and implemented.

Type synonyms came in handy in two places:

The second one is obviously a type synonym, but the first one also uses the same underlying code. We just make type synonyms scoped, and create new synonyms in trait and impl bodies, for the associated types.

Kind annotations became necessary as we started using row-kinded type parameters more, for the extensible named types. Currently kind inference is very simple, it only looks at the current definition. If a type parameter is used in a row extension position (i.e. ..var), its kind is inferred as Row[Rec] or Row[Var] depending on whether the extension is in a record (or fields) or variant (or constructors).

That means that in the extensible named type example above:

type Foo[r](
    x: U32,
    y: U32,
    ..r
)

Here r’s kind is inferred as Row[Rec]. But if we had another type that passed a generic r to it:

type Bar[r](foo: Foo[r])

This r’s kind was inferred as *, which is incorrect.

I don’t want to introduce module-level kind inference for various reasons, so I had to add kind annotations here. The correct definition with kind annotations is:

type Bar[r: Row[Rec]](foo: Foo[r])

Kinds follow the same syntax as types. *-kinded type parameters are just listed, without any annotations. This is useful to avoid reordering type parameters just to specify kinds of some of the types. E.g. if I have

foo(x: t, y: Bar[r]): ...

Here the inferred type parameters are [t: *, r: *], generated from the signature by left-to-right scan. When calling we can explicitly pass them as foo[type1, type2](...).

This passes wrong kinded type to Bar. To fix, we have to specify the kind of r:

foo[r: Row[Rec]](x: t, y: Bar[r]): ...

But this also reorders type parameters as [r: Row[Rec], t: *] now, as the type parameter lists are generated by a left-to-right scan of the signature.

To fix, we have to also list the type parameter t explicitly, just without a kind:

foo[t, r: Row[Rec]](x: t, y: Bar[r]): ...

This gives us the original order of the type parameters, but with the correct kinds: [t: *, r: Row[Rec]].

Next up: C header imports (C FFI)

This post is already too long so I want to keep this part short for now. With the (1) resources that I have (2) things I want to do with this language (3) what we have currently (current implementation), the shortest path to success (some kind of adoption) that I can see is by making C interop absolutely effortless.

By “effortless” I really mean it: I should be able to import a C header file in directly in Fir and just use the definitions and link the generated C with object files implementing the prototypes, and provide implementations for symbols used by other compiled C code.

Similar to the module system, this is an area I don’t have a lot of experience about. Depending on things that are our out of my control (i.e. life, responsibilities), and whether I’ll encounter fundamental issues, I suspect this will take 6-12 months to fully implement. Once done, Fir will be useful for many use cases.


  1. I started working on it earlier in 2024. Open sourced in June 2024.↩︎

  2. This was partly thanks to the GCC extension statement expressions, which allowed me to compile nested expressions directly to C without having to flatten them in an A-normal form IR or similar. The extension is also supported by clang so it didn’t make the generated C less portable.↩︎

  3. [] is the empty variant type, which doesn’t have any values.↩︎

  4. The generated list fields are sorted on field names, so msg comes before x here.↩︎