March 7, 2026 - Tagged as: en, fir.
The front-end AST types are one of the most important types in a language implementation, and if we get them wrong nothing will be right in the rest of the implementation.
These types should be cheap to allocate and efficient to use, but also extensible, as different tools will use them differently. A type checker may want to add inferred types to expressions, but for a formatter, those inferred type fields would be a waste of memory.
One approach to this problem is to have a parser that generates parse events instead of an AST or CST, and let the tools have their own ASTs. I explored this in a previous blog post.
This approach works fine when the language is small, but for a programming language that’s never the case. Fir is currently quite simple, yet it has 28 types of expressions. Most production languages have many more.
So I’ve been thinking about making Fir’s AST types extensible with new fields in the self-hosted compiler. This AST will be used by many of the tools listed here, and more. The parser and the AST types will be published as libraries.
There are a few common ways to add new fields to an existing type:
1. With subtyping of nominal types (common in OOP languages), we can create a subtype with extra fields.
2. In languages where objects have identities (again, common in OOP languages), we can use an identity map to map objects to extra information.
3. If the objects don't have identities, we can manually generate unique identities for the objects we want to attach extra information to, and then use a map, as in the previous option.
Option (3) can be done in Fir, and it has a few advantages compared to extending existing types:1
The maps we use to attach extra information to AST nodes can be deallocated separately from the AST nodes. So if we have a long computation where some steps need extra information that later steps don't, we can allocate the maps for those steps and then deallocate them, while keeping the AST nodes alive.
We can create differently typed identities for different AST types and generate the identities as consecutive numbers, then use arrays instead of hash maps to map nodes to extra data.
Unlike built-in identities, we can choose the identity size (e.g. 32-bit numbers instead of 64-bit), and embed information about the values in the identities.
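To make option (3) concrete, here's a minimal sketch in TypeScript (the node shapes and names are invented for illustration; Fir's actual AST will differ):

```typescript
// A minimal sketch of option (3): nodes get consecutive numeric
// identities at construction time, and tools attach extra data
// through side tables indexed by those ids.

type ExprId = number; // Fir could use a 32-bit integer type here

interface Expr {
  id: ExprId;
  kind: string;
}

let nextExprId: ExprId = 0;

function makeExpr(kind: string): Expr {
  return { id: nextExprId++, kind };
}

// Because ids are consecutive, a plain array works as the map. A tool
// can allocate this table, use it, and drop it, while the AST nodes
// themselves stay alive.
const inferredTypes: string[] = [];

const e1 = makeExpr("IntLit");
const e2 = makeExpr("StrLit");
inferredTypes[e1.id] = "U32";
inferredTypes[e2.id] = "Str";

console.log(inferredTypes[e1.id]); // "U32"
```

Dropping `inferredTypes` when type checking is done frees all the attached data at once, without touching the AST.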
I think this is probably the way to go in Fir’s self-hosted compiler, at least in the short term.
However, while thinking about this, I found another way to extend types with more information: row types.
Row types are mainly used for variants, which are the types that make exception handling in Fir safe, expressive, and convenient to use.
A variant is just a set of types, e.g. [U32, Str] is a variant type with U32 (32-bit unsigned integers) and Str (immutable, UTF-8 encoded Unicode strings). Values of this type can be U32s or Strs.
The type [U32, Str, ..r] is the same as before, but it can have more types in it. When pattern matching a value of this type, we have to have a catch-all case handling the ..r part, which represents extra types that the value may have.
To construct a variant value we just add a ~ prefix, e.g. ~123 gets the type [U32, ..r] (with a fresh r).
A crucial feature of variants in Fir is that they allow type refinement when pattern matching. If I have a variant value with type [Bool, Str, ..r], and handle the Bools in a pattern match and bind the rest to a variable, the variable gets a refined type:
handleBools(arg: [Bool, Str, ..r]) [Str, ..r]:
    match arg:
        ~Bool.True: ~"True"
        ~Bool.False: ~"False"
        other: other
Here the type of other is refined as [Str, ..r], because the previous alternative of the match handles the Bool values, so at this point we know that the value can’t be a Bool. 2
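Fir's refinement is similar in spirit to the narrowing of discriminated unions in some other type systems. A rough TypeScript analogy, with tagged objects standing in for Fir's variants (the shapes here are invented for illustration):

```typescript
// Tagged-union analogy of [Bool, Str, ..r]; the "..r" part is
// modeled as a single "Other" member here.
type Val =
  | { tag: "Bool"; value: boolean }
  | { tag: "Str"; value: string }
  | { tag: "Other"; value: unknown };

// Once the "Bool" case is handled, the compiler narrows `arg` to the
// remaining members, much like Fir refines the type to [Str, ..r].
function handleBools(arg: Val): Exclude<Val, { tag: "Bool" }> {
  if (arg.tag === "Bool") {
    return { tag: "Str", value: arg.value ? "True" : "False" };
  }
  return arg; // narrowed: "Bool" is no longer possible here
}

console.log(handleBools({ tag: "Bool", value: true }).value); // "True"
```

The difference is that Fir's rows make the "and possibly more" part (`..r`) a first-class, polymorphic piece of the type rather than a fixed catch-all member.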
When variants are used as checked exceptions, this allows things like: catching some of the exceptions thrown by a function and propagating the rest. See the link at the beginning of this section for more examples.
Now, these row types that represent “extra stuff” can also be used in records, and Fir supports that too. For example, the function below can take any record that has at least x: U32 and y: U32 fields:
printXY(record: (x: U32, y: U32, ..r)):
    print("x = `record.x`, y = `record.y`")

main():
    printXY((x = 1, y = 2))

    # Extra fields are OK:
    printXY((x = 3, y = 4, msg = "hi"))
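For comparison, TypeScript's structural typing gives a similar "at least these fields" behavior (a sketch, not Fir; note that TypeScript's excess-property check rejects extra fields in a direct object literal, so the extra-field value goes through a variable here):

```typescript
// Accepts any object that has at least x and y fields.
function printXY(record: { x: number; y: number }): string {
  return `x = ${record.x}, y = ${record.y}`;
}

console.log(printXY({ x: 1, y: 2 }));

// Extra fields are OK when the value comes in through a variable:
const withExtra = { x: 3, y: 4, msg: "hi" };
console.log(printXY(withExtra));
```

The row-typed version makes the "extra fields" part explicit and nameable (`..r`), instead of leaving it implicit in the subtyping relation.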
But I think this feature of records is not that useful. In Fir, records are also value types3, and the main use case for records is returning multiple values. When returning multiple values, the "extra fields" part of the record type is not useful: we can't return a record with the extension part (..r) unless that record was passed in as an argument. Consider:
returnExtensibleRecord() (x: U32, y: U32, ..r):
    ???
There’s no non-divergent expression in the body that will make this type check.
This is different from variants, where a variant construction like ~"Hi" will have the type [Str, ..r] (with a fresh r). So we can have this:
returnVariant() [Str, ..r]:
    ~"Hi"
In other words, rows in variants allow us to assume that a value may have some extra values, and there are many use cases where we want to do that (again, see the blog post linked at the beginning of this section).
Rows in records are for ignoring extra fields, which is not that useful if we assume that the main use case is to return more than one value from functions.
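The asymmetry can be sketched in TypeScript terms (an analogy, not Fir semantics): a union member can always be widened into a bigger union, but a value involving a caller-chosen, unknown record part can't be conjured out of thin air:

```typescript
// Variant side: a Str value is already a member of any wider union,
// so "returning with a fresh row" is just widening.
type Variant = { tag: "Str"; value: string } | { tag: "Other"; value: unknown };

function returnVariant(): Variant {
  return { tag: "Str", value: "Hi" }; // fine
}

// Record side: the analogue would be returning { x, y } & R for a
// caller-chosen R, which cannot type check, as R is unknown here:
//
// function returnExtensibleRecord<R>(): { x: number; y: number } & R {
//   return { x: 1, y: 2 }; // type error: nothing of type R to supply
// }

console.log(returnVariant());
```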
The reason I implemented row extensions in records anyway is that, once I had a type checker and monomorphiser that could deal with rows, it was straightforward to apply them to records as well.
It also allowed me to experiment with extensible types a bit more, which led to…
We can use the variant rows for extending sum types with new constructors, and record rows for extending product types with new fields. Here’s an example that works in Fir today: 4
type Foo[r](
    x: U32,
    y: U32,
    ..r
)

main():
    let foo = Foo(x = 123, y = 456, z = "hi")
    print(foo.x)
    print(foo.y)
    print(foo.z)
Foo is a named type. The r is a record-row-kinded type parameter, representing extra fields. Type inference assigns foo the type Foo[row(z: Str)]. We can access the field z just like any other field.
(The only difference between record construction syntax and named-type construction syntax is the type name: (x = 123, y = 456) is a record, Foo(x = 123, y = 456) is a named type value.)
This gives us a way to extend product types. For example, in our AST, the expression node for binary operators may look like this:
type BinOpExpr[extras](
    left: Expr,
    right: Expr,
    op: BinOp,
    ..extras
)
The formatter could then use this as BinOpExpr[row()], and the type checker could add an extra field for the inferred type of the expression with BinOpExpr[row(inferredTy: Ty)].
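In TypeScript terms, the same shape could be approximated with a generic intersected with the fixed fields (the names mirror the Fir example, but Expr and Ty are simplified placeholders, and this is an analogy rather than Fir's semantics):

```typescript
type Expr = number; // placeholder for the real expression type
type BinOp = "+" | "-";

// The row parameter becomes a generic defaulting to the empty row.
type BinOpExpr<Extras = {}> = {
  left: Expr;
  right: Expr;
  op: BinOp;
} & Extras;

// The formatter uses the bare type...
const forFormatter: BinOpExpr = { left: 1, right: 2, op: "+" };

// ...while the type checker adds a field for the inferred type.
type Ty = string;
const forTypeChecker: BinOpExpr<{ inferredTy: Ty }> = {
  left: 1,
  right: 2,
  op: "+",
  inferredTy: "U32",
};

console.log(forTypeChecker.inferredTy); // "U32"
```

One difference: the intersection says nothing about field clashes, whereas Fir's rows need the "lacks" constraints discussed later in this post.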
The same idea applies to sum types; however, it's currently not fully implemented in my prototype because of syntax issues. Here's how row extensions look with sum types:
type Expr[extras]:
    Var(VarExpr)
    BinOp(BinOpExpr)
    ..extras
Now suppose I want to extend this type with the standard library Bool type:
value type Bool:
    False
    True
How should the extra values be constructed? The way we normally construct sum values is as <type>.<constructor>(<args>), e.g. Bool.True, Expr.BinOp(...).
But with a sum type extended with another sum type, I’m not sure what syntax to use for construction. I can see two options:
1. Expr.Bool.True: extended type, extension type, then constructor.
2. Expr.True: extended type, then constructor.

There's also the issue of not all types having a constructor name. For example, with this syntax, we wouldn't have a way of constructing a Str as Expr[row[Str]], as string literals are not constructed with the <constructor>(<args>) syntax.
In short, I couldn’t find a nice syntax for sum types with extensions, so they’re currently not implemented in my prototype.
This approach adds type parameters to types, and type parameters can be contagious: they propagate to the use sites, their use sites, and so on.5
Consider the statement type in Fir’s AST:
type Stmt:
    Let(LetStmt)
    Assign(AssignStmt)
    Expr(Expr)
    For(ForStmt)
    While(WhileStmt)
    Loop(LoopStmt)
    Break(BreakStmt)
    Continue(ContinueStmt)
To extend this I’ll need one type parameter per extension. If I have to extend let statements and for statements with different fields, I need two:
type Stmt[letExts, forExts]:
    Let(LetStmt[letExts])
    For(ForStmt[forExts])
    ...
It’s clear that this will scale poorly.
To keep the number of type parameters in check, we could use something like type families (type-level functions) to have one type per use case (e.g. type checking, formatting), and then map those to different extension types. But I'm not sure adding type-level functions just to support this feature makes sense.
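As a sketch of what such a mapping could look like, TypeScript's indexed-access types can play the role of a type-level function: a single phase parameter selects all the per-node extension rows (all names here are invented for illustration):

```typescript
// One "phase" parameter stands in for all the extension parameters;
// a type-level table maps the phase to each node's extension row.
interface Phases {
  tc: { let: { inferredBinderTy: string }; for: { inferredIterTy: string } };
  fmt: { let: {}; for: {} };
}

type LetStmt<P extends keyof Phases> = { binder: string } & Phases[P]["let"];
type ForStmt<P extends keyof Phases> = { iter: string } & Phases[P]["for"];

type Stmt<P extends keyof Phases> =
  | { kind: "Let"; stmt: LetStmt<P> }
  | { kind: "For"; stmt: ForStmt<P> };

// Type checker phase: extension fields are required.
const tcLet: Stmt<"tc"> = {
  kind: "Let",
  stmt: { binder: "x", inferredBinderTy: "U32" },
};

// Formatter phase: no extension fields.
const fmtLet: Stmt<"fmt"> = { kind: "Let", stmt: { binder: "x" } };

if (tcLet.kind === "Let") {
  console.log(tcLet.stmt.inferredBinderTy); // "U32"
}
```

Every type still carries at least the one phase parameter, but the per-node parameters no longer multiply.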
Another issue is with deriving: we will have some way of deriving trait implementations, similar to Rust6. With row extensions, a macro that takes just the item AST as input won't work, as it will only see type parameters for the extensions. We have to iterate over the extension fields somehow in the derived-code generator, and regardless of how we iterate the row fields, the actual code generation needs to happen during monomorphisation, as that's when the full type arguments are known.
Finally, to properly type check this we have to extend the constraint language. Consider this:
type Foo[r](
    f1: U32,
    f2: Str,
    ..r
)
Here the constructor Foo will have the type: Fn(f1: U32, f2: Str, ..r) Foo[r]7, but not all rows will be valid for r: we can’t allow overriding existing fields with different types8.
It’s easy to check the example above, but in general, these “lacks” constraints (i.e. “record row type r lacks fields f1, f2”) need to be carried over to the use sites of the type parameter to be able to type check properly. In our type Stmt[letExts, forExts]: ... above, the constraints will be coming from the LetStmt and ForStmt types, not from Stmt, and they need to be carried over to the use sites of Stmt.
Currently, not having these constraints on the type parameters doesn't cause soundness issues, as the monomorphiser catches them. But it's not ideal: it means these errors won't be caught in the language server (which only type checks rather than fully compiling), or when running fir --typecheck <file>. Error reporting is also not as good as in the type checker.
(The lack of "lacks" constraints wasn't a problem before this feature, because variants can always be extended with any type (duplicate types are OK), and records can't be extended at all. At least currently, row types in records are only for forgetting/ignoring extra fields.)
Finally, to avoid repeatedly writing the same row type arguments at the use sites in the parser, formatter, etc., we need type synonyms. Fir currently doesn't have type synonyms because I don't think they're that useful when we have value types, and I hate dealing with them in the type checker.9 In our Stmt example above, we'll want to write:
# Extensions for type checking.
alias TcLetStmtExts = row(inferredBinderType: Option[Ty])
alias TcForStmtExts = row(inferredIteratorType: Option[Ty])
alias TcLetStmt = LetStmt[TcLetStmtExts]
alias TcForStmt = ForStmt[TcForStmtExts]
alias TcStmt = Stmt[TcLetStmtExts, TcForStmtExts]
...
# Extensions for formatting.
alias FmtLetStmtExts = row()
alias FmtForStmtExts = row()
alias FmtLetStmt = LetStmt[FmtLetStmtExts]
alias FmtForStmt = ForStmt[FmtForStmtExts]
alias FmtStmt = Stmt[FmtLetStmtExts, FmtForStmtExts]
...
And then, with a feature similar to type families, we can have one type for each use site (type checker, formatter, …), map that one type to extension types for each of the rows, and reduce the number of type parameters (there will always be at least one type parameter in extended types).
I’m not aware of any other languages that apply row extensions to named types, which is the reason why I wanted to write this post.
The main challenge for making this feature useful is the deriving support. The macros will have to run during monomorphisation to make use of the extra fields and constructors. The generated code will then be type checked in a different language (the monomorphic AST instead of the front-end AST), which can lead to things like code that normally doesn't type check, but does when generated in a macro, because macro expansions are type checked differently. While I can't come up with a concrete example of this today, that doesn't mean it won't happen, and it's best if we just don't open the door to this kind of thing.
1. See also my blog post from 2020 that touches on some of the same points.↩︎
2. Variants are value (unboxed) types, so they're not heap allocated, and refinement just moves fields around. In general, pattern matching should never allocate, and this currently holds in Fir.↩︎
3. In short, all anonymous types are values in Fir. For named types the user decides whether to box or not.↩︎
4. This only works in a prototype that currently lives in the extensible_named_types branch. The online interpreter does not have this feature yet.↩︎
5. I know I failed to articulate it at the time, but I think polymorphism without requiring type parameters is the main advantage of subtyping compared to parametric polymorphism, and I think it's the killer feature of OOP (as I define it in the post). I want to get back to this point in a later blog post.↩︎
6. We already support #[derive(...)]s today, but they're part of the self-hosted compiler (not libraries), and I'm not sure whether we want to keep them or do it another way. I needed to derive implementations quickly and didn't have time to consider alternatives much.↩︎
7. Yes, I also had to add row extensions to function types for this.↩︎
8. I think duplicating fields should be OK.↩︎
9. We don't want to eagerly expand type synonyms to their RHSs, because then error messages refer to the RHSs rather than the synonyms; and keeping type synonyms around as we type check means we have to remember to look through them in many places. It's a minor thing, but considering how useful they are (very little, at least until this feature), they just didn't seem worth it.↩︎