January 23, 2023 - Tagged as: en, plt.
I like anonymous records and row polymorphism, but until recently I didn’t know how to generate efficient code for polymorphic record access. In this blog post I will summarize the different compilations of polymorphic record accesses that I’m aware of.
All of the ideas shown in this post can be used to access a record field when the record’s concrete type is not known, but the type system guarantees that it has the accessed field. This includes row polymorphism and record subtyping.
Most of the ideas also work when the record’s type is completely unknown and it may not have the accessed field, but some of the optimizations assume accesses cannot fail. Those optimizations can only be used on statically-typed but polymorphic records.
In some of the examples below I will use row polymorphism.
In this blog post we are interested in a specific application of row polymorphism to records. In short, row polymorphism allows type variables denoting sets of record fields, with their types. For example:
f : ∀ r . { x : Int, y : Int | r } -> Int
f a = a.x + a.y
Here the type variable r ranges over sets of rows (or records). This function accepts any record as argument as long as the record has at least x : Int and y : Int fields.
The main difference between row polymorphism and record subtyping is that the type variable r can be used in the right-hand side of an arrow as well, allowing passing the record around without losing its concrete type. For example:
mapAB : ∀ r . { a : Int, b : Int | r } -> (Int -> Int) -> { a : Int, b : Int | r }
mapAB r f = { a = f r.a, b = f r.b, .. r }
This function takes any record that has a : Int and b : Int fields, and returns a new record with updated a and b fields, keeping the rest of the fields. If I pass it a record with type { a : Int, b : Int, name : String } I get the same type back.
With subtyping, the type of this function would look like:
mapAB : { a : Int, b : Int } -> (Int -> Int) -> { a : Int, b : Int }
In this version the return type has just the a and b fields; the rest of the fields are lost. If I pass this a { a : Int, b : Int, name : String } I get { a : Int, b : Int } back. The name field is lost.
Without subtyping, when the record type in a field access expression is known, it’s easy to generate efficient code: we use the same offsets used when compiling a record literal with the type.
With subtyping, and with row polymorphism when the record type is not a concrete record type but one with a row variable, the type of r in r.a does not immediately tell us where the field a is in the record’s payload.
Let’s look at how we might go about implementing record field access in these cases.
I don’t think this idea is used in statically-typed languages, but I wanted to include it for completeness.
We can implement records as maps with string keys. Field access then becomes a map lookup.
This is easy to implement because our language probably already has a map implementation in the standard library.
The disadvantages are:
Depending on the map implementation, every field access requires an O(N) or O(log(N)) map lookup.
Map entries will be stored in a separate memory location (instead of in the record object’s payload), which will require pointer chasing to read the field value.
Unnecessary memory overhead caused by map fields that records don’t really need, such as the capacity and size fields.
With whole-program compilation, we can improve the constant factors a bit by mapping labels (field names) in the program to unique integers. This way lookups don’t require string hashing or comparison, but this is still slow and memory-inefficient compared to other techniques we will discuss below.
If you’re familiar with Haskell, this is the Haskell way of implementing row polymorphic records.
The idea is that when we pass a record to a row-polymorphic function, we also pass, implicitly, and as functions, the accessors that the function needs.
In Haskell, the type of mapAB we’ve seen above would look like this:
mapAB : ∀ r . (HasField r 'A Int, HasField r 'B Int) => Record r -> (Int -> Int) -> Record r
The runtime values for HasField ... constraints are the accessors. When calling this function we don’t explicitly pass these accessors; the compiler generates them. In a well-typed program, we either have these values at the call site, or we know how to generate them (e.g. the record type is concrete at the call site), so it’s always possible for the compiler to generate and pass these arguments.
The main advantage of this approach is that it doesn’t require any language support specifically for records.
The main disadvantages are:
Every field access is a function call.
Parameter passing per field per record does not scale well and causes messy and slow generated code. For example, suppose we want to take two records with fields x : Int and y : Int:
f : ∀ r . (HasField r 'X Int, HasField r 'Y Int) => Record r -> Record r -> ...
This function takes two implicit arguments, but the record arguments are required to have the same record type. I can’t call this function with two different records:
f { x = 123, y = 456, a = "hi" } { x = 0, y = -1, b = false }
For this to work I need two row variables:
f : ∀ r1 r2 .
    (HasField r1 'X Int, HasField r1 'Y Int,
     HasField r2 'X Int, HasField r2 'Y Int) =>
    Record r1 -> Record r2 -> ...
This version works, but it also takes 4 implicit arguments.
Starting with the next approach, we will need to map labels (field names) to integers at compile time, to be used as indices.
Because these integers will be used in record allocation and field access code, a label we see later in the program can change the integer assignment, and thus the code we should have generated, for record allocations and field accesses that we’ve already compiled.
We have two options:
We can avoid this problem with a whole-program pass to collect all labels in the program.
This is trivial with a whole-program compiler as a front-end pass can store all labels seen in a component (library, module) somewhere and we can map those labels to integers before code generation.
We can have a link-time step to update record allocation and field access code with the integers for the labels.
In the rest of the post, labels will always get integers based on their lexicographical order and we will call these integers for labels just “labels”.
For example, if I have labels a, c, b, d in my program, their numbers will be 0, 2, 1, 3, respectively.
With integers as labels we can add a table to every record (records with the same set of keys share the same table) mapping labels in the program to offsets in the record’s payload. For example, the table for a record with fields a and c, when the program has labels a, b, c, d, looks like this:
[ 0, _, 1, _ ]
This table is indexed by the label, and the value gives the offset of the field in the record’s payload. _ means the record does not have the field. In a well-typed program we won’t ever see a _ value being read from a table.
This approach is quite wasteful as every table will have as many entries as the number of labels in the program, but we will compress these tables to reasonable sizes below.
We will call these tables “record offset tables” or “offset tables” in short. When compiling a record access we need to get the record’s offset table. For this we add an extra word (pointer) to record objects pointing to their offset tables. We then generate this code for a record field access:
record[record[OFFSET_TABLE_INDEX][label]]
OFFSET_TABLE_INDEX is the constant for where the offset table pointer is in record objects.
Offset tables are generated per record shape (set of labels), so the total number of tables shouldn’t be too large.
Since the _ entries won’t ever be used, we can shrink tables with trailing _ entries. In our example above with a record with a and c fields, the last _ entry can be omitted:
[ 0, _, 1 ]
Offset tables are per-shape, and the total number of record shapes in a program should be small. So if we allocate a few bits in record object headers for the record’s “shape index”, we can use that index into a global table mapping record shapes to their offset tables.
Generated code for record access expressions will look like:
record[RECORD_OFFSET_TABLES[getRecordShapeId(record)][label]]
getRecordShapeId reads the bits in the object header for the record shape ID. Depending on the actual header layout, it will look something like:
int getRecordShapeId(Object* object) {
return (object->header & RECORD_ID_MASK) >> HEADER_BITS;
}
With record shape IDs in headers and a global table mapping shape IDs to offset tables, we no longer need an extra word in record objects for the offset table pointer.
Here’s an example of offset tables when we have labels a, b, x, y, and two records 0: {a, b} and 1: {x, y}:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
_, // label x
_, // label y
];
RECORD_1_OFFSET_TABLE = [
_, // label a
_, // label b
0, // label x
1, // label y
];
RECORD_OFFSET_TABLES = [
RECORD_0_OFFSET_TABLE, // record 0
RECORD_1_OFFSET_TABLE, // record 1
];
As before, the offset table for record 0 can be shrunk as:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
];
Labels that are never used in the same record can be given the same ID.
In the example above, this allows us to have a single table for both records:
RECORD_0_1_OFFSET_TABLE = [
0, // label a or x
1, // label b or y
];
RECORD_OFFSET_TABLES = [
RECORD_0_1_OFFSET_TABLE, // record 0
RECORD_0_1_OFFSET_TABLE, // record 1
];
The problem of assigning IDs to labels is very similar to stack slot allocation when spilling during register allocation: we have a practically unlimited number of IDs (stack slots), but we want to reuse the same ID for labels as long as they’re never used in the same record (never live at the same time).
After sharing label IDs, some of the shapes may be identical, as in our example. We can give those shapes the same ID and avoid redundant entries in the offset tables.
With this, our example with two records {a, b} and {x, y} compiles to just one offset table:
RECORD_0_1_OFFSET_TABLE = [
0, // label a or x
1, // label b or y
];
RECORD_OFFSET_TABLES = [
RECORD_0_1_OFFSET_TABLE, // record 0 and 1
];
Suppose we have these record shapes in a program:
0: {a, b, q}
1: {x, y, q}
The RECORD_OFFSET_TABLES table is currently an array of pointers, so indexing an offset table still requires pointer chasing. To avoid this we can flatten the table.
For our current program, the tables, without flattening, look like this:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
_, // label x
_, // label y
2, // label q
];
RECORD_1_OFFSET_TABLE = [
_, // label a
_, // label b
0, // label x
1, // label y
2, // label q
];
RECORD_OFFSET_TABLES = [
RECORD_0_OFFSET_TABLE,
RECORD_1_OFFSET_TABLE,
];
We can flatten this as:
RECORD_LABEL_OFFSETS = [
0, // record 0, label a
1, // record 0, label b
_, // record 0, label x
_, // record 0, label y
2, // record 0, label q
_, // record 1, label a
_, // record 1, label b
0, // record 1, label x
1, // record 1, label y
2, // record 1, label q
];
Field indexing then becomes:
record[RECORD_LABEL_OFFSETS[(getRecordShapeId(record) * NUM_LABELS) + label]]
With this version we eliminate one layer of indirection.
The idea here is not too important on its own, but it will enable further improvements.
The NUM_LABELS factor in the field access code above can be eliminated by incrementing record shape IDs by NUM_LABELS instead of 1. In our example, instead of having record IDs 0 and 1, we will have 0 and 5 (incremented by the number of labels in the program).
Since there may be a large number of labels in a program and we may have only a few header bits to store the record IDs, an alternative is to convert the table to label-major order like this:
RECORD_LABEL_OFFSETS = [
0, // label a, record 0
_, // label a, record 1
1, // label b, record 0
_, // label b, record 1
_, // label x, record 0
0, // label x, record 1
_, // label y, record 0
1, // label y, record 1
2, // label q, record 0
2, // label q, record 1
];
With this table, indexing code becomes:
record[RECORD_LABEL_OFFSETS[(label * NUM_RECORDS) + getRecordShapeId(record)]]
We can then eliminate the NUM_RECORDS factor the same way, by incrementing label IDs by NUM_RECORDS instead of 1, and index with:
record[RECORD_LABEL_OFFSETS[label + getRecordShapeId(record)]]
Now that the table index of a label is label + shape_id and we have a single table, we can shift the entries in the table by decrementing label IDs.
For this it doesn’t matter whether we store in label-major or record-major order. Which one of these will generate a smaller table will probably depend on the program. As an example, suppose we store the table in label-major order, and we have these records in the program:
0: {x, y, z, t}
1: {x, y}
2: {z, t}
The table will look like:
[ 0, 0, _, // label x
1, 1, _, // label y
2, _, 0, // label z
3, _, 1 ] // label t
Record IDs will be 0, 1, 2, and label IDs will be 0, 3, 6, 9.
We can use the unused slot for label x, record 2, by decrementing the label ID of y by one. If we then do the same for the following rows, the label IDs become 0, 2, 4, 7, and the table becomes:
[ 0, 0, // label x
1, 1, // label y
2, _, 0, // label z
3, _, 1 ] // label t
This idea can be used to fill any gaps in previous label rows, as long as the used slots in a row fit into the gaps. For example, if we have a table like:
[ 0, _, _, 1, // label x
_, 0, 1, _, // label y
... ]
We can decrement y’s ID to fit it into the row for label x:
[ 0, 0, 1, 1, // label x and y, interleaved
... ]
Collecting and numbering all labels in the program allows using a global table for mapping labels to offsets.
These offset tables can be made smaller by sharing IDs between labels that are never used in the same record, sharing identical shapes and their offset tables, flattening the per-shape tables into a single global table, and shifting rows to fill gaps.
The result is a very compact representation of record objects (no extra words in the header or unused space in the payload needed) and fast polymorphic field access.
The offset table should also be small in practice, because different parts of the program will probably use disjoint sets of names, so different labels and records will get the same IDs. In the remaining cases, tweaking label IDs to compact the table should help.
I’ve learned about the global table approach and some of the optimizations from the Dart compiler, which implements virtual calls using a “global dispatch table” (GDT), indexed by classID + methodID
in call sites. See “Introduction to Dart VM” for a description of how Dart AOT and JIT generate GDTs.
If you are interested in seeing some code, here is where we generate the GDT in dart2wasm (Dart’s Wasm backend). The outer loop finds a selector ID (a label ID in our examples) for a row (a list of records in our examples, a list of classes in dart2wasm). The inner loop do { ... } while (!fits) starts from the first row with gaps and tries to fit the current row into the gaps. In the worst case it skips all of the rows, in which case the rest of the code appends the new row to the table.
Dart will soon have records, and for the dart2wasm implementation of records I’m thinking of using some of the ideas described in this post. Dart records do not support width subtyping (you can’t pass a {x, y, z} where {x, y} is expected), but because of the dynamic type, we can have a dynamically typed record that we index.
Thanks to José Manuel Calderón Trilla for his feedback on a draft of this blog post.