In this post I want to summarize some of the reasons why I don’t like OCaml and why I wouldn’t choose it for a new project today.
To me it’s absolutely essential that the language should have some way of defining interfaces, implementing those interfaces for the types, and programming against those interfaces.
In Haskell, this is done with typeclasses. Rust has a similar mechanism called traits. In languages with classes, this is often done with abstract classes and “implementing” those classes in new classes (e.g. implements in Dart).
In OCaml there’s no way to do this. I have to explicitly pass functions along with my values, maybe in a product type, or with a functor, or as an argument.
Regardless of how I work around this limitation, it’s extremely inconvenient. Things that should be trivial in any code base, such as converting a value to a string for debugging purposes, become a chore, and are sometimes even impossible.
As far as I know, there was at least one attempt at ameliorating this with modular implicits (implicit parameter passing), but I don’t know what has happened to it since 2017. It looks like it’s still not a part of the language, and the standard library does not use it.
OCaml’s standard library is just bizarre. It has lots of small issues, and a few larger ones. It’s really just extremely painful to use.
Some examples of the issues:
- A zoo of printing/debugging and conversion functions such as string_of_int, string_of_float, print_char, Int64.of_int, …
- Overly polymorphic operators with type 'a -> 'a -> bool, such as = (called “structural equality”, throws an exception if you pass it a function) and >. Code that uses these operators will probably not work as expected on user-defined types.
- Standard types are sometimes persistent, sometimes mutable. List, Map, and Set are persistent. Stack and Hashtbl are mutable.
- Inconsistent naming: the length function for Map is cardinal, the length function for Hashtbl is length. The byte array type is Bytes.t, but the big int type is Big_int.big_int (instead of Big_int.t). The functions in these modules are also inconsistently named: Big_int functions are suffixed with _big_int, while Bytes module functions are not prefixed or suffixed.
- The regex module uses global state: string_match runs a regex and sets some global state; matched_string returns the last matched string using that global state.
- Lack of widely used operations such as popcount for integer types, and Unicode character operations.
- It doesn’t have proper string and character types: String is a byte array, char is a byte.
The bad state of OCaml’s standard library also causes fragmentation in the ecosystem with two competing alternatives: Core and Batteries.
OCaml doesn’t have a single-line comment syntax.
The expression syntax has just too many issues. It’s inconsistent in how it uses delimiters: for and while end with done, but let, if, match, and try don’t, even though the right-most non-terminal is the same in all of these productions:
expr ::= ...
| while <expr> do <expr> done
| for <value-name> = <expr> ( to | downto ) <expr> do <expr> done
| let <let-binding> in <expr>
| if <expr> then <expr> [ else <expr> ]
| match <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
| try <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
...
It has for and while, but no break and continue. So you use exceptions, with a try inside the loop for continue, and outside the loop for break.
It also has lots of ambiguities, and some of these ambiguities are resolved in an unintuitive way. In addition to making OCaml difficult to parse correctly, this can actually cause incorrect reading of the code.
The most common example is probably nesting match and try expressions:
match e0 with
| p1 -> try e1 with
        | p2 -> e2
| p3 -> e3
Here p3 -> e3 is a part of the try expression.
Another example is the sequencing syntax <expr> ; <expr> and productions with <expr> as the right-most symbol:
let test1 b =
  if b then
    print_string "1"
  else
    print_string "2"; print_string "3"
Here print_string "3" is not a part of the if expression, so this function always prints “3”.
However, even though match also has <expr> as the right-most symbol, it has different precedence in comparison to the semicolon:
let test2 b =
  match b with
  | true -> print_string "1"
  | false -> print_string "2"; print_string "3"
Here print_string "3" is a part of the false -> ... branch.
Try to guess how these functions are parsed:
(* Is the last print part of `else` or not? *)
let test3 b =
  if b then
    print_string "1"
  else
    let x = "2" in
    print_string x;
    print_string "3"
(* Is this well-typed? *)
let test4 b =
  if b then
    1, 2
  else
    3, 4
(* Is the type of this `(int * int) array -> unit` or `int array -> unit * int`? *)
let test5 a = a.(0) <- 1, 2
(* What if I replace `,` with `;`? Does this set the element 1 or 2? *)
let test6 a = a.(0) <- 1; 2
When writing OCaml you have to keep these rules in mind.
It also has the “dangling else” problem:
(* Is `else` part of the inner `if` or the outer? *)
if e1 then if e2 then e3 else e4
Finally, and I think this is probably the strangest thing about OCaml’s syntax (I’m not even sure what exactly is happening here, and I can’t find anything relevant in the language documentation): comments in OCaml are somehow tokenized, and those tokens need to be terminated. They can be terminated inside another comment, or even outside. This is a bit difficult to explain, but here’s a simple example:
(* " *)
print_string "hi"
OCaml 5.0.0 rejects this program with this error:
File "./test.ml", line 2, characters 16-17:
2 | print_string "hi"
^
String literal begins here
From the error message it seems like the " in the comment line actually starts a string literal, which is terminated by the first quote of "hi". The closing double quote of "hi" thus starts another string literal, which is not terminated.
However that doesn’t explain why this works:
(* " *)
print_string "hi"
(* " *)
print_string "bye"
If my explanation of the previous version were correct, this would fail with an unbound hi variable, but it works and prints “bye”!
I’m not following developments in the OCaml ecosystem too closely, but just two years ago it was common to use Makefiles to build OCaml projects. The language server barely worked on a project with less than 50 kloc. There was no standard way of doing compile-time metaprogramming, and some projects even used the C preprocessor (cpp).
Some of these things probably improved in the meantime, but the overall package is still not good enough compared to the alternatives.
Almost all modern statically typed languages have closures, higher-order functions/methods, lazy streams, and combinators that run efficiently. Persistent/immutable data structures can be implemented even in C.
Also, OCaml has no tracking of side effects (like in Haskell), and the language and the standard library have lots of features and functions with mutation, such as the array update syntax, mutable record fields, Hashtbl, and the regex module.
The only thing that makes OCaml more “functional” than e.g. Dart, Java, or Rust is that it supports tail calls. While having tail calls is important for functional programming, I would happily give up on tail calls if that means not having the problems listed above.
Also keep in mind that when you mix imperative and functional styles, tail calls become less important. For example, I don’t have to implement a stream map function in Dart with a tail call to map the rest of the stream; I can just use a while or for loop.
In my opinion there is no reason to use OCaml in a new project in 2023. If you have a reason to think that OCaml is the best choice for a new project please let me know your use case, I’m genuinely curious.
]]>All of the ideas shown in this post can be used to access a record field when the record’s concrete type is not known, but the type system guarantees that it has the accessed field. This includes row polymorphism and record subtyping.
Most of the ideas also work when the record’s type is completely unknown and it may not have the accessed field, but some of the optimizations assume accesses cannot fail. Those optimizations can only be used on statically-typed but polymorphic records.
In some of the examples below I will use row polymorphism.
In this blog post we are interested in a specific application of row polymorphism to records. In short, row polymorphism allows type variables denoting sets of record fields, with their types. For example:
f : ∀ r . { x : Int, y : Int | r } -> Int
f a = a.x + a.y
Here the type variable r ranges over sets of rows (or records). This function accepts any record as an argument as long as the record has at least x : Int and y : Int fields.
The main difference between row polymorphism and record subtyping is that the type variable r can be used on the right-hand side of an arrow as well, allowing passing the record around without losing its concrete type. For example:
mapAB : ∀ r . { a : Int, b : Int | r } -> (Int -> Int) -> { a : Int, b : Int | r }
mapAB r f = { a = f r.a, b = f r.b, .. r }
This function takes any record that has a : Int and b : Int fields, and returns a new record with updated a and b fields plus the rest of the fields. If I pass it a record with type { a : Int, b : Int, name : String }, I get the same type back.
With subtyping, the type of this function would look like:
mapAB : { a : Int, b : Int } -> (Int -> Int) -> { a : Int, b : Int }
In this version the return type has just the a and b fields. The rest of the fields are lost: if I pass this a { a : Int, b : Int, name : String }, I get { a : Int, b : Int } back. The name field is lost.
Without subtyping, when the record type in a field access expression is known, it’s easy to generate efficient code: we use the same offsets used when compiling a record literal with the type.
With subtyping, and with row polymorphism when the record type is not a concrete record type but a record type with a row variable, the type of r in r.a does not immediately tell us where in the record’s payload the field a is.
Let’s look at how we might go about implementing record field access in these cases.
I don’t think this idea is used in statically-typed languages, but I wanted to include it for completeness.
We can implement records as maps with string keys. Field access then becomes a map lookup.
This is easy to implement because our language probably already has a map implementation in the standard library.
The disadvantages are:
Depending on the map implementation, every field access requires an O(N) or O(log(N)) map lookup.
Map entries will be stored in a separate memory location (instead of in the record object’s payload), which will require pointer chasing to read the field value.
Unnecessary memory overhead caused by map fields that are not really necessary for records, such as the capacity and size fields.
With whole-program compilation, we can improve the constant factors a bit by mapping labels (field names) in the program to unique integers. This way lookups don’t require string hashing or comparison, but this is still slow and memory-inefficient compared to other techniques we will discuss below.
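As a rough illustration of this idea, here is a small Python sketch (Python is not the document’s language, and all names are illustrative): records as maps keyed by labels that are interned to sequential integers, so that field access is an integer-keyed lookup rather than a string-keyed one.

```python
# Toy sketch: records as maps, with field names interned to integers
# (standing in for a whole-program label-numbering pass).
LABELS = {}  # global label -> integer mapping

def intern_label(name):
    # Assign each distinct field name the next integer, once.
    return LABELS.setdefault(name, len(LABELS))

def make_record(**fields):
    # A record is just a map from interned labels to values.
    return {intern_label(name): value for name, value in fields.items()}

def get_field(record, name):
    # Field access is a map lookup keyed by the interned label.
    return record[intern_label(name)]

r = make_record(x=1, y=2, name="point")
print(get_field(r, "x"))  # 1
```

Even with interned labels, every access still pays for a hash-map lookup, which is what the techniques below avoid.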
If you’re familiar with Haskell, this is the Haskell way of implementing row polymorphic records.
The idea is that when we pass a record to a row-polymorphic function, we also pass, implicitly, and as functions, the accessors that the function needs.
In Haskell, the type of the mapAB we’ve seen above would look like this:
mapAB : ∀ r . (HasField r 'A Int, HasField r 'B Int) => Record r -> (Int -> Int) -> Record r
The runtime values for the HasField ... constraints are the accessors. When calling this function we don’t explicitly pass these accessors; the compiler generates them. In a well-typed program, we either have these values at the call site, or we know how to generate them (e.g. the record type is concrete at the call site), so it’s possible for the compiler to generate and pass these arguments.
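To make the dictionary-passing idea concrete, here is a hedged Python sketch (names are illustrative, not the Haskell machinery itself): the “evidence” for each HasField constraint is just a pair of getter/setter functions passed alongside the record, so the function body never needs to know the record’s layout.

```python
# Sketch of dictionary passing: accessor functions play the role of
# HasField evidence; the compiler would generate them at call sites.
def map_ab(get_a, set_a, get_b, set_b, record, f):
    # The function only uses the accessors it was handed, so any
    # record representation works, and extra fields are preserved.
    record = set_a(record, f(get_a(record)))
    record = set_b(record, f(get_b(record)))
    return record

# Hand-written "evidence" for records represented as dicts:
get_a = lambda r: r["a"]
set_a = lambda r, v: {**r, "a": v}
get_b = lambda r: r["b"]
set_b = lambda r, v: {**r, "b": v}

r = {"a": 1, "b": 2, "name": "hi"}
print(map_ab(get_a, set_a, get_b, set_b, r, lambda x: x + 1))
# {'a': 2, 'b': 3, 'name': 'hi'}
```

Note how the four explicitly passed accessors mirror the four implicit arguments discussed below: the cost of this approach is visible right in the parameter list.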
The main advantage of this approach is that it doesn’t require any language support specifically for records.
The main disadvantages are:
Every field access is a function call.
Parameter passing per field per record does not scale well and causes messy and slow generated code. For example, suppose we want to take two records with fields x : Int and y : Int:
f : ∀ r . (HasField r 'X Int, HasField r 'Y Int) => Record r -> Record r -> ...
This function takes two implicit arguments, but it has a limitation: the two record arguments need to have the same record type. I can’t call this function with two different records:
f { x = 123, y = 456, a = "hi" } { x = 0, y = -1, b = false }
For this to work I need two row variables:
f : ∀ r1 r2 .
    (HasField r1 'X Int, HasField r1 'Y Int,
     HasField r2 'X Int, HasField r2 'Y Int) =>
    Record r1 -> Record r2 -> ...
This version works, but it also takes 4 implicit arguments.
Starting with the next approach, we will require mapping labels (field names) to integers at compile time, to be used as indices.
Because these integers for labels will be used in record allocation and field accesses, it is possible that a label we see later in a program will cause different code generation for a record field access that we’ve already seen.
We have two options:
We can avoid this problem with a whole-program pass to collect all labels in the program.
This is trivial with a whole-program compiler as a front-end pass can store all labels seen in a component (library, module) somewhere and we can map those labels to integers before code generation.
We can have a link-time step to update record allocation and field access code with the integers for the labels.
In the rest of the post, labels will always get integers based on their lexicographical order and we will call these integers for labels just “labels”.
For example, if I have labels a, c, b, d in my program, their numbers will be 1, 3, 2, 4, respectively.
With integers as labels, we can add a table to every record (records with the same set of keys sharing the same table) mapping labels in the program to offsets in the record’s payload. For example, the table for a record with fields a and c, when the program has labels a, b, c, d, looks like this:
[ 0, _, 1, _ ]
This table is indexed by the label, and the value gives the offset of the field in the record’s payload. _ means the record does not have the field. In a well-typed program we won’t ever see a _ value being read from a table.
This approach is quite wasteful as every table will have as many entries as there are labels in the program, but we will compress these tables to reasonable sizes below.
We will call these tables “record offset tables” or “offset tables” in short. When compiling a record access we need to get the record’s offset table. For this we add an extra word (pointer) to record objects pointing to their offset tables. We then generate this code for a record field access:
record[record[OFFSET_TABLE_INDEX][label]]
OFFSET_TABLE_INDEX is the constant for where the offset table pointer is in record objects.
Offset tables are generated per record shape (set of labels), so the total number of tables shouldn’t be too large.
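Here is a small Python sketch of this scheme (Python lists stand in for heap objects, None stands in for _, and all names are illustrative): each record carries a reference to its shape’s offset table, and a field access is two indexing operations.

```python
# Labels a, b, c, d are numbered 0..3 by a whole-program pass.
LABELS = {"a": 0, "b": 1, "c": 2, "d": 3}

# Offset table shared by all records with shape {a, c}:
# label -> offset in the record's payload (None models '_').
AC_OFFSETS = [0, None, 1, None]

def make_ac_record(a, c):
    # Slot 0 is the offset-table pointer, fields follow.
    return [AC_OFFSETS, a, c]

def get_field(record, label):
    # record[OFFSET_TABLE_INDEX][label] gives the payload offset;
    # +1 skips the offset-table slot itself.
    offsets = record[0]
    return record[1 + offsets[LABELS[label]]]

r = make_ac_record(10, 30)
print(get_field(r, "c"))  # 30
```

In a well-typed program the None entries are never read, which is exactly why they can later be dropped or overlapped with other rows.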
Since the _ entries won’t ever be used, we can shrink the tables by dropping trailing _ entries. In our example above, with a record with a and c fields, the last _ entry can be omitted:
[ 0, _, 1 ]
Because offset tables are per-shape, and the total number of record shapes in a program should be small, if we allocate a few bits in record object headers for the “shape index” of the record, this index can be used to index a global table mapping record shapes to their offset tables.
Generated code for record access expressions will look like:
record[RECORD_OFFSET_TABLES[getRecordShapeId(record)][label]]
getRecordShapeId will read the bits in the object header for the record shape ID. Depending on the actual header layout, it will look something like:
int getRecordShapeId(Object* object) {
return (object->header & RECORD_ID_MASK) >> HEADER_BITS;
}
With record shape IDs in headers and a global table mapping shape IDs to offset tables, we no longer need an extra word in record objects for the offset table pointer.
Here’s an example of offset tables when we have labels a, b, x, y, and two records 0: {a, b} and 1: {x, y}:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
_, // label x
_, // label y
];
RECORD_1_OFFSET_TABLE = [
_, // label a
_, // label b
0, // label x
1, // label y
];
RECORD_OFFSET_TABLES = [
RECORD_0_OFFSET_TABLE, // record 0
RECORD_1_OFFSET_TABLE, // record 1
];
As before, the offset table for record 0 can be shrunk as:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
];
Labels that are never used in the same record can be given the same ID.
In the example above, this allows us to have a single table for both records:
RECORD_0_1_OFFSET_TABLE = [
0, // label a or x
1, // label b or y
];
RECORD_OFFSET_TABLES = [
RECORD_0_1_OFFSET_TABLE, // record 0
RECORD_0_1_OFFSET_TABLE, // record 1
];
The problem of assigning IDs to labels is very similar to stack allocation when spilling during register allocation. We have a practically infinite number of IDs (stack slots), but we want to reuse the same ID for labels as long as they’re never used in the same record (live at the same time).
After sharing label IDs, some of the shapes may be identical, as in our example. We can give those shapes the same ID and avoid redundant entries in the offset tables.
With this, our example with two records {a, b} and {x, y} compiles to just one offset table:
RECORD_0_1_OFFSET_TABLE = [
0, // label a or x
1, // label b or y
];
RECORD_OFFSET_TABLES = [
RECORD_0_1_OFFSET_TABLE, // record 0 and 1
];
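Putting the two sharing ideas together, here is a Python sketch (illustrative names, None for _) where a/x and b/y share label IDs and both shapes share one table and one shape ID:

```python
# Labels that never occur in the same record share an ID:
LABEL_IDS = {"a": 0, "x": 0, "b": 1, "y": 1}

# One offset table serves both shapes {a, b} and {x, y}.
RECORD_0_1_OFFSET_TABLE = [0, 1]
RECORD_OFFSET_TABLES = [RECORD_0_1_OFFSET_TABLE]

def get_field(shape_id, payload, label):
    # Two lookups: shape -> offset table, label -> payload offset.
    offsets = RECORD_OFFSET_TABLES[shape_id]
    return payload[offsets[LABEL_IDS[label]]]

ab = [10, 20]  # payload of an {a, b} record; merged shape ID 0
xy = [30, 40]  # payload of an {x, y} record; merged shape ID 0
print(get_field(0, ab, "b"), get_field(0, xy, "x"))  # 20 30
```

The type system guarantees we never ask an {a, b} record for x, so giving a and x the same ID is safe.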
Suppose we have these record shapes in a program:
0: {a, b, q}
1: {x, y, q}
The RECORD_OFFSET_TABLES table is currently an array of pointers, and indexing an offset table still requires pointer chasing. To avoid the pointer chasing we can flatten the table.
For our current program, the tables, without flattening, look like this:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
_, // label x
_, // label y
2, // label q
];
RECORD_1_OFFSET_TABLE = [
_, // label a
_, // label b
0, // label x
1, // label y
2, // label q
];
RECORD_OFFSET_TABLES = [
RECORD_0_OFFSET_TABLE,
RECORD_1_OFFSET_TABLE,
];
We can flatten this as:
RECORD_0_OFFSET_TABLE = [
0, // label a
1, // label b
_, // label x
_, // label y
2, // label q
];
RECORD_1_OFFSET_TABLE = [
_, // label a
_, // label b
0, // label x
1, // label y
2, // label q
];
RECORD_LABEL_OFFSETS = [
0, // record 0, label a
1, // record 0, label b
_, // record 0, label x
_, // record 0, label y
2, // record 0, label q
_, // record 1, label a
_, // record 1, label b
0, // record 1, label x
1, // record 1, label y
2, // record 1, label q
];
Field indexing then becomes:
record[RECORD_LABEL_OFFSETS[(getRecordShapeId(record) * NUM_LABELS) + label]]
With this version we eliminate one layer of indirection.
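The flattened indexing can be sketched in Python as follows (illustrative names, None for _; labels numbered a=0, b=1, x=2, y=3, q=4):

```python
NUM_LABELS = 5
# Flat table for shapes 0: {a, b, q} and 1: {x, y, q}.
RECORD_LABEL_OFFSETS = [
    0, 1, None, None, 2,   # shape 0: labels a, b, x, y, q
    None, None, 0, 1, 2,   # shape 1: labels a, b, x, y, q
]
LABEL_IDS = {"a": 0, "b": 1, "x": 2, "y": 3, "q": 4}

def get_field(shape_id, payload, label):
    # One flat lookup: shape_id * NUM_LABELS + label.
    offset = RECORD_LABEL_OFFSETS[shape_id * NUM_LABELS + LABEL_IDS[label]]
    return payload[offset]

abq = [1, 2, 3]  # payload of an {a, b, q} record, shape ID 0
print(get_field(0, abq, "q"))  # 3
```

Only one array is indexed per access; the remaining cost is the multiplication, which the next step removes.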
The idea here is not too important on its own, but it will enable further improvements.
The NUM_LABELS factor in the field access code above can be eliminated by incrementing record shape IDs by NUM_LABELS instead of 1. In our example, instead of having record IDs 0 and 1, we will have 0 and 5 (incremented by the number of labels in the program).
Since there may be a large number of labels in a program, and we may have only a few bits to store the record IDs, an alternative is to convert the table to label-major order like this:
RECORD_LABEL_OFFSETS = [
0, // label a, record 0
_, // label a, record 1
1, // label b, record 0
_, // label b, record 1
_, // label x, record 0
0, // label x, record 1
_, // label y, record 0
1, // label y, record 1
2, // label q, record 0
2, // label q, record 1
];
With this table, indexing code becomes:
record[RECORD_LABEL_OFFSETS[(label * NUM_RECORDS) + getRecordShapeId(record)]]
We can then eliminate the NUM_RECORDS factor the same way, by incrementing label IDs by NUM_RECORDS instead of 1, and index with:
record[RECORD_LABEL_OFFSETS[label + getRecordShapeId(record)]]
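With the label IDs pre-multiplied, an access is a single add into the flat table. A Python sketch of this final form (illustrative names, None for _; label-major layout for shapes {a, b, q} and {x, y, q}):

```python
# Label IDs pre-incremented by NUM_RECORDS (2): a=0, b=2, x=4, y=6, q=8.
RECORD_LABEL_OFFSETS = [
    0, None,   # label a: shape 0, shape 1
    1, None,   # label b
    None, 0,   # label x
    None, 1,   # label y
    2, 2,      # label q
]
LABEL_IDS = {"a": 0, "b": 2, "x": 4, "y": 6, "q": 8}

def get_field(shape_id, payload, label):
    # Index is label + shape_id: no multiplication at the access site.
    return payload[RECORD_LABEL_OFFSETS[LABEL_IDS[label] + shape_id]]

xyq = [7, 8, 9]  # payload of an {x, y, q} record, shape ID 1
print(get_field(1, xyq, "y"))  # 8
```

The add-only indexing is what makes the gap-filling compression below possible: shifting a label’s row left is just decrementing its pre-multiplied ID.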
Now that the table index of a label is label + shape_id
and we have a single table, we can shift the entries in the table by decrementing label IDs.
For this it doesn’t matter whether we store in label-major or record-major order. Which one of these will generate a smaller table will probably depend on the program. As an example, suppose we store the table in label-major order, and we have these records in the program:
0: {x, y, z, t}
1: {x, y}
2: {z, t}
The table will look like:
[ 0, 0, _, // label x
1, 1, _, // label y
2, _, 0, // label z
3, _, 1 ] // label t
Record IDs will be 0, 1, 2, and label IDs will be 0, 3, 6, 9.
We can use the unused slot for label x, record 2, by decrementing the label ID of y by one. If we then do the same for z, the label IDs become 0, 2, 4, 7, and the table becomes:
[ 0, 0, // label x
1, 1, // label y
2, _, 0, // label z
3, _, 1 ] // label t
This idea can be used to fill any gaps in previous label rows, as long as the used slots in a row fit into the gaps. For example, if we have a table like:
[ 0, _, _, 1, // label x
_, 0, 1, _, // label y
... ]
We can decrement y’s ID to fit it into the row for label x:
[ 0, 0, 1, 1, // label x and y, interleaved
... ]
Collecting and numbering all labels in the program allows using a global table for mapping labels to offsets.
These offset tables can be made smaller by dropping unused trailing entries, giving the same ID to labels that never appear in the same record, sharing one table between identical shapes, flattening everything into a single table, and shifting rows to fill gaps.
The result is a very compact representation of record objects (no extra words in the header or unused space in the payload needed) and fast polymorphic field access.
The offset table should also be small in practice, because different parts of the program will probably use disjoint sets of names, and different labels and records will get the same IDs. In the remaining cases, tweaking label IDs to compact the table should help.
I’ve learned about the global table approach and some of the optimizations from the Dart compiler, which implements virtual calls using a “global dispatch table” (GDT), indexed by classID + methodID at call sites. See “Introduction to Dart VM” for a description of how Dart AOT and JIT generate GDTs.
If you are interested in seeing some code, here is where we generate the GDT in dart2wasm (Dart’s Wasm backend). The outer loop finds a selector ID (label ID in our examples) for a row (list of records in our examples, list of classes in dart2wasm). The inner loop, do { ... } while (!fits), starts from the first row with gaps and tries to fit the current row into the gaps. In the worst case it skips all of the rows, in which case the rest of the code appends the new row to the table.
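For illustration, here is a toy Python version of that first-fit packing (not the dart2wasm code; names are illustrative). Note that because it slides each row over every gap, it can pack rows even tighter than the hand-worked example above: here t’s row also lands in z’s gap.

```python
# Toy first-fit table compaction: slide each label row left until its
# used slots all land on gaps (None) in the table built so far; in the
# worst case the row is appended at the end.
def build_table(rows):
    table, label_ids = [], []
    for row in rows:
        start = 0
        while True:
            fits = all(
                v is None or start + i >= len(table) or table[start + i] is None
                for i, v in enumerate(row)
            )
            if fits:
                break
            start += 1
        # Grow the table if needed, then write the row's used slots.
        table.extend([None] * (start + len(row) - len(table)))
        for i, v in enumerate(row):
            if v is not None:
                table[start + i] = v
        label_ids.append(start)
    return table, label_ids

# Rows for labels x, y, z, t over shapes {x,y,z,t}, {x,y}, {z,t}:
rows = [[0, 0, None], [1, 1, None], [2, None, 0], [3, None, 1]]
table, ids = build_table(rows)
print(table)  # [0, 0, 1, 1, 2, 3, 0, 1]
print(ids)    # [0, 2, 4, 5]
```

Every (label ID + shape ID) index still resolves to the right offset; the unused slots of one row simply host the used slots of another.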
Dart will soon have records, and for the dart2wasm implementation of records I’m thinking of using some of the ideas described in this post. Dart records do not support width subtyping (you can’t pass {x, y, z} where {x, y} is expected), but because of the dynamic type, we can have a dynamically typed record that we index.
Thanks to José Manuel Calderón Trilla for his feedback on a draft of this blog post.
]]>I recently tweeted about this and got helpful responses that made me realize that I got my definitions wrong. As I thought more about what “anonymous type” means, it became clear to me that it’s not just about tuples or other types with special syntax instead of names. It’s more complicated than that.
So in this post I’d like to briefly talk about products and sums, and how names are used in type checking. I will then show a different way of type checking, and some examples from two widely used languages. Finally, I will argue that types are called “named” or “anonymous” depending on how they are checked.
Note that I’m not using any of these words as they are used in category theory or any other field of mathematics. These are mainly how I see them used in widely used PLs like Haskell, Rust, and OCaml, and in PL papers and books.
A value of a product type contains zero or more fields with potentially different types. Some example product types are:
- data Coordinate = Coordinate { x :: Int, y :: Int }: a product with two Int fields
- data D = D Int String Float: a product with Int, String, and Float fields
- data Empty = Empty: a product with no fields
Note that the way you access the fields does not matter. In the examples above, fields of a Coordinate value can be accessed with pattern matching, or with the generated functions x and y. In the second example, we can only access the fields with pattern matching.
What matters is: products contain zero or more fields. The fields can have different types.
A sum type specifies multiple “variants” (or “alternatives”), where each variant has a “name” (or “tag”, more on this later) and some number of fields.
A value of a sum type holds a name (or tag), and the fields of the variant with that name.
For example, if you have a parser for integers, you will want to return an integer when parsing succeeds, or an error message when something goes wrong. The sum type for the return value of your parse function would look like:
data ParseResult
= Success Int
| Fail String
Here, Success and Fail are the names of the variants. The Success variant has an Int field, and the Fail variant has a String field.
A value of this type does not contain an Int and a String at the same time. It’s either a Fail with a String field, or a Success with an Int field.
The way you access the fields is with pattern matching:
case parse_result of
Success int -> ...
Fail error_message -> ...
If I have two types, named T1 and T2, no matter how they are defined, they are considered different in Haskell, and in most other widely used typed languages (Rust, Java, …). This is called “nominal” type checking, where differently named types are considered different, even if they are “structurally” the same. For example, data T1 = T Int and data T2 = T Int are structurally the same, but you can’t pass a value of type T2 to a function that expects T1.
What “structurally the same” means is open to interpretation. We will come back to this later.
In addition, all types have names1, even types like tuples, which may look like they don’t have names the way our Coordinate or ParseResult do.
Tuples in most languages are just a bunch of product types, like the ones you can define yourself. They are often pre-defined for arities 0 to some number, and they have a special, “mixfix” syntax, with parentheses and commas to separate the fields. Other than that, they are no different than the ones you can define yourself.
You can see GHC’s definition of tuples here. In GHC, you can use the name directly if you don’t want the mixfix syntax, like (,) 1 2. So the name of a 2-ary tuple is (,) in Haskell, and it has a special syntax so you can write the more readable (1, 2) (or (Int, Int) in a type context). Other than syntax, there’s nothing special about tuples.
So it’s clear that most languages don’t have anonymous types. All types have some kind of names, and two types are only “compatible” if the names match.
Before defining what anonymous types are, I would like to give two examples, from PureScript and OCaml, where types are not checked based on their names, but based on their “structure”.
A record is a product type with named (or “labelled”) fields. Our Coordinate example is a record.
In PureScript, records can be defined without giving names to them. For example:
f :: { x :: Int, y :: Int } -> Int
f a = a.x + a.y
Here, f is a function that takes a record with two Int fields, named x and y, as an argument.
Here is a more interesting version of the same function:
f :: forall r . { x :: Int, y :: Int | r } -> Int
f a = a.x + a.y
This version takes a record with at least x :: Int and y :: Int fields, but it can have more fields. Using this version, this code type checks:
f { x: 1, y: 2, z: 3, t: 4 }
The r in this type is not too important. The important part is: in PureScript, records are not type checked nominally. Indeed, in the example above, the type of the record with 4 fields is not defined anywhere, and no names are used for the record in the type signature of f.
You might think that the record braces and commas are similar to the tuple syntax, so the name could be something like {,}, maybe applied to x :: Int somehow (assuming there is a type-level representation of field names).
However, even if that’s the case, type checking of these types is quite different than tuples. We’ve already seen that we can pass a record with more fields. You can also reorder the fields in the function type signature2, or in the record expression, and it still works.
So type checking of PureScript records is quite different than that of Haskell tuples.
This kind of type checking where you look at the “structure” rather than just the names is called structural type checking.
Now let’s take a look at an example for sum types.
OCaml has named sum types, just like Haskell’s. Here is the OCaml version of our ParseResult type:
type parse_result =
  | Success of int
  | Fail of string
The name of this type is parse_result (following OCaml naming conventions), and it is type checked exactly the same way it is in Haskell.
A second way of defining sum types in OCaml, and without names, is with polymorphic variants. Here’s the polymorphic variant for the same type:
type parse_result = [ `Success of int | `Fail of string ]
Crucially, even though we use a similar syntax with the type keyword, this is a type synonym. The right-hand side of this definition is an anonymous sum with two variants, tagged `Success and `Fail, with int and string fields, respectively.
Now, suppose I have a parse result handler, which, in addition to the success and failure cases, handles some “other” case as well:
let f = function
  | `Success i -> Printf.printf "Parse result: %d\n" i
  | `Fail msg -> Printf.printf "Parse failed: %s\n" msg
  | `Other -> Printf.printf "Wat?\n"
The type of this function, as inferred by the OCaml compiler, is:
[< `Fail of string | `Other | `Success of int ] -> unit
What this type says is that the function accepts any polymorphic variant that has the tags `Fail, `Other, and `Success (with the specified field types), or some subset of these tags. So if I have a value of type parse_result:
let x : parse_result = `Success 123
I can pass it to f, even though f’s argument type is not exactly parse_result. Here’s the full example, run in utop (utop # is the prompt, the lines after ;; are utop’s outputs):
utop # type parse_result = [ `Success of int | `Fail of string ];;
type parse_result = [ `Fail of string | `Success of int ]
utop # let f = function
  | `Success i -> Printf.printf "Parse result: %d\n" i
  | `Fail msg -> Printf.printf "Parse failed: %s\n" msg
  | `Other -> Printf.printf "Wat?\n";;
val f : [< `Fail of string | `Other | `Success of int ] -> unit = <fun>
utop # let x : parse_result = `Success 123;;
val x : parse_result = `Success 123
utop # f x;;
Parse result: 123
- : unit = ()
Neat!
Similar to PureScript records, and unlike Haskell tuples, type checking for OCaml polymorphic variants is structural, not nominal.
Now that we have seen structural type checking as an alternative to name-based (nominal) type checking, and some examples of it, here is my attempt at defining anonymous types: if named types are the ones type checked nominally, then the types that are type checked structurally are the “anonymous” ones.
In other words: named types are checked nominally, anonymous types are checked structurally.
According to this definition, Haskell, and many other languages, don’t have anonymous types, as all types are nominally checked. Tuples are no exception: they have names, and are type checked nominally.
PureScript records and OCaml polymorphic variants are great examples of anonymous products and sums, respectively.
Thanks to @_gilmi and @madgen_ for their helpful comments on a draft of this blog post.
With the exception of type synonyms. Type synonyms can be considered as simple macros for substituting types for names before type checking.↩︎
In Haskell, reordering stuff at the type level is often done with type families (type-level functions). Types are still checked nominally, but by rearranging them before type checking you can often have something somewhat similar to structural checking.↩︎
Suppose you have a no_std
crate that you want to use in two ways: (1) as a static library linked into non-Rust code, and (2) as a dependency of other Rust crates.
(1) is the main use case for this library. (2) is because you want to test this library and you want to be able to use Rust’s std
and other Rust libraries for testing.
The Rust crate type for (1) is staticlib
. For (2) you need rlib
. (documentation on crate types)
Here’s the problem. To be able to generate staticlib
you need to implement a panic handler as otherwise the code won’t know how to panic1. However, if you define a panic handler, you won’t be able to use your crate in other crates anymore as your panic handler will clash with the std
panic handler.
Four files are needed to demonstrate this:
-- Cargo.toml for the library
[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"
[lib]
crate-type = ["staticlib", "rlib"]
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
-- lib.rs
#![no_std]
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
loop {}
}
-- Cargo.toml for the importing crate
[package]
name = "nostd_bin"
version = "0.1.0"
authors = []
edition = "2018"
[dependencies]
nostd_lib = { path = "../nostd_lib" }
-- main.rs
extern crate nostd_lib;
fn main() {}
The library builds fine, but if you try to build nostd_bin
you’ll get this error:
error: duplicate lang item in crate `nostd_lib` (which `nostd_bin` depends on): `panic_impl`.
|
= note: the lang item is first defined in crate `std` (which `nostd_bin` depends on)
= note: first definition in `std` loaded from ...
= note: second definition in `nostd_lib` loaded from ...
Which says you now have two panic handlers: one in std
and one in your library.
If you remove the panic handler in the library then you won’t be able to build the library anymore:
error: `#[panic_handler]` function required, but not found
So you need some kind of conditional compilation, to define the panic handler only when generating the staticlib
. Unfortunately, conditional compilation based on crate type is currently not possible. It is also not possible to specify the target crate type when invoking cargo.
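Conditional compilation on cargo features, on the other hand, does work, and that's what the workaround relies on. As a reminder of how a feature-gated item behaves, here is a standalone sketch (the function name is mine, not part of the post's setup):

```rust
// Compiled only when cargo enables the `panic_handler` feature.
#[cfg(feature = "panic_handler")]
fn build_mode() -> &'static str {
    "panic handler compiled in"
}

// Compiled otherwise, e.g. when building the rlib without the feature.
#[cfg(not(feature = "panic_handler"))]
fn build_mode() -> &'static str {
    "no panic handler"
}

fn main() {
    println!("{}", build_mode());
}
```

Since the feature is off unless cargo turns it on, compiling this file directly picks the second definition.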
The least hacky way I could find to solve this (and without using anything other than just cargo build
to build) is by having two Cargo.toml
files.
Cargo really wants manifest files to be named Cargo.toml
, so we put the files in different directories. In my case the top-level one is for staticlib
and it looks like this:
[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"
[features]
default = ["panic_handler"]
panic_handler = []
[lib]
crate-type = ["staticlib"]
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
I also update lib.rs
to only define the panic handler when the feature is enabled:
#[cfg(feature = "panic_handler")]
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
...
}
Now I can build the library at the library’s top-level with just cargo build
. Because the panic_handler
feature is enabled by default in this Cargo.toml
, the panic handler will be defined by default with just cargo build
and the static library will build and work fine.
For the rlib
I create a similar Cargo.toml
in rlib
directory:
[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"
[lib]
crate-type = ["rlib"]
path = "../src/lib.rs"
[profile.dev]
panic = "abort"
[profile.release]
panic = "abort"
The differences are: this one only generates rlib
, doesn’t define the panic_handler
feature, and specifies the library source path explicitly (as it’s not in the default location relative to this Cargo.toml
). It’s fine to refer to a feature that you never define in Cargo.toml
in your code, so lib.rs
is still fine, and the panic handler will never be built when you build the crate with this Cargo.toml
.
Now in the importing crate I use this Cargo.toml
instead of the top-level one:
[dependencies]
nostd_lib = { path = "../nostd_lib/rlib" }
And it works fine. The downside is I have two Cargo.toml
files now, but in my case that’s not a big deal, as my Cargo.toml
is quite small and has no dependencies other than libc
2.
I hope this is helpful. If you know any better way to do conditional compilation based on crate types, or to solve the problem of generating usable staticlib
and rlib
s from a single no_std
crate, let me know!
You need a panic_handler
even if you never panic in your crate (assuming that’s possible). For example, you can’t compile fn main() {}
with no_std
, panic=abort
, and without a panic_handler
: the compiler complains about the missing panic handler.↩︎
If you’re working on a no_std
crate I think you won’t be able to find a lot of libraries that you can use anyway.↩︎
Here’s the summary of my 8 years writing Haskell pretty much non-stop:
In 2012 I wrote my first Haskell program, which was a chat server. I was reading “Real World Haskell” and “Learn You a Haskell for Great Good!” at the time and applying what I learned on this project.
In the same year I implemented my first programming language in Haskell. I don’t remember much about this project, I think it may be just a few extensions over the excellent Haskell tutorial “Write Yourself a Scheme in 48 hours”.
Also in 2012 I made a few commits to the programming language Fay. This was my first contribution to an open source compiler not written by me.
In 2013 I worked on four PL implementations, two of which were implemented from scratch in Haskell: A Prolog implementation and a K Lambda interpreter.
The other two projects were: A multi-stage ML-like language written in OCaml, and K Framework (in Java).
In 2014 I was accepted to Google Summer of Code to work on adding stack traces to GHCJS. The project was successful, and I made 88 commits to GHCJS during this period.
This was my first introduction to GHC. I made only one commit to GHC during this time, but I started reading the RTS and code generator to be able to implement cost-centre stacks in GHCJS, which taught me a lot.
Also in 2014, I briefly worked at a startup where I wrote Haskell.
In 2015 I joined Indiana University to do a PhD in programming languages. In my first semester I worked on the paper "Efficient Communication and Collection with Compact Normal Forms", which was about a GHC extension. The paper was published the same year at ICFP.
In the same year I briefly worked on a torrent client in Haskell.
According to git logs, 2015 was the year when I started making some larger commits to GHC. I think I made a few dozen commits that year. What was happening in the background is that I was working on unboxed sums. At the Haskell Implementors Workshop in 2015 my advisor gave a presentation on the efficiency of data representation in Haskell. I don't remember how the story developed, but I think we also talked to a few people at ICFP about how to improve the situation, and one of the ideas that came up was unboxed sums. IIRC I started working on it soon after returning from ICFP.
The first somewhat working version was implemented as a plugin, using lots of unsafe coercions under the hood. It was good enough to run some examples.
(In 2015, I also studied various metaprogramming and partial evaluation ideas quite extensively. If you look at my blog posts published in 2015 you’ll see a lot of related blog posts. There are also a few related git repositories in my Github page. I also gave a related talk at HIW 2015.)
Early 2016, I don’t remember what I was doing in too much detail. I remember taking an advanced OS class around that time and enjoying it very much. This was also the time where I started to realize that the tools I’m using (mostly GHC) are full of bugs, and very inefficient. I kept studying program transformation ideas, with the goal of making Haskell “fast”. I also started using C more, partly for the OS class, but also in my hobby projects. For example, the first commit of tiny was made in January 2016 and the code was in C.
In mid-2016 I left Bloomington for Cambridge, UK, for an internship at Microsoft Research with SPJ. We mainly worked on implementing unboxed sums properly in the compiler (instead of as a hacky plugin), but I also did a lot of GHC maintenance work there with supervision of SPJ.
Unboxed sums was merged during my time at MSR.
In the rest of the internship I did a lot of reading, did GHC maintenance, and biked around Cambridge.
Most importantly, during my time at MSR I realized that I’m no longer interested in academic research. I don’t enjoy writing papers. I don’t feel like pushing a field forward while most of the tools I use every day are badly broken, inefficient, usually both. I started having job interviews while I was in the UK. I visited two companies for interviews, one in London, another one in Cambridge.
I also emailed my advisor, saying that I don’t want to come back to Bloomington.
Job interviews went badly, and I was back at Indiana University. The rest of 2016 was pretty horrible. I was depressed. I had no interest in research. I still helped publish a paper, but I did not enjoy the process.
I still spent my last semester somewhat productively. I took enough classes that semester to leave IU with a master's degree instead of empty-handed (I was a PhD student, not a master's student). I also had some good job interviews and met good people from the Haskell community.
By the end of 2016 I accepted a job offer and left IU with a master's degree to write Haskell for a startup.
In 2017 I worked for this startup for a year. I wrote lots of networking and concurrent code, and learned a lot about these topics and about exception handling in Haskell. Until then my Haskell experience had mainly been in the context of compilers, so this was quite educational for me.
I left the company at the end of that year to join Well-Typed to work on GHC full-time.
My time at Well-Typed was great, but also full of challenges, mainly related to working remotely.
I worked on GHC between 30 and 40 hours a week (some weeks as little as 24 hours, but no less than that). A few weeks after I joined I started working on a new garbage collector with a colleague. When I joined the project there were only type definitions in header files, and almost no code. I implemented the first sequential prototype of the new collector. After that my colleague and I started collaborating more closely while implementing the concurrent version. We found many bugs in both the design and the implementation, and sorted out many edge cases during this time. I thoroughly enjoyed working on this project, even though it was clearly the most challenging project I ever worked on.
After the garbage collector I kept working as a maintainer until I left the company on a Sunday, Jun 21st, 2020. I made my last commit to a merge request I was working on on the 21st.
On 22 Jun 2020 I joined DFINITY to work on the Motoko programming language, and this is where the story ends.
At the time of this writing I have 383 commits in GHC and I'm the contributor with the 14th most commits. It feels bad to leave a project that I liked and contributed so much to, but it's also the right thing to do. After the GC was merged I started spending my time less and less productively, for many reasons, and I had lost my motivation to improve Haskell-the-language and GHC. Perhaps I can write more about these in another post.
gdb breakpoints can take conditions: for example, to break only on mmap(NULL, ...)
calls I can do
break mmap if addr == 0
and gdb doesn't break on mmap
when the addr == 0
condition doesn't hold.
I’ve used this many times to great effect, but it’s not always sufficient: sometimes I need to break not when a variable or argument has a specific value, but when the function is called (directly or indirectly) from another function. For example, when debugging a GHC RTS issue I sometimes want to inspect mmap
calls made by the garbage collector.
As far as I know this is not possible using the standard break
syntax, but gdb provides a Python API that allows setting breakpoints with conditions implemented in Python. Using this API it takes only a few lines to implement this:
class FrameBp(gdb.Breakpoint):
    def __init__(self, spec, *args, frame=None, **kwargs):
        self.frame = frame
        super(FrameBp, self).__init__(spec, *args, **kwargs)

    def stop(self):
        frame = gdb.selected_frame().older()
        while frame:
            if frame.name() == self.frame:
                return True
            frame = frame.older()
        return False
When calling the constructor the first argument is the breakpoint specifier, which is basically the part after break ...
in gdb’s break command. The frame
argument is the function we look for before actually breaking. We only break if the function exists in the backtrace. Here’s an example use:
>>> python FrameBp("mmap", frame="GarbageCollect")
Breakpoint 1 at 0x7f3366243f00: file ../sysdeps/unix/sysv/linux/mmap64.c, line 44.
This will only break on mmap
if the backtrace has GarbageCollect
at some point. An example backtrace when the breakpoint is hit:
Breakpoint 1, __GI___mmap64 (addr=0x4200200000, len=1048576, prot=3, flags=50, fd=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:44
44 if (offset & MMAP_OFF_MASK)
>>> bt
#0 __GI___mmap64 (addr=0x4200200000, len=1048576, prot=3, flags=50, fd=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:44
...
#19 0x0000000003022c83 in GarbageCollect (collect_gen=0, do_heap_census=false, deadlock_detect=false, gc_type=0, cap=0x37ef500
<MainCapability>, idle_cap=0x0) at rts/sm/GC.c:449
...
With some effort you could probably turn this into a proper gdb command and run it without the python ...
part, but so far this works well enough for me.
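For what it's worth, turning it into a command is mostly boilerplate around gdb's Command class. Here is an untested sketch against gdb's documented Python API (the command name break-in-frame is my invention), reusing the FrameBp class above:

# Untested sketch: registers a `break-in-frame SPEC FRAME` gdb command.
class FrameBreak(gdb.Command):
    """break-in-frame SPEC FRAME: break on SPEC only when called under FRAME."""

    def __init__(self):
        super(FrameBreak, self).__init__("break-in-frame", gdb.COMMAND_BREAKPOINTS)

    def invoke(self, arg, from_tty):
        spec, frame = gdb.string_to_argv(arg)
        FrameBp(spec, frame=frame)

FrameBreak()  # instantiating the class registers the command

After sourcing this you could write break-in-frame mmap GarbageCollect instead of the python ... invocation.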
It’s also shared on Twitter and /r/haskell. If you have any questions/comments feel free to ping me in any of these places, or add a comment below!
In this post I’m going to give two more examples, using the same expression representation from the previous post, and then talk about how to implement our passes using a different representation, without knot-tying.
Previously we attached arity and unfolding information to Id
s. Now suppose that our language is typed, and up to some point our transformations rely on typing information. Similar to arity and unfolding fields we add one more field to Id
:
data Id = Id
  { ..
  , idType :: Maybe Type
  }
The Maybe
part is because when we no longer need the types we want to be able to clear the type fields to make the AST smaller. While we have only one heap object per Id
, in an average program there’s still a lot of different Id
s, and Type
representation can get quite large, so this is worthwhile. This makes the working set smaller, which causes less GC work and improves compiler performance.
In our cyclic AST representation the only way to implement this without losing sharing is with a full-pass over the entire program, using knot-tying. The code is similar to the ones in the previous post.
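For concreteness, here is a sketch of what that type-dropping pass could look like (this code is mine, not from the previous post; it assumes the Id representation shown below, with the added idType field, and only shows the interesting cases):

dropTypesKnot :: Expr -> Expr
dropTypesKnot = go Map.empty
  where
    go ids e = case e of
      IdE id ->
        -- Replace occurrences with the type-less binder, keeping sharing.
        IdE (fromMaybe id (Map.lookup (idName id) ids))
      Let bndr rhs body ->
        let
          ids'  = Map.insert (idName bndr) bndr' ids
          rhs'  = go ids' rhs
          -- The knot: bndr' is inserted into ids' while being defined
          -- in terms of rhs', which is computed using ids'.
          bndr' = bndr{ idType = Nothing, idUnfolding = Just rhs' }
        in
          Let bndr' rhs' (go ids' body)
      ...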
Remember that in the previous post we represented the AST as:
data Expr
= IdE Id
| IntE Int
| Lam Id Expr
| App Expr Expr
| IfE Expr Expr Expr
| Let Id Expr Expr
data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }
In this representation if I have a recursive definition like
let fac = \x . if x then x * fac (x - 1) else 1 in fac 5
For the fac
used in the lambda body I want to be able to call idUnfolding
and get the definition of this lambda. So the lambda refers to the Id
for fac
, and fac
refers to the lambda in its idUnfolding
field, forming a cycle.
In this representation the only way to implement this is with knot-tying. An implementation that maintains a map from binders to their RHSs to update unfoldings of Id
s in occurrence position does not work, because when we update an occurrence of the binder in its own RHS (i.e. in a recursive let
) we end up invalidating the RHS
that we’ve added to the map.
Here’s a knot-tying implementation that adds unfoldings (only the interesting bits):
addUnfoldings :: Expr -> Expr
addUnfoldings = go M.empty
  where
    go :: M.Map String Id -> Expr -> Expr
    go ids e = case e of
      IdE id ->
        IdE (fromMaybe id (M.lookup (idName id) ids))
      Let bndr rhs body ->
        let
          ids'  = M.insert (idName bndr) bndr' ids
          rhs'  = go ids' rhs
          bndr' = bndr{ idUnfolding = Just rhs' }
        in
          Let bndr{ idUnfolding = Just rhs' } rhs' (go ids' body)
      ...
As before we tie the knot in let
case and use it in Id
case.
It’s also possible to initialize idUnfolding
fields when parsing, using monadic knot-tying (MonadFix). Full code is shown at the end of this post, but the interesting bit is when parsing let
s and Id
s:
parseLet :: Parser Expr
parseLet = do
    _ <- string "let"
    id_name <- parseIdName
    _ <- char '='
    (id, rhs) <- mfix $ \ ~(id_, _rhs) -> do
      modify (Map.insert id_name id_)
      rhs <- parseExpr
      return (Id{ idName = id_name, idArity = 0, idUnfolding = Just rhs }, rhs)
    _ <- string "in"
    body <- parseExpr
    return (Let id rhs body)
parseId' :: Parser Id
parseId' = do
    name <- parseIdName
    id_map <- get
    let def = Id{ idName = name, idArity = 0, idUnfolding = Nothing }
    return (fromMaybe def (Map.lookup name id_map))
The idea is very similar. When parsing a let
we add a thunk for the binder with correct unfolding to a map. The map is then used when parsing Id
s in the RHS and body of the let
.
A well-known way of associating information with identifiers in a compiler is by using a “symbol table”. Instead of adding information about Id
s directly in the Id
fields, we maintain a table (or multiple tables) that map Id
s to the relevant information. Here’s one way to do this in our language:
data Expr
  = IdE String
  ...

data IdInfo = IdInfo
  { idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }

type SymTbl = Map.Map String IdInfo
In this representation we have to refer to the table for idArity
or idUnfolding
. That’s slightly more work than the previous representation where we could simply use the fields of an Id
, but a lot of other things become much simpler and more efficient.
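For example, a read through the table is a single lookup. A getter like getIdArity is not spelled out in the snippets below, but it would presumably look something like this, mirroring the post's setIdArity (my sketch):

getIdArity :: String -> State SymTbl Int
getIdArity id =
    -- Ids missing from the table default to arity 0, matching setIdArity.
    maybe 0 idArity . Map.lookup id <$> get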
Here’s dropUnusedBindings
in this representation (only the interesting bits, full code is at the end of this post):
dropUnusedBindings :: Expr -> State SymTbl Expr
dropUnusedBindings =
    fmap snd . go Set.empty
  where
    go :: Set.Set String -> Expr -> State SymTbl (Set.Set String, Expr)
    go free_vars e0 = case e0 of
      Let bndr e1 e2 -> do
        (free2, e2') <- go free_vars e2
        if Set.member bndr free2 then do
          (free1, e1') <- go free_vars e1
          setIdArity bndr (countLambdas e1')
          return (Set.delete bndr (Set.union free1 free2), Let bndr e1' e2')
        else
          return (free2, e2')
      ...
Our pass is now stateful (updates the symbol table) and written in monadic style. Knot-tying is gone. We update the symbol table after processing a let
RHS. Because Id
s no longer have the arity information we don’t need to update anything other than the symbol table.
It’s now trivial to implement addUnfoldings
:
addUnfoldings :: Expr -> State SymTbl ()
addUnfoldings e0 = case e0 of
  IdE{} ->
    return ()
  IntE{} ->
    return ()
  Lam _ body ->
    addUnfoldings body
  App e1 e2 -> do
    addUnfoldings e1
    addUnfoldings e2
  IfE e1 e2 e3 -> do
    addUnfoldings e1
    addUnfoldings e2
    addUnfoldings e3
  Let bndr e1 e2 -> do
    addUnfoldings e1
    addUnfoldings e2
    setIdUnfolding bndr e1
Doing it during parsing is also trivial, and shown in the full code at the end of this post. Updating typing information when we no longer need them is simply
dropTypes :: State SymTbl ()
dropTypes = modify (Map.map (\id_info -> id_info{ idType = Nothing }))
We could also maintain a separate table for typing information, in which case all we would have to do is stop using that table.
Easy!
Cyclic AST representation in a purely functional language necessitates knot-tying and relies on lazy evaluation. A well-known alternative is using symbol tables. It works across languages (does not rely on lazy evaluation) and keeps the code simple.
Cyclic representations make using the information easier, while symbol tables make updating easier. Code for updating the information is shown above and the previous post. For using the information, compare:
-- Get the information in a cyclic representation
... (idUnfolding id) ...
-- Get the information using a symbol table
unfolding <- getIdUnfolding id
To me the monadic version is not too bad in terms of verbosity or convenience, especially because Haskell makes state passing so easy.
Some of the problems with knot-tying were explained at the end of the previous post. What I did not mention there is the problems with efficiency, which are demonstrated better in this post.
In the “typing information” example, with the cyclic representation I need to copy the entire AST to update every single Id
occurrence and binder. With the symbol table I need to update just the table, which is much smaller than the AST.
In the unfolding example, with the cyclic representation I again need to copy the entire AST or use MonadFix
if I’m doing it in parsing. With a symbol table the pass does not update the AST, only updates the table. If I’m doing it in parsing then I simply add an entry to the table after parsing a let
. (full code at the end of this post)
At use sites, getIdArity
(a map lookup) does more work than idArity
(just follows a pointer). While I don’t have any benchmarks on this, I doubt that this is bad enough to make cyclic representation and knot-tying preferable.
Examples in these two posts are inspired by GHC:
GHC attaches information to Ids in an Id field with type IdInfo. The IdInfo type holds information like arity and unfolding. For typing information, Id has another field: varType. The code generator updates IdInfos with code generator-generated information.
In the first post I mostly argued that knot-tying makes things more complicated, and in this post I showed that knot-tying is necessary because of the cyclic representation. If we want to do the same without knot-tying we either have to introduce mutable references (e.g. IORef
s) in our AST (not shown in this post), or have to use a non-cyclic representation with symbol tables.
Between these two representations, I think non-cyclic representation with symbol tables is a better choice.
Full code (knot-tying)
-- Tried with GHC 8.6.4

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.List
import Data.Maybe
import Prelude hiding (id)

-- mtl-2.2
import Control.Monad.State
-- containers-0.6
import qualified Data.Map as Map
import qualified Data.Set as Set
-- megaparsec-7.0
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char
-- pretty-show-1.10
import Text.Show.Pretty

data Expr
  = IdE Id
  | IntE Int
  | Lam Id Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let Id Expr Expr
  deriving (Show)

data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }

instance Show Id where
  show (Id name arity _) = "(Id " ++ show name ++ " " ++ show arity ++ ")"

--------------------------------------------------------------------------------
-- Initializing unfolding fields in parse time via MonadFix

type IdMap = Map.Map String Id

type Parser = ParsecT String String (State IdMap)

parseExpr :: Parser Expr
parseExpr = do
    exprs <- some $
      choice $
        map (\p -> p <* space)
          [ parseParens, parseIf, parseLam, parseInt,
            parseLet, try parseId ]
    return (foldl1' App exprs)

parseParens, parseIf, parseLam, parseInt,
  parseLet, parseId :: Parser Expr

parseParens = do
    _ <- char '('
    space
    expr <- parseExpr
    _ <- char ')'
    return expr

parseIf = do
    _ <- string "if"
    space
    condE <- parseExpr
    _ <- string "then"
    space
    thenE <- parseExpr
    _ <- string "else"
    space
    elseE <- parseExpr
    return (IfE condE thenE elseE)

parseLam = do
    _ <- char '\\'
    space
    id <- parseId'
    space
    _ <- char '.'
    space
    body <- parseExpr
    return (Lam id body)

parseInt = do
    chars <- some digitChar
    return (IntE (read chars))

parseLet = do
    _ <- string "let"
    space
    id_name <- parseIdName
    space
    _ <- char '='
    space
    (id, rhs) <- mfix $ \ ~(id_, _rhs) -> do
      modify (Map.insert id_name id_)
      rhs <- parseExpr
      return (Id{ idName = id_name, idArity = 0, idUnfolding = Just rhs }, rhs)
    _ <- string "in"
    space
    body <- parseExpr
    return (Let id rhs body)

parseId = IdE <$> parseId'

kws :: Set.Set String
kws = Set.fromList ["if", "then", "else", "let", "in"]

parseIdName :: Parser String
parseIdName = do
    name <- some letterChar
    guard (not (Set.member name kws))
    return name

parseId' :: Parser Id
parseId' = do
    name <- parseIdName
    id_map <- get
    let def = Id{ idName = name, idArity = 0, idUnfolding = Nothing }
    return (fromMaybe def (Map.lookup name id_map))

testPgm :: String -> Expr
testPgm pgm =
    case evalState (runParserT parseExpr "" pgm) Map.empty of
      Left (err_bundle :: ParseErrorBundle String String) ->
        error (errorBundlePretty err_bundle)
      Right expr ->
        expr

instance ShowErrorComponent [Char] where
  showErrorComponent x = x

--------------------------------------------------------------------------------
-- Initializing unfoldings with knot-tying

addUnfoldings :: Expr -> Expr
addUnfoldings = go Map.empty
  where
    go :: Map.Map String Id -> Expr -> Expr
    go ids e = case e of
      -- Interesting bits ------------------------------------------------------
      IdE id ->
        IdE (fromMaybe id (Map.lookup (idName id) ids))
      Let bndr rhs body ->
        let
          ids'  = Map.insert (idName bndr) bndr' ids
          rhs'  = go ids' rhs
          bndr' = bndr{ idUnfolding = Just rhs' }
        in
          Let bndr{ idUnfolding = Just rhs' } rhs' (go ids' body)
      --------------------------------------------------------------------------
      IntE{} ->
        e
      Lam arg body ->
        Lam arg (go ids body)
      App e1 e2 ->
        App (go ids e1) (go ids e2)
      IfE e1 e2 e3 ->
        IfE (go ids e1) (go ids e2) (go ids e3)
Full code (symbol table)
-- Tried with GHC 8.6.4

{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}

import Data.List
import Data.Maybe
import Prelude hiding (id)

-- mtl-2.2
import Control.Monad.State
-- containers-0.6
import qualified Data.Map as Map
import qualified Data.Set as Set
-- megaparsec-7.0
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char
-- pretty-show-1.10
import Text.Show.Pretty

import Debug.Trace

data Expr
  = IdE String
  | IntE Int
  | Lam String Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let String Expr Expr
  deriving (Show)

data IdInfo = IdInfo
  { idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  , idType :: Maybe Type
    -- ^ Type of the id.
  }

data Type = Type -- Assume a large type

instance Show IdInfo where
  show (IdInfo arity _ _) = "(IdInfo " ++ show arity ++ ")"

type SymTbl = Map.Map String IdInfo

getIdInfo :: String -> State SymTbl (Maybe IdInfo)
getIdInfo id =
    Map.lookup id <$> get

setIdArity :: String -> Int -> State SymTbl ()
setIdArity id arity = modify (Map.alter alter id)
  where
    alter Nothing =
      Just IdInfo{ idArity = arity, idUnfolding = Nothing, idType = Nothing }
    alter (Just id_info) =
      Just id_info{ idArity = arity }

setIdUnfolding :: String -> Expr -> State SymTbl ()
setIdUnfolding id unfolding = modify (Map.alter alter id)
  where
    alter Nothing =
      Just IdInfo{ idUnfolding = Just unfolding, idArity = 0, idType = Nothing }
    alter (Just id_info) =
      Just id_info{ idUnfolding = Just unfolding }

countLambdas :: Expr -> Int
countLambdas (Lam _ rhs) = 1 + countLambdas rhs
countLambdas _ = 0

dropUnusedBindings :: Expr -> State SymTbl Expr
dropUnusedBindings =
    fmap snd . go Set.empty
  where
    go :: Set.Set String -> Expr -> State SymTbl (Set.Set String, Expr)
    go free_vars e0 = case e0 of
      IdE id ->
        return (Set.insert id free_vars, e0)
      IntE{} ->
        return (free_vars, e0)
      Lam arg body -> do
        (free_vars', body') <- go free_vars body
        return (Set.delete arg free_vars', Lam arg body')
      App e1 e2 -> do
        (free1, e1') <- go free_vars e1
        (free2, e2') <- go free_vars e2
        return (Set.union free1 free2, App e1' e2')
      IfE e1 e2 e3 -> do
        (free1, e1') <- go free_vars e1
        (free2, e2') <- go free_vars e2
        (free3, e3') <- go free_vars e3
        return (Set.unions [free1, free2, free3], IfE e1' e2' e3')
      Let bndr e1 e2 -> do
        (free2, e2') <- go free_vars e2
        if Set.member bndr free2 then do
          (free1, e1') <- go free_vars e1
          trace (ppShow e1') (return ())
          setIdArity bndr (countLambdas e1')
          return (Set.delete bndr (Set.union free1 free2), Let bndr e1' e2')
        else
          return (free2, e2')

addUnfoldings :: Expr -> State SymTbl ()
addUnfoldings e0 = case e0 of
  IdE{} ->
    return ()
  IntE{} ->
    return ()
  Lam _ body ->
    addUnfoldings body
  App e1 e2 -> do
    addUnfoldings e1
    addUnfoldings e2
  IfE e1 e2 e3 -> do
    addUnfoldings e1
    addUnfoldings e2
    addUnfoldings e3
  Let bndr e1 e2 -> do
    addUnfoldings e1
    addUnfoldings e2
    setIdUnfolding bndr e1

dropTypes :: State SymTbl ()
dropTypes = modify (Map.map (\id_info -> id_info{ idType = Nothing }))

pgm :: Expr
pgm = Let "fac" rhs body
  where
    rhs = Lam "x"
            (IfE (IdE "x")
                 (App (App (IdE "*") (IdE "x"))
                      (App (IdE "fac")
                           (App (App (IdE "-") (IdE "x")) (IntE 1))))
                 (IntE 1))
    body = App (IdE "fac") (IntE 5)

--------------------------------------------------------------------------------
-- Initializing unfolding fields in parse time, the boring way

type Parser = ParsecT String String (State SymTbl)

parseExpr :: Parser Expr
parseExpr = do
    exprs <- some $
      choice $
        map (\p -> p <* space)
          [ parseParens, parseIf, parseLam, parseInt,
            parseLet, try parseId ]
    return (foldl1' App exprs)

parseParens, parseIf, parseLam, parseInt,
  parseLet, parseId :: Parser Expr

parseParens = do
    _ <- char '('
    space
    expr <- parseExpr
    _ <- char ')'
    return expr

parseIf = do
    _ <- string "if"
    space
    condE <- parseExpr
    _ <- string "then"
    space
    thenE <- parseExpr
    _ <- string "else"
    space
    elseE <- parseExpr
    return (IfE condE thenE elseE)

parseLam = do
    _ <- char '\\'
    space
    id <- parseId'
    space
    _ <- char '.'
    space
    body <- parseExpr
    return (Lam id body)

parseInt = do
    chars <- some digitChar
    return (IntE (read chars))

parseLet = do
    _ <- string "let"
    space
    id <- parseId'
    space
    _ <- char '='
    space
    rhs <- parseExpr
    _ <- string "in"
    space
    body <- parseExpr
    lift (setIdUnfolding id rhs)
    return (Let id rhs body)

parseId = IdE <$> parseId'

kws :: Set.Set String
kws = Set.fromList ["if", "then", "else", "let", "in"]

parseId' :: Parser String
parseId' = do
    name <- some letterChar
    guard (not (Set.member name kws))
    return name

testPgm :: String -> Expr
testPgm pgm =
    case evalState (runParserT parseExpr "" pgm) Map.empty of
      Left (err_bundle :: ParseErrorBundle String String) ->
        error (errorBundlePretty err_bundle)
      Right expr ->
        expr

instance ShowErrorComponent [Char] where
  showErrorComponent x = x
data Expr
= IdE Id
| IntE Int
| Lam Id Expr
| App Expr Expr
| IfE Expr Expr Expr
| Let Id Expr Expr
When generating code, for an identifier that stands for a lambda, I want to know the arity of the lambda, so that I can generate more efficient code. While in this language a lambda takes only one argument, if I have something like
let f = \x . \y . \z . ...
in ...
I consider f
as having arity 3.
One way to implement this is having this information attached to every Id
:
data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  }
This way of associating information with Id
s makes some things very simple. For example, if I’m generating code for this application:
f 1 2
In AST:
App (App (IdE (Id { idName = "f", idArity = 3 })) (IntE 1)) (IntE 2)
I can simply use the idArity
field to see the arity of the function being applied. It doesn’t get any simpler than this.
In a program we usually have many references to a single Id, whether it's for a top-level function or an argument. If we allocate an Id for every occurrence, that's a lot of redundant allocation, which makes the AST representation larger and hurts compiler performance.
For example, if I have this expression:
f z + f t
A naive representation of this would be
App
  (App
     (IdE Id { idName = "+" , idArity = 2 })
     (App
        (IdE Id { idName = "f" , idArity = 0 })
        (IdE Id { idName = "z" , idArity = 0 })))
  (App
     (IdE Id { idName = "f" , idArity = 0 })
     (IdE Id { idName = "t" , idArity = 0 }))
Here for every occurrence of f we have a new Id, and these Ids all have the same arity. This is two Id heap objects used for the same identifier.
A more efficient representation would be
let f = Id { idName = "f", idArity = 0 } in
App
  (App
     (IdE Id { idName = "+" , idArity = 2 })
     (App
        (IdE f)
        (IdE Id { idName = "z" , idArity = 0 })))
  (App
     (IdE f)
     (IdE Id { idName = "t" , idArity = 0 }))
Here we only have one heap object for f, and all uses refer to that one object.
This is actually not hard to fix: we maintain a map from Id names to the actual Ids. When we see a let, we add the LHS to the map. When we see an identifier, we look it up. Easy.
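A sketch of such a sharing pass, assuming the Expr and Id types from above (shareIds is my name for it, not the post's):

```haskell
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

-- Minimal types matching the post's AST.
data Expr = IdE Id | IntE Int | Lam Id Expr | App Expr Expr | Let Id Expr Expr

data Id = Id { idName :: String, idArity :: Int }

-- Replace every occurrence of a bound name with the Id allocated at its
-- binding site, so that all uses share one heap object.
shareIds :: Map.Map String Id -> Expr -> Expr
shareIds env e = case e of
  IdE i        -> IdE (fromMaybe i (Map.lookup (idName i) env))
  Lam arg body -> Lam arg (shareIds (Map.insert (idName arg) arg env) body)
  App e1 e2    -> App (shareIds env e1) (shareIds env e2)
  Let bndr rhs body ->
    -- The RHS also sees the binder, so recursive uses get shared too.
    let env' = Map.insert (idName bndr) bndr env
    in Let bndr (shareIds env' rhs) (shareIds env' body)
  _            -> e
```

After this pass every occurrence of an identifier is (a pointer to) the Id introduced at its binding site.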
Suppose I want to implement a pass that drops unused bindings. For example:
let f = let a = e1
in \x . e2
in f z + f t
Here, if e2 doesn't use a, I want to drop the binding:
let f = \x . e2
in f z + f t
The AST for the original program is:
Let
  Id { idName = "f" , idArity = 0 }
  (Let
     Id { idName = "a" , idArity = 0 }
     <e1>
     (Lam Id { idName = "x" , idArity = 0 } <e2>))
  (App
     (App
        (IdE Id { idName = "+" , idArity = 2 })
        (App
           (IdE Id { idName = "f" , idArity = 0 })
           (IdE Id { idName = "z" , idArity = 0 })))
     (App
        (IdE Id { idName = "f" , idArity = 0 })
        (IdE Id { idName = "t" , idArity = 0 })))
Here’s a naive implementation of this pass:
dropUnusedBindings :: Expr -> Expr
dropUnusedBindings = snd . go Set.empty
  where
    go free_vars e0 = case e0 of
      IdE id ->
        (Set.insert (idName id) free_vars, e0)

      IntE{} ->
        (free_vars, e0)

      Lam arg body ->
        bimap (Set.delete (idName arg)) (Lam arg)
              (go free_vars body)

      App e1 e2 ->
        let
          (free1, e1') = go free_vars e1
          (free2, e2') = go free_vars e2
        in
          (Set.union free1 free2, App e1' e2')

      IfE e1 e2 e3 ->
        let
          (free1, e1') = go free_vars e1
          (free2, e2') = go free_vars e2
          (free3, e3') = go free_vars e3
        in
          (Set.unions [free1, free2, free3], IfE e1' e2' e3')

      Let bndr e1 e2 ->
        let
          (free1, e1') = first (Set.delete (idName bndr)) (go free_vars e1)
          (free2, e2') = go free_vars e2
        in
          if Set.member (idName bndr) free2
            then (Set.delete (idName bndr) (Set.union free1 free2),
                  Let (updateIdArity bndr e1') e1' e2')
            else (free2, e2')

updateIdArity :: Id -> Expr -> Id
updateIdArity id rhs = id{ idArity = countLambdas rhs }

countLambdas :: Expr -> Int
countLambdas (Lam _ rhs) = 1 + countLambdas rhs
countLambdas _ = 0
The problem with this pass is that it changes the arity of binders, but doesn't update the idArity of occurrences. Here's what I get if I run this over the original AST:
Let
  Id { idName = "f" , idArity = 1 }
  (Lam Id { idName = "x" , idArity = 0 } <e2>)
  (App
     (App
        (IdE Id { idName = "+" , idArity = 2 })
        (App
           (IdE Id { idName = "f" , idArity = 0 })
           (IdE Id { idName = "z" , idArity = 0 })))
     (App
        (IdE Id { idName = "f" , idArity = 0 })
        (IdE Id { idName = "t" , idArity = 0 })))
Note how f, which was not a lambda binder previously, became a lambda binder with arity 1. The pass correctly updated f's idArity in the binder position, but it did not update it in the occurrences! Indeed, in this representation it's not easy to do this efficiently.
Even if we solved the first problem and had only one heap object for f, the updateIdArity step in this pass allocates a new Id and loses sharing. So we would end up with something like:
let f = Id { idName = "f", idArity = 0 } in
Let
  Id { idName = "f" , idArity = 1 }
  (Lam Id { idName = "x" , idArity = 0 } <e2>)
  (App
     (App
        (IdE Id { idName = "+" , idArity = 2 })
        (App
           (IdE f)
           (IdE Id { idName = "z" , idArity = 0 })))
     (App
        (IdE f)
        (IdE Id { idName = "t" , idArity = 0 })))
The arity of f at the use sites is still wrong, and we lost sharing.
Knot-tying is a way of solving both of these in one step. I find it quite hard to explain in words so I’ll show the code (only the interesting bits):
dropUnusedBindings :: Expr -> Expr
dropUnusedBindings =
    snd . go Map.empty Set.empty
  where
    go :: Map.Map String Id -> Set.Set String -> Expr -> (Set.Set String, Expr)
    go binders free_vars e0 = case e0 of
      IdE id ->
        (Set.insert (idName id) free_vars,
         IdE (fromMaybe id (Map.lookup (idName id) binders)))

      Let bndr@Id{ idName = bndr_name } e1 e2 ->
        let
          bndr' = updateIdArity bndr e1'
          binders' = Map.insert bndr_name bndr' binders
          (free1, e1') = first (Set.delete bndr_name) (go binders' free_vars e1)
          (free2, e2') = go binders' free_vars e2
        in
          if Set.member bndr_name free2
            then (Set.delete bndr_name (Set.union free1 free2),
                  Let bndr' e1' e2')
            else (free2, e2')

      ...
...
The differences from the original version:

We now pass around a "binders" map that maps identifier names to actual Ids. This is used to common-up uses of an identifier with one shared heap object with correct arity info.

In the IdE case we now do a lookup on this map, and replace the Id with the shared Id with correct arity info from the map.
The tricky bit is the Let case, where we have a cyclic group of let bindings. binders' is the binder map with bndr carrying correct arity information. However, to be able to generate that map we first need to process e1, and while processing e1 we want to replace any occurrences of bndr with the correct Id too! This gives us the cyclic bindings:
bndr' = updateIdArity bndr e1'
binders' = Map.insert bndr_name bndr' binders
(..., e1') = ... (go binders' free_vars e1)
This technique relies heavily on lazy evaluation. In the original example the AST is not recursive, but suppose we also want to record RHSs of let binders in Ids, to be used for inlining:
data Id = Id
  { ...
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a let binding, used for inlining
  }
Now once we implement sharing (solving problem 1) ASTs with recursive definitions will become cyclic. A simple example:
let fac = \x . if x then x * fac (x - 1) else 1 in fac 5
This will be represented as something like
pgm = Let fac_id rhs body
  where
    fac_id = Id { idName = "fac", idArity = 0, idUnfolding = Just rhs }

    rhs = Lam x_id (IfE (IdE x_id)
                        (App (App (IdE star_id) (IdE x_id))
                             (App (IdE fac_id)
                                  (App (App (IdE minus_id) (IdE x_id))
                                       (IntE 1))))
                        (IntE 1))

    body = App (IdE fac_id) (IntE 5)

    x_id = Id { idName = "x", idArity = 0, idUnfolding = Nothing }
    star_id = Id { idName = "*", idArity = 2, idUnfolding = Nothing }
    minus_id = Id { idName = "-", idArity = 2, idUnfolding = Nothing }
Here fac_id refers to rhs, which refers to fac_id, forming a cycle.
The knot-tying implementation of dropUnusedBindings works even in cases like this. We just need to update updateIdArity to update the unfolding, when it's available:
updateIdArity :: Id -> Expr -> Id
updateIdArity id rhs =
  id{ idArity = countLambdas rhs
    , idUnfolding = idUnfolding id $> rhs }
This is a bit hard to try, but if I implement a Show instance for Id that doesn't print the unfolding (to avoid looping), make fac_id's arity 0, and call dropUnusedBindings, this is the AST I get:
Let
  (Id "fac" 1)
  (Lam
     (Id "x" 0)
     (IfE
        (IdE (Id "x" 0))
        (App
           (App (IdE (Id "*" 2)) (IdE (Id "x" 0)))
           (App
              (IdE (Id "fac" 1))
              (App (App (IdE (Id "-" 2)) (IdE (Id "x" 0))) (IntE 1))))
        (IntE 1)))
  (App (IdE (Id "fac" 1)) (IntE 5))
All uses of fac have correct arity! Similarly I can do something hacky like this in GHCi to check that the unfolding has the correct arity for uses of fac too:
ghci> let Let lhs _ _ = dropUnusedBindings pgm
ghci> putStrLn (ppShow (idUnfolding lhs))
Just
  (Lam
     (Id "x" 0)
     (IfE
        (IdE (Id "x" 0))
        (App
           (App (IdE (Id "*" 2)) (IdE (Id "x" 0)))
           (App
              (IdE (Id "fac" 1))
              (App (App (IdE (Id "-" 2)) (IdE (Id "x" 0))) (IntE 1))))
        (IntE 1)))
Nice!
The main problem with this technique is that it's very difficult to understand. Even after working on different knot-tying code in GHC and implementing my own knot-tying passes, the recursive let bindings in the Let case above are still mind-boggling to me.
Secondly, it’s really hard to reason about the evaluation order of things in knot-tying code. You might think that this shouldn’t be an issue in a purely functional implementation, but in my experience any non-trivial compiler pass, even when implemented in a purely functional style, still needs debugging. Even if it’s not buggy, you may want to trace the evaluation and print a few things to understand how the code works.
Knot-tying code makes this, which should be absolutely trivial in any reasonable code base, very difficult. If you end up evaluating just the right places with your print statements, you end up looping. For example, here's our AST with a few bang patterns:
data Expr
  = IdE !Id
  | IntE Int
  | Lam Id Expr
  | App !Expr !Expr
  | IfE Expr !Expr Expr
  | Let Id Expr Expr

data Id = Id
  { idName :: String
  , idArity :: !Int
  }
If you run the same program above using this AST definition you'll see that the pass now loops. Note that I've removed the idUnfolding field just to demonstrate that this doesn't happen because we have a loop in the AST.
It’s even more frustrating when what you’re debugging is a loop. You add a few prints, and scratch your head thinking why none of your prints are working even though the algorithm is clearly looping. What’s really happening is that the code is indeed looping, but for a different reason…
Finally, because making things more strict potentially breaks things, knot-tying makes fixing some memory leaks very hard. For example, we may have many passes on our AST, one of them being our knot-tying pass. Some of these passes may be very leaky, and instead of adding strict applications or bang patterns to dozens of places, we may want to add bangs to only a few places in the AST. But that, as demonstrated above, causes our knot-tying pass to loop.
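The conflict between strictness and knot-tying can be shown with a toy example of my own (not from the post): a cyclic value can only be built while the recursive field stays lazy.

```haskell
-- Two nodes that refer to each other. This terminates only because
-- the 'next' field is lazy: each Node is built as a thunk-carrying
-- heap object before the other one is demanded.
data Node = Node { val :: Int, next :: Node }

cycleOf :: Int -> Int -> Node
cycleOf a b = na
  where
    na = Node a nb
    nb = Node b na

-- With  next :: !Node  the same definition would demand the entire
-- (infinite) structure while constructing it, and loop forever.
```

This is exactly what happens to the pass above: adding bangs to the AST forces the knotted bindings in the Let case before they are defined.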
GHC makes use of knot-tying extensively, which has always been one of the pain points for me since my first days contributing to GHC. I vaguely remember making my first contributions; I was a graduate student at Indiana University at the time. I remember finding it refreshing to be able to simply do idType and get the type of an identifier in GHC, as opposed to using a symbol table, which I'd been doing in some of the other compilers I'd worked on in the past.
At the same time, I was constantly confused that my simple print statements added in some front-end pass made the compiler loop. I had no idea what the reason could be. I had no idea that the thing I found so refreshing was also the reason why debugging and tracing were so much harder.
Suffice it to say, I don't like knot-tying. If I had to use knot-tying in my project I'd probably reconsider how I represent my data instead. For example, if we simply used a unique number for our identifiers and maintained a symbol table mapping the unique numbers to actual Ids, then we wouldn't have cycles for recursive functions in the AST and wouldn't need knot-tying. Updating something about an Id would be a simple update in the symbol table.
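Sketched out (all names are mine, and this is a hypothetical design, not code from the post), using Data.IntMap from containers:

```haskell
import qualified Data.IntMap as IntMap

-- Occurrences carry only a unique; facts like arity live in a table.
type Unique = Int

data IdInfo = IdInfo { infoName :: String, infoArity :: Int }

type SymTab = IntMap.IntMap IdInfo

-- Changing an Id's arity is a single table update; every occurrence
-- sees the new value on its next lookup. No cycles, no knot-tying.
setArity :: Unique -> Int -> SymTab -> SymTab
setArity u a = IntMap.adjust (\info -> info { infoArity = a }) u

lookupArity :: SymTab -> Unique -> Maybe Int
lookupArity tab u = infoArity <$> IntMap.lookup u tab
```

The trade-off is that every query now goes through the table, which is the indirection the idType-style representation was avoiding.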
Full code
-- Tried with GHC 8.6.4
{-# OPTIONS_GHC -Wall #-}
module Main where
import Data.Bifunctor
import Data.Functor
import Data.Maybe
import Prelude hiding (id)
-- containers-0.6
import qualified Data.Map as Map
import qualified Data.Set as Set
-- pretty-show-1.10
import Text.Show.Pretty
{-
data Expr
  = IdE !Id
  | IntE Int
  | Lam Id Expr
  | App !Expr !Expr
  | IfE Expr !Expr Expr
  | Let Id Expr Expr
  | Placeholder String
  deriving (Show)

data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: !Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  }
-}
data Expr
  = IdE Id
  | IntE Int
  | Lam Id Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let Id Expr Expr
  | Placeholder String
  deriving (Show)
data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }
instance Show Id where
show (Id name arity _) = "(Id " ++ show name ++ " " ++ show arity ++ ")"
{-
f_id = Id { idName = "f", idArity = 0 }
a_id = Id { idName = "a", idArity = 0 }
x_id = Id { idName = "x", idArity = 0 }
z_id = Id { idName = "z", idArity = 0 }
t_id = Id { idName = "t", idArity = 0 }
plus_id = Id { idName = "+", idArity = 2 }
f_x_plus_f_y = (App (App (IdE plus_id) (App (IdE f_id) (IdE z_id)))
(App (IdE f_id) (IdE t_id)))
ast1 = Let f_id (Let a_id (Placeholder "e1") (Lam x_id (Placeholder "e2"))) f_x_plus_f_y
ast2 = Let a_id (Placeholder "e1")
(Let f_id (Lam x_id (Placeholder "e2"))
f_x_plus_f_y)
-}
updateIdArity :: Id -> Expr -> Id
updateIdArity id rhs =
  id{ idArity = countLambdas rhs,
      idUnfolding = idUnfolding id $> rhs }

countLambdas :: Expr -> Int
countLambdas (Lam _ rhs) = 1 + countLambdas rhs
countLambdas _ = 0
dropUnusedBindings :: Expr -> Expr
dropUnusedBindings =
    snd . go Map.empty Set.empty
  where
    go :: Map.Map String Id -> Set.Set String -> Expr -> (Set.Set String, Expr)
    go binders free_vars e0 = case e0 of
      IdE id ->
        (Set.insert (idName id) free_vars,
         IdE (fromMaybe id (Map.lookup (idName id) binders)))

      IntE{} ->
        (free_vars, e0)

      Lam arg body ->
        bimap (Set.delete (idName arg)) (Lam arg)
              (go binders free_vars body)

      App e1 e2 ->
        let
          (free1, e1') = go binders free_vars e1
          (free2, e2') = go binders free_vars e2
        in
          (Set.union free1 free2, App e1' e2')

      IfE e1 e2 e3 ->
        let
          (free1, e1') = go binders free_vars e1
          (free2, e2') = go binders free_vars e2
          (free3, e3') = go binders free_vars e3
        in
          (Set.unions [free1, free2, free3], IfE e1' e2' e3')

      Let bndr@Id{ idName = bndr_name } e1 e2 ->
        let
          bndr' = updateIdArity bndr e1'
          binders' = Map.insert bndr_name bndr' binders
          (free1, e1') = first (Set.delete bndr_name) (go binders' free_vars e1)
          (free2, e2') = go binders' free_vars e2
        in
          if Set.member bndr_name free2
            then (Set.delete bndr_name (Set.union free1 free2),
                  Let bndr' e1' e2')
            else (free2, e2')

      Placeholder{} ->
        (free_vars, e0)
pgm :: Expr
pgm = Let fac_id rhs body
  where
    fac_id = Id { idName = "fac", idArity = 0, idUnfolding = Just rhs }

    rhs = Lam x_id (IfE (IdE x_id)
                        (App (App (IdE star_id) (IdE x_id))
                             (App (IdE fac_id)
                                  (App (App (IdE minus_id) (IdE x_id)) (IntE 1))))
                        (IntE 1))

    body = App (IdE fac_id) (IntE 5)

    x_id = Id { idName = "x", idArity = 0, idUnfolding = Nothing }
    star_id = Id { idName = "*", idArity = 2, idUnfolding = Nothing }
    minus_id = Id { idName = "-", idArity = 2, idUnfolding = Nothing }
main :: IO ()
main = putStrLn (ppShow (dropUnusedBindings pgm))
Thanks to Oleg Grenrus for reading a draft of this.
This post was originally written on 11 January 2019. Because it is more of an angry rant than a constructive piece, I wasn't sure at the time that publishing it was a good idea. However, reading it again now, I see that it's not directed at a person, a group, or a specific proposal/patch, so I think it shouldn't be offensive to anyone and I should be able to publish it on my personal blog.
(original post starts below)
So I woke up at 5AM today and felt like writing about one of my frustrations. These are my personal opinions, and I don’t represent GHC HQ here.
At this point adding new syntax to GHC/Haskell is a bad idea. Before moving on to examples, here are some facts:
The language that GHC supports is incredibly complex. GHC 8.6.3 man page lists 115 language pragmas.
You just can’t have a good understanding of all of these features and know interactions of the proposed syntax with all combinations of these.
GHC is a complex and old compiler with parts that today no active contributor knows well. The compiler (ignoring all the libraries, the RTS, tools etc.) currently has 189,699 lines of code (ignoring comments and whitespace). That’s a lot of complexity to deal with.
When you propose a new syntax, what you’re actually proposing is:
Because you can’t predict all the interactions of your new syntax (conceptually, or in the implementation) your syntax will cause a ton of problems.
Those problems will sit there unfixed for months/years.
GHC maintainers barely have enough time and manpower to provide stable releases. 8.6.1 and 8.6.2 are completely broken (#15544, #15696, #15892), and 8.6.3 doesn’t work well on Windows.
You might not accept some of these, however in my experience these are facts. If you disagree with any of these let me know and I can elaborate.
I’ll have only two examples for now, because I don’t normally work on front-end parts of the compiler I don’t notice most of the problems.
#7253 proposed a tiny new syntax in GHCi. A few years later a new contributor picked it up and submitted a patch. This trivial new syntax later caused #11606, #12091, #15721. That’s 3 too many tickets for a trivial syntax that buys us so little. It also generated at least one SO question, and invalidated an answer to another SO question by making things more complicated.
The implementation was finally fixed by a frustrated maintainer, but the additional complexity it added (both in the implementation and in the GHCi syntax to be explained to users) won't be fixed.
This was proposed as a GHC proposal. It's a trivial syntax change that in the best case saves 3 characters (including spaces). So far it has generated two tickets: #16137, #16097. Even worse than the previous example, none of these tickets even mention -XBlockArguments; they don't even use it! Yet the error messages got significantly worse because of it.
I think some of the extensions are quite useful. However I also think that at this point new syntax extensions are doing more harm than good. Problems from a maintainer’s point of view are as listed above (arguably maintainers’ problems are also users’ problems because they lead to poor product, but let’s ignore this aspect for now). Now I want to add one more problem, this time from a software developer/engineer’s point of view:
Here’s why. Now that we have two ways of using do
syntax:
-- (1)
atomically $ do
...
-- (2) with -XBlockArguments
atomically do
...
with my team I have to do one of these: (1) settle on one of the two styles, or (2) allow both.
(1) means wasting the team's time and energy on endless bikeshedding. (2) means being inconsistent in the source code. Either way we lose.
You might argue that with good tooling (1) is not a problem, and I’d agree. However as we add new syntax the tooling story will only get worse. GHC Haskell syntax is already so complex we don’t even have a good formatter. We should first stop making it even more complex if we want the tooling story to get better.
In my opinion what we need is principles to guide the language and the compiler. Currently we don’t have this (last paragraph), and the result is 100+ pragmas, a buggy compiler, and frustrated users and maintainers.
If you’re proposing a new syntax; don’t! If you know someone who will, point them to this blog post.