osa1.net - All posts

Subtyping and subsumption

2024-10-21T00:00:00Z

Subtyping is a relation between two types. It often comes with a typing rule called “subsumption”, which says that if type B is a subtype of type A (usually shown as B <: A), then a value of type B can be assumed to have type A.

The crucial part is that subsumption is implicit: the programmer doesn’t explicitly cast the value from type B to type A.

When we make an operation implicit in a language, we need to make sure that it is (1) safe and (2) performant. Users will be doing it without realizing, and we don’t want to accidentally break things or make them slow.

Let’s consider how we can make subsumption safe and performant.

Safety of subsumption

Different languages give different safety guarantees. High-level languages often guarantee:

(1) Memory safety: a memory read or write shouldn’t cause undefined behavior.

Examples: out-of-bounds array accesses should be caught, and dangling pointers either shouldn’t be allowed or dereferencing them should be caught at runtime.

(2) Type safety: static guarantees of the language’s type system should be upheld.

Example: if I have a function f : A -> B and a value x : A after subsumption, f(x) shouldn’t fail at compile time or at runtime.

A language may guarantee other kinds of safety as well, and some of them may be checked at runtime instead of at compile time.

Whatever safety properties the language guarantees, subsumption must preserve them.

From a programmer’s perspective, however, these are not enough to make sure that the program will work as before when subsumption is used. If I can pass a value of type B where A is expected, I need to make sure that B, when used as A, acts like A.

This is called “behavioral subtyping” (or “substitutability”), and it depends not on the types of A’s operations but on the observable behavior of A and its subtypes.

I don’t have a good real-world example of this, but you can imagine two types with the same public API that work differently. Since the public APIs are the same, one can be made a subtype of the other, and both memory safety and type safety would still be satisfied, but doing that would cause bugs when one is accidentally passed as the other.
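As a contrived sketch of the problem (not a real-world case), here are a stack and a queue that expose the same two operations. The Container type, its push and pop fields, and the pushThenPop helper are all names made up for illustration. Haskell has no subtyping, so a record of functions stands in for the shared public API here.

```haskell
-- Hypothetical shared "public API": both implementations have exactly
-- these operations, with exactly these types.
data Container = Container
    { push :: Int -> Container
    , pop  :: Maybe (Int, Container)
    }

-- Last in, first out.
stack :: [Int] -> Container
stack xs = Container
    { push = \x -> stack (x : xs)
    , pop = case xs of
        [] -> Nothing
        y : ys -> Just (y, stack ys)
    }

-- First in, first out. Same types, different observable behavior.
queue :: [Int] -> Container
queue xs = Container
    { push = \x -> queue (xs ++ [x])
    , pop = case xs of
        [] -> Nothing
        y : ys -> Just (y, queue ys)
    }

-- Code written with the stack in mind: it expects to get back
-- the element it just pushed.
pushThenPop :: Container -> Int -> Maybe Int
pushThenPop container x = fst <$> pop (push container x)

main :: IO ()
main = do
    print (pushThenPop (stack [1, 2]) 3) -- prints Just 3
    print (pushThenPop (queue [1, 2]) 3) -- prints Just 1
```

Both calls type check and neither can crash, but only the first one does what pushThenPop was written to do.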

Performance of subsumption

The definition of “fast” or “performant” also depends on the language. A C++ programmer’s fast and a Python programmer’s fast are often not the same.

In general, however, heap allocation should be avoided.

Object-oriented languages (as defined in my previous post) without multiple inheritance can often implement subsumption of reference values as a no-op, i.e. values of type B work as A at runtime without any changes or copying.

Multiple inheritance makes things more complicated, but a reference to an object can still be converted to a reference to one of its supertypes by just adjusting the pointer value.

With unboxed/value types, conceptually, the value needs to be copied as its supertype, but that operation is often a no-op. Consider an unboxed record (x: Int, y: Int, z: Int) that we store in a variable a. At runtime, a actually occupies multiple stack locations or registers. When we copy it as let b: (x: Int, y: Int) = a, we don’t have to allocate new stack locations for b.x and b.y; we just map them to the same locations as a.x and a.y. When we pass b to a function, we pass a.x and a.y.

Copying becomes both necessary and prohibitively expensive when you have something like ReadOnlyList<(x: Int, y: Int, z: Int)> and want to upcast it to ReadOnlyList<(x: Int, y: Int)> (the records are unboxed). From the safety perspective this operation is fine, but you have to allocate a new list and copy all the values.

I think this is rarely a problem in practice though, because most generic types, like List, end up being invariant in T anyway, because their API often uses T in both covariant and contravariant positions. So List<(x: Int, y: Int)> is not a supertype of List<(x: Int, y: Int, z: Int)> and subsumption does not apply.

No conclusions this time

In this short post I just wanted to give some definitions that I’m hoping to refer to in future posts.

OOP is not that bad, actually

2024-10-09T00:00:00Z

OOP is certainly not my favorite paradigm, but I think mainstream statically-typed OOP does a few things right that are very important for programming with many people, over long periods of time.

In this post I want to explain what I think is the most important one of these things that the mainstream statically-typed OOP languages do well.

I will then compare the OOP code with Haskell, to try to make the point that OOP is not as bad in everything as some functional programmers seem to think.

What even is OOP?

In this post I use the word “OOP” to mean programming in a statically-typed language with:

Classes, which combine state and methods that can modify the state.
Inheritance, which allows classes to reuse state and methods of other classes.
Subtyping, where if a type B implements the public interface of type A, values of type B can be passed as A.
Virtual calls, where the method that runs for a call is determined not by the static type of the receiver but by its runtime type.

Examples of OO languages according to this definition: C++, Java, C#, Dart.

An example of what this allows

This set of features allows a simple and convenient way of developing composable libraries, and extending the libraries with new functionality in a backwards compatible way.

It’s probably best explained with an example. Suppose we have a simple logger library:

class Logger {
  // Private constructor: initializes state, returns an instance of `Logger`.
  Logger._();

  // Public factory: can return `Logger` or any of the subtypes.
  factory Logger() => Logger._();

  void log(String message, Severity severity) { /* ... */ }
}

enum Severity {
  Info,
  Error,
  Fatal,
}

and another library that does some database stuff:

class DatabaseHandle {
  /* ... */
}

and an application that uses both:

class MyApp {
  final Logger _logger;
  final DatabaseHandle _dbHandle;

  MyApp()
      : _logger = Logger(),
        _dbHandle = DatabaseHandle(...);
}

As is usually the case, things that make network connections, change shared state etc. need to be mocked, faked, or stubbed to be able to test applications. We may also want to extend the libraries with new functionality. With the features that we have, we don’t have to see this coming and prepare the types based on this.

In the first iteration we might just add a concrete class that is just the copy of the current class, and make the current class abstract:

// The class is now abstract.
abstract class Logger {
  // Public factory now returns an instance of a concrete subtype.
  factory Logger() => _SimpleLogger();

  Logger._();

  // `log` is now abstract.
  void log(String message, Severity severity);
}

class _SimpleLogger extends Logger {
  factory _SimpleLogger() => _SimpleLogger._();

  _SimpleLogger._() : super._() {/* ... */}

  @override
  void log(String message, Severity severity) {/* ... */}
}

This change is backwards compatible and requires no changes in user code.

Now we might add more implementations, e.g. for ignoring log messages:

abstract class Logger {
  factory Logger() => _SimpleLogger();

  // New.
  factory Logger.ignoring() => _IgnoringLogger();

  Logger._();

  void log(String message, Severity severity);
}

class _IgnoringLogger extends Logger {
  factory _IgnoringLogger() => _IgnoringLogger._();

  _IgnoringLogger._() : super._() {}

  @override
  void log(String message, Severity severity) {}
}

Similarly we can add a logger that logs to a file, to a DB, etc.

We can do the same for the database handle class, but for mocking, faking, or stubbing, in tests.

To be able to use these new subtypes in our app, we implement a factory, or add a constructor to allow passing a logger and a db handle:

class MyApp {
  final Logger _logger;
  final DatabaseHandle _dbHandle;

  MyApp()
      : _logger = Logger(),
        _dbHandle = DatabaseHandle();

  MyApp.withLoggerAndDb(this._logger, this._dbHandle);
}

Note that we did not have to change any types, or add type parameters. Any methods of MyApp that use the _logger and _dbHandle fields do not have to know about the changes.

Now suppose one of the DatabaseHandle implementations also starts using the logger library:

abstract class DatabaseHandle {
  factory DatabaseHandle.withLogger(Logger logger) =>
      _LoggingDatabaseHandle._(logger);

  factory DatabaseHandle() => _LoggingDatabaseHandle._(Logger.ignoring());

  DatabaseHandle._();

  /* ... */
}

class _LoggingDatabaseHandle extends DatabaseHandle {
  final Logger _logger;

  _LoggingDatabaseHandle._(this._logger) : super._();

  /* ... */
}

In our app, we might test by disabling logging in the db library, but start logging db operations in production:

class MyApp {
  // New
  MyApp.testingSetup()
      : _logger = Logger(),
        _dbHandle = DatabaseHandle.withLogger(Logger.ignoring());

  // Updated to start using the logging feature of the DB library.
  MyApp()
      : _logger = Logger(),
        _dbHandle = DatabaseHandle.withLogger(Logger.toFile(...));

  /* ... */
}

As an example that adds more state to the types, we can add a logger implementation that only logs messages above certain severity:

class _LogAboveSeverity extends _SimpleLogger {
  // Only logs messages with this severity or more severe.
  final Severity _severity;

  _LogAboveSeverity(this._severity) : super._();

  @override
  void log(String message, Severity severity) { /* ... */ }
}

We can add another factory to the Logger abstract class that returns this type, or we can even implement this in another library:

// Implemented in another library, not in `Logger`'s library.
class LogAboveSeverity implements Logger {
  // Only logs messages with this severity or more severe.
  final Severity _severity;

  final Logger _logger;

  LogAboveSeverity(this._severity) : _logger = Logger();

  LogAboveSeverity.withLogger(this._severity, this._logger);

  @override
  void log(String message, Severity severity) { /* ... */ }
}

As a final example to demonstrate adding more operations (rather than more state), we can have a logger that logs to a file, with a flush operation:

class FileLogger implements Logger {
  final File _file;

  FileLogger(this._file);

  @override
  void log(String message, Severity severity) {/* ... */}

  void flush() {/* ... */}
}

In summary:

We started with a simple logging and database library and wrote an app.
We added more capabilities to the logging and database libraries, for testing and also for production use. In particular, we added:
- New functionality to the logger library: disabling logging, and logging to a file.
- A new dependency to the database library, for logging database operations. We also allowed users to override the default logger used.

Crucially, we didn’t have to change any types while doing these changes, and the new code is still as type safe as before.

The logger and database libraries evolved in a completely backwards compatible way.

Since none of the types used in our application changed, MyApp methods didn’t have to change at all.

When we decided to take advantage of the new functionality, we updated only how we construct the logger and db handle instances in our app. The rest of the app didn’t change.

Now let’s consider how something like this could be done in Haskell.

Attempting it in Haskell

Immediately at the start, we have a few choices on how to represent it.

Option 1: An ADT, with callback fields to be able to add different types of loggers later:

data Logger = MkLogger
    { _log :: String -> Severity -> IO ()
    }

simpleLogger :: IO Logger

data Severity = Info | Error | Fatal
    deriving (Eq, Ord)

log :: Logger -> String -> Severity -> IO ()

In this representation, extra state like the minimum severity level in our _LogAboveSeverity is not added to the type, but captured by the closures:

logAboveSeverity :: Severity -> IO Logger
logAboveSeverity minSeverity = pure MkLogger
    { _log = \message severity -> if severity >= minSeverity then ... else pure ()
    }

If we need to update some of the state shared by the closures, the state needs to be stored in some kind of reference type like IORef.
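A minimal sketch of that, assuming a hypothetical bufferingLogger that keeps its messages in memory. The name, and the extra read action it returns, are made up for this example and not part of any library:

```haskell
import Data.IORef (modifyIORef', newIORef, readIORef)

data Severity = Info | Error | Fatal
    deriving (Eq, Ord, Show)

data Logger = MkLogger
    { _log :: String -> Severity -> IO ()
    }

-- Hypothetical logger that collects messages in memory. The buffer is not
-- a field of `Logger`; it is mutable state captured by the closures.
bufferingLogger :: IO (Logger, IO [String])
bufferingLogger = do
    buffer <- newIORef []
    let logger = MkLogger
            { _log = \message severity ->
                modifyIORef' buffer ((show severity ++ ": " ++ message) :)
            }
    -- Messages are prepended, so reverse them when reading.
    pure (logger, reverse <$> readIORef buffer)

main :: IO ()
main = do
    (logger, readMessages) <- bufferingLogger
    _log logger "connected" Info
    _log logger "query failed" Error
    readMessages >>= print
    -- prints ["Info: connected","Error: query failed"]
```

Note that the read action has to be returned separately, because the Logger type has no place for it. This is the same limitation that shows up with FileLogger's flush operation.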

Similar to the OOP code, the FileLogger needs to be a separate type:

data FileLogger = MkFileLogger
  { _logger :: Logger   -- callbacks capture the file descriptor/buffer and write to it
  , _flush  :: IO ()    -- similarly captures the file descriptor/buffer, flushes it
  }

logFileLogger :: FileLogger -> String -> Severity -> IO ()
logFileLogger = log . _logger

However, unlike our OOP example, existing code that uses the Logger type and log function cannot work with this new type. There needs to be some refactoring, and how the user code will need to be refactored depends on how we want to expose this new type to the users.

Option 2: A typeclass that we can implement for our concrete logger types:

class Logger a where
    log :: a -> String -> Severity -> IO ()

data SimpleLogger = MkSimpleLogger { ... }

simpleLogger :: IO SimpleLogger
simpleLogger = ...

instance Logger SimpleLogger where
  log = ...

To allow backwards-compatible changes in the logger library, we need to hide the concrete logger type:

module Logger
    ( Logger
    , simpleLogger -- I can export this without exporting its return type
    ) where

...

With this module, we have to either add a type parameter to the functions and other types that use Logger, or use existentials.

Adding a type parameter is not a backwards compatible change, and in general it can cause a snowball effect: the type parameter propagates to the direct users, then to their users, and so on, creating a massive change and difficult-to-use types.

The problem with existentials is that they are limited in how you can use them, and are somewhat strange in some areas. In our application we can do this:

data MyApp = forall a . Logger a => MkMyApp
  { _logger :: a
  }

But we can’t have a local variable with this existential type:

createMyApp :: IO MyApp
createMyApp = do
  -- I can't add a type annotation to myLogger without the concrete type
  myLogger <- simpleLogger      -- simpleLogger :: IO SimpleLogger
  return MkMyApp { _logger = myLogger }

I also cannot have an existential type in a function argument:

-- The type signature is accepted by the compiler, but the value cannot be used.
doStuffWithLogging :: (forall a . Logger a => a) -> IO ()
doStuffWithLogging logger = log logger "test" Info -- some obscure type error

Instead we have to “pack” the logger value with its typeclass dictionary in a new type:

data LoggerBox = forall a . Logger a => LoggerBox a

doStuffWithLogging :: LoggerBox -> IO ()
doStuffWithLogging (LoggerBox logger) = log logger "test" Info
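
For reference, here is a complete version of the boxing approach that compiles on its own. StdoutLogger and IgnoringLogger are placeholder implementations invented for this sketch. The Logger instance for LoggerBox is an addition of mine as well: it lets boxed loggers be passed to code that is generic over Logger.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Prelude hiding (log)

data Severity = Info | Error | Fatal
    deriving (Eq, Ord, Show)

class Logger a where
    log :: a -> String -> Severity -> IO ()

-- Two placeholder implementations, just for this sketch.
data StdoutLogger = StdoutLogger
data IgnoringLogger = IgnoringLogger

instance Logger StdoutLogger where
    log _ message severity = putStrLn (show severity ++ ": " ++ message)

instance Logger IgnoringLogger where
    log _ _ _ = pure ()

-- Packs a logger value together with its typeclass dictionary.
data LoggerBox = forall a . Logger a => LoggerBox a

-- Unpacking in one place, so that a box can be used like any other logger.
instance Logger LoggerBox where
    log (LoggerBox logger) message severity = log logger message severity

main :: IO ()
main = mapM_ (\logger -> log logger "test" Info) loggers
  where
    -- Different concrete types in one list, thanks to the box.
    loggers = [LoggerBox StdoutLogger, LoggerBox IgnoringLogger]
-- prints "Info: test" once; the second logger ignores the message
```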

Other problems and limitations of this approach:

The syntax is just awful to the point where it’s a deterrent: forall a . Logger a => ... a ... instead of just Logger.
It allows implementing FileLogger, but
- All subtypes need to be a new typeclass + an implementation (in OOP: just one class).
- This cannot be used for safe downcasting of a Logger value to FileLogger, without knowing the concrete type of the FileLogger.

Effect monad approach

The effect monad approach is a variation of option (2) without existentials. Instead of

class Logger a where
    log :: a -> String -> Severity -> IO ()

We add the ability to log in a monad type parameter:

class MonadLogger m where
    log :: String -> Severity -> m ()

Then provide a “monad transformer” for each of the logger implementations:

newtype SimpleLoggerT m a = SimpleLoggerT { runSimpleLoggerT :: m a }

instance MonadIO m => MonadLogger (SimpleLoggerT m) where
  log msg sev = SimpleLoggerT { runSimpleLoggerT = liftIO (logStdout msg sev) }

newtype FileLoggerT m a = FileLoggerT { runFileLoggerT :: Handle -> m a }

instance MonadIO m => MonadLogger (FileLoggerT m) where
  log msg sev = FileLoggerT { runFileLoggerT = \handle -> liftIO (logFile handle msg sev) }

The database library does the same, and the app combines these together:

newtype MyAppMonad a = ...

instance MonadLogger MyAppMonad where ...

instance MonadDb MyAppMonad where ...

Because we have one type parameter that encapsulates all side effects (instead of one for logging, one for database operations), this avoids the issues with snowballed type parameters in the use sites.
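Here is a small self-contained sketch of how application code looks with this approach. It is a simplified variant of the snippets above, not the exact same code: it adds a Monad superclass to MonadLogger, derives the monad instances with GeneralizedNewtypeDeriving, and uses a made-up IgnoringLoggerT and doStuff for illustration.

```haskell
{-# LANGUAGE GeneralizedNewtypeDeriving #-}

import Control.Monad.IO.Class (MonadIO, liftIO)
import Prelude hiding (log)

data Severity = Info | Error | Fatal
    deriving (Eq, Ord, Show)

class Monad m => MonadLogger m where
    log :: String -> Severity -> m ()

-- Logs to stdout.
newtype SimpleLoggerT m a = SimpleLoggerT { runSimpleLoggerT :: m a }
    deriving (Functor, Applicative, Monad, MonadIO)

instance MonadIO m => MonadLogger (SimpleLoggerT m) where
    log message severity = liftIO (putStrLn (show severity ++ ": " ++ message))

-- Hypothetical second implementation: drops all messages.
newtype IgnoringLoggerT m a = IgnoringLoggerT { runIgnoringLoggerT :: m a }
    deriving (Functor, Applicative, Monad)

instance Monad m => MonadLogger (IgnoringLoggerT m) where
    log _ _ = pure ()

-- Application code is written once, against the class. Its type mentions
-- only the monad, no matter how many loggers exist.
doStuff :: MonadLogger m => m Int
doStuff = do
    log "starting" Info
    pure 42

main :: IO ()
main = do
    a <- runSimpleLoggerT doStuff   -- prints "Info: starting"
    b <- runIgnoringLoggerT doStuff -- prints nothing
    print (a + b)                   -- prints 84
```

The choice of logger is made only where the computation is run, which is roughly what picking a Logger factory does in the OOP version.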

The database library can also add a logger dependency without breaking the user code.

I think this is the best we can get in Haskell, and it’s quite similar to our OOP solution in terms of code changes needed to be done in the user code.

However, for this to work the entire ecosystem of libraries needs to do things this way. If the database library decides to use the ADT approach, we will need an “adapter”, e.g. a monad typeclass for the DB operations, with a concrete monad transformer type to call the DB library functions.

This is also the main problem with the composable effects libraries.

(There are also issues with how this kind of code performs in runtime, but that’s probably a topic for another blog post.)

Composable effects

Haskellers have been developing various ways of modelling side effects (such as DB operations, logging) as “effects” and various ways of composing them.

A simple and widespread way of doing this is via the effect monads, as we’ve seen in the previous section.

However these systems have a few drawbacks, compared to our OOP solution:

Different effect libraries generally don’t work together. For example, mtl and eff functions won’t work together without some kind of adapter turning one into the other.
Even if the entire Haskell ecosystem decides to use one particular effect system, things like using two different handlers for different parts of the program, such as using a different logger in the db library and in the main app, require type juggling. In some effect libraries this is not even possible.
Finally, note that the OOP code shown in this post is very basic and straightforward code that even a beginner in OOP can write. Any new person who joins the project, or any one-time contributor who just wants to fix a bug and move on, will be able to work on either one of the libraries or the application code. It’s difficult to say the same about the composable effects libraries in Haskell.

Conclusions

Mainstream statically-typed OOP allows straightforward backwards compatible evolution of types, while keeping them easy to compose. I consider this to be one of the killer features of mainstream statically-typed OOP, and I believe it is an essential feature for programming with many people, over long periods of time.

Just like OOP, Haskell has design patterns, such as the effect monad pattern we’ve shown above. Some of these design patterns solve the problem nicely, but they need an entire ecosystem to follow the same pattern to be useful.

I think it would be beneficial for the functional programming community to stop dismissing OOP’s successes in the industry as an accident of history and try to understand what OOP does well.

Thanks to Chris Penner and Matthías Páll Gissurarson for reviewing a draft of this blog post.

My thoughts on OCaml

2023-04-24T00:00:00Z

Since 2013 I’ve had the chance to use OCaml a few times in different jobs, and I got frustrated and disappointed every time I had to use it. I just don’t enjoy writing OCaml.

In this post I want to summarize some of the reasons why I don’t like OCaml and why I wouldn’t choose it for a new project today.

No standard and easy way of implementing interfaces

To me it’s absolutely essential that the language should have some way of defining interfaces, implementing those interfaces for the types, and programming against those interfaces.

In Haskell, this is done with typeclasses. Rust has a similar mechanism called traits. In languages with classes this is often done with abstract classes and “implementing” those classes in new classes (e.g. implements in Dart).

In OCaml there’s no way to do this. I have to explicitly pass functions along with my values, maybe in a product type, or with a functor, or as an argument.

Regardless of how I work around this limitation, it’s extremely inconvenient. Things that must be trivial in any code base, such as converting a value to a string for debugging purposes, become a chore, and sometimes even impossible.

As far as I know, there was at least one attempt at ameliorating this with modular implicits (implicit parameter passing), but I don’t know what happened to it since 2017. It looks like it’s still not a part of the language and the standard library is not using it.

Bad standard library

OCaml’s standard library is just bizarre. It has lots of small issues, and a few larger ones. It’s really just extremely painful to use.

Some examples of the issues:

Zoo of printing/debugging and conversion functions such as string_of_int, string_of_float, print_char, Int64.of_int, …
Overly polymorphic operators with type 'a -> 'a -> bool such as = (called “structural equality”, throws an exception if you pass a function) and >. Code that uses these operators will probably not work on user-defined types as expected.
Standard types are sometimes persistent, sometimes mutable. List, Map, and Set are persistent. Stack and Hashtbl are mutable.
Inconsistent naming:
- Length function for Map is cardinal, length function for Hashtbl is length.
- The “bytes” type is Bytes.t, the big int type is Big_int.big_int (instead of Big_int.t). The functions in these modules are also inconsistently named. Big_int functions are suffixed with _big_int, Bytes module functions are not prefixed or suffixed.
The regex module uses global state: string_match runs a regex and sets some global state. matched_string returns the last matched string using the global state.
Lack of widely used operations such as popcount for integer types, unicode character operations.
It doesn’t have proper string and character types: String is a byte array, char is a byte.

The bad state of OCaml’s standard library also causes fragmentation in the ecosystem with two competing alternatives: Core and Batteries.

Syntax problems

OCaml doesn’t have a single-line comment syntax.

The expression syntax has just too many issues. It’s inconsistent in how it uses delimiters. for and while end with done, but let, if, match, and try have no closing delimiter, even though the right-most non-terminal is the same in all of these productions:

expr ::= ...
      | while <expr> do <expr> done
      | for <value-name> = <expr> ( to | downto ) <expr> do <expr> done
      | let <let-binding> in <expr>
      | if <expr> then <expr> [ else <expr> ]
      | match <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
      | try <expr> with (| <pattern> [ when <expr> ] -> <expr>)+
      ...

It has for and while, but no break and continue. So you use exceptions with a try inside the loop for continue, and outside for break.

It also has lots of ambiguities, and some of these ambiguities are resolved in an unintuitive way. In addition to making OCaml difficult to parse correctly, this can actually cause incorrect reading of the code.

Most common example is probably nesting match and try expressions:

match e0 with
| p1 -> try e1 with p2 -> e2
| p3 -> e3

Here p3 -> e3 is a part of the try expression.

Another example is the sequencing syntax ; and productions with <expr> as the right-most symbol:

let test1 b =
  if b then
    print_string "1"
  else
    print_string "2"; print_string "3"

Here print_string "3" is not a part of the if expression, so this function always prints “3”.

However, even though match also has <expr> as the right-most symbol, it has different precedence in comparison to semicolon:

let test2 b =
  match b with
  | true -> print_string "1"
  | false -> print_string "2"; print_string "3"

Here print_string "3" is a part of the false -> ... branch.

Try to guess how these functions are parsed:

(* Is the last print part of `else` or not? *)
let test3 b =
  if b then
    print_string "1"
  else
    let x = "2" in
    print_string x;
    print_string "3"

(* Is this well-typed? *)
let test4 b =
  if b then
    1, 2
  else
    3, 4

(* Is the type of this `(int * int) array -> unit` or `int array -> unit * int`? *)
let test5 a = a.(0) <- 1, 2

(* What if I replace `,` with `;`? Does this set the element 1 or 2? *)
let test6 a = a.(0) <- 1; 2

When writing OCaml you have to keep these rules in mind.

It also has the “dangling else” problem:

(* Is `else` part of the inner `if` or the outer? *)
if e1 then if e2 then e3 else e4

Finally, and I think this is probably the most strange thing about OCaml’s syntax and I’m not even sure what’s exactly happening here (I can’t find anything relevant in the language documentation), comments in OCaml are somehow tokenized and those tokens need to be terminated. They can be terminated inside another comment, or even outside. This is a bit difficult to explain but here’s a simple example:

(* " *)
print_string "hi"

OCaml 5.0.0 rejects this program with this error:

File "./test.ml", line 2, characters 16-17:
2 | print_string "hi"
                    ^
  String literal begins here

From the error message it seems like the " in the comment line actually starts a string literal, which is terminated in the first quote of "hi". The closing double quote of "hi" thus starts another string literal, which is not terminated.

However that doesn’t explain why this works:

(* " *)
print_string "hi"
(* " *)
print_string "bye"

If my explanation of the previous version were correct this would fail with an unbound hi variable, but it works and prints “bye”!

Rest of the package is also not that good

I’m not following developments in OCaml ecosystem too closely, but just two years ago it was common to use Makefiles to build OCaml projects. The language server barely worked on a project with less than 50 kloc. There was no standard way of doing compile-time metaprogramming and some projects even used the C preprocessor (cpp).

Some of these things probably improved in the meantime, but the overall package is still not good enough compared to the alternatives.

But at least it’s a functional language?

Almost all modern statically typed languages have closures, higher-order functions/methods, lazy streams, and combinators that run efficiently. Persistent/immutable data structures can be implemented even in C.

Also, OCaml has no tracking of side-effects (like in Haskell), and the language and the standard library have lots of features and functions with mutation, such as the array update syntax, mutable record fields, Hashtbl, and the regex module.

The only thing that makes OCaml more “functional” than e.g. Dart, Java, or Rust is that it supports tail calls. While having tail calls is important for functional programming, I would happily give up on tail calls if that means not having the problems listed above.

Also keep in mind that when you mix imperative and functional styles tail calls become less important. For example, I don’t have to implement a stream map function in Dart with a tail call to map the rest of the stream, I can just use a while or for loop.

When should I use it?

In my opinion there is no reason to use OCaml in a new project in 2023. If you have a reason to think that OCaml is the best choice for a new project please let me know your use case, I’m genuinely curious.

Fast polymorphic record access

2023-01-23T00:00:00Z

I like anonymous records and row polymorphism, but until recently I didn’t know how to generate efficient code for polymorphic record access. In this blog post I will summarize the different compilations of polymorphic record accesses that I’m aware of.

All of the ideas shown in this post can be used to access a record field when the record’s concrete type is not known, but the type system guarantees that it has the accessed field. This includes row polymorphism and record subtyping.

Most of the ideas also work when the record’s type is completely unknown and it may not have the accessed field, but some of the optimizations assume accesses cannot fail. Those optimizations can only be used on statically-typed but polymorphic records.

In some of the examples below I will use row polymorphism.

Row polymorphism and record subtyping, briefly

In this blog post we are interested in a specific application of row polymorphism to records. In short, row polymorphism allows type variables denoting sets of record fields, with their types. For example:

f : ∀ r . { x : Int, y : Int | r } -> Int
f a = a.x + a.y

Here the type variable r ranges over sets of rows (or records). This function accepts any record as an argument as long as the record has at least x : Int and y : Int fields.

The main difference between row polymorphism and record subtyping is that the type variable r can be used in the right-hand side of an arrow as well, allowing passing the record around without losing its concrete type. For example:

mapAB : ∀ r . { a : Int, b : Int | r } -> (Int -> Int) -> { a : Int, b : Int | r }
mapAB r f = { a = f r.a, b = f r.b, .. r }

This function takes any record that has a : Int and b : Int fields, and returns a new record with updated a and b fields and the rest of the fields. If I pass it a record with type { a : Int, b : Int, name : String } I get the same type back.

With subtyping, type of this function would look like:

mapAB : { a : Int, b : Int } -> (Int -> Int) -> { a : Int, b : Int }

In this version the return type just has a and b fields. Rest of the fields are lost. If I pass this a { a : Int, b : Int, name : String } I get { a : Int, b : Int } back. The name field is lost.

Without subtyping, when the record type in a field access expression is known, it’s easy to generate efficient code: we use the same offsets used when compiling a record literal with the type.

With subtyping, and with row polymorphism when the record type is not a concrete record type but a record type with a row variable, the type of r in r.a does not immediately tell us where in the record’s payload the field a is.

Let’s look at how we might go about implementing record field access in these cases.

(0) Records as maps

I don’t think this idea is used in statically-typed languages, but I wanted to include it for completeness.

We can implement records as maps with string keys. Field access then becomes a map lookup.

This is easy to implement because our language probably already has a map implementation in the standard library.

The disadvantages are:

Depending on the map implementation, every field access requires an O(N) or O(log(N)) map lookup.
Map entries will be stored in a separate memory location (instead of in the record object’s payload), which will require pointer chasing to read the field value.
Unnecessary memory overhead caused by map fields that are not really necessary for records: such as the capacity and size fields.

With whole-program compilation, we can improve the constant factors a bit by mapping labels (field names) in the program to unique integers. This way lookups don’t require string hashing or comparison, but this is still slow and memory-inefficient compared to other techniques we will discuss below.
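
A minimal sketch of the basic version with string keys, assuming Data.Map from the containers package as the map implementation. The Value type and the getField and sumXY helpers are made up for this example. Field values need one uniform representation because all fields live in the same map.

```haskell
import qualified Data.Map.Strict as Map

-- Uniform representation for field values.
data Value = IntValue Int | StringValue String
    deriving (Eq, Show)

-- A record is just a map from labels to values.
type Record = Map.Map String Value

-- Field access is a map lookup: O(log(N)) string comparisons here.
getField :: String -> Record -> Value
getField label record =
    case Map.lookup label record of
        Just value -> value
        Nothing -> error ("record has no field " ++ label)

-- Works on any record that has integer fields x and y.
sumXY :: Record -> Int
sumXY record =
    case (getField "x" record, getField "y" record) of
        (IntValue x, IntValue y) -> x + y
        _ -> error "x and y must be integers"

main :: IO ()
main = do
    let point = Map.fromList
            [ ("x", IntValue 1), ("y", IntValue 2), ("name", StringValue "p") ]
    print (sumXY point)
    -- prints 3
```

In a statically-typed language the two error cases cannot happen for well-typed programs, but the lookup cost is paid on every access regardless.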

(1) Passing accessors as parameters

If you’re familiar with Haskell, this is the Haskell way of implementing row polymorphic records.

The idea is that when we pass a record to a row-polymorphic function, we also pass, implicitly, and as functions, the accessors that the function needs.

In Haskell, type of mapAB we’ve seen above would look like this:

mapAB : ∀ r . (HasField r 'A Int, HasField r 'B Int) => Record r -> (Int -> Int) -> Record r

The runtime values for HasField ... constraints are the accessors. When calling this function we don’t explicitly pass these accessors, the compiler generates them. In a well-typed program, we either have these values in the call site, or we know how to generate them (e.g. the record type is concrete in the call site), so it’s possible for the compiler to generate and pass these arguments.
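
To make the implicit arguments visible, here is a sketch of roughly what the generated code could look like if the accessors were passed explicitly. FieldAccess, getField, setField, and the Named record are names invented for this example; this is not how any particular compiler or library represents HasField.

```haskell
-- Hypothetical runtime value for a `HasField r label a` constraint:
-- a getter and a setter for one field of the record type `r`.
data FieldAccess r a = FieldAccess
    { getField :: r -> a
    , setField :: a -> r -> r
    }

-- `mapAB` with the two accessors as ordinary arguments. Every field
-- access and update goes through a function call.
mapAB :: FieldAccess r Int -> FieldAccess r Int -> r -> (Int -> Int) -> r
mapAB accessA accessB record f =
    setField accessB (f (getField accessB record))
        (setField accessA (f (getField accessA record)) record)

-- A concrete record type at the call site.
data Named = Named { a :: Int, b :: Int, name :: String }
    deriving (Eq, Show)

-- At a call site with a concrete type the accessors are easy to generate.
namedA, namedB :: FieldAccess Named Int
namedA = FieldAccess { getField = a, setField = \value record -> record { a = value } }
namedB = FieldAccess { getField = b, setField = \value record -> record { b = value } }

main :: IO ()
main = print (mapAB namedA namedB (Named 1 2 "p") (* 10))
-- prints Named {a = 10, b = 20, name = "p"}
```

The name field survives the call even though mapAB knows nothing about it, which is the row-polymorphic behavior described earlier.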

The main advantage of this approach is that it doesn’t require any language support specifically for records.

The main disadvantages are:

Every field access is a function call.
Parameter passing per field per record does not scale well and causes messy and slow generated code. For example, suppose we want to take two records with fields x : Int and y : Int:
```
f : ∀ r . (HasField r 'X Int, HasField r 'Y Int) => Record r -> Record r -> ...
```
This function takes two implicit arguments, but it has a limitation: both record arguments must have the same record type. I can’t call this function with two different records:
```
f { x = 123, y = 456, a = "hi" } { x = 0, y = -1, b = false }
```
For this to work I need two row variables:
```
f : ∀ r1 r2 .
    (HasField r1 'X Int, HasField r1 'Y Int,
     HasField r2 'X Int, HasField r2 'Y Int) =>
    Record r1 -> Record r2 -> ...
```
This version works, but it also takes 4 implicit arguments.
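
To make the cost concrete, here is a small Rust sketch where the implicit accessor arguments are written out by hand. Getter, WithA, WithB, and sum_fields are hypothetical names; in Haskell the compiler would generate and pass the accessors itself.

```rust
/// The runtime evidence for a constraint like `HasField r 'X Int`:
/// a function that reads the field from a record of type `R`.
type Getter<R> = fn(&R) -> i64;

/// Two unrelated record types that both have `x` and `y` fields.
#[allow(dead_code)]
struct WithA { x: i64, y: i64, a: &'static str }
#[allow(dead_code)]
struct WithB { x: i64, y: i64, b: bool }

/// Polymorphic in two record types: one accessor per field per record,
/// so four extra parameters, and every field read is an indirect call.
fn sum_fields<R1, R2>(
    r1: &R1,
    r2: &R2,
    r1_x: Getter<R1>,
    r1_y: Getter<R1>,
    r2_x: Getter<R2>,
    r2_y: Getter<R2>,
) -> i64 {
    r1_x(r1) + r1_y(r1) + r2_x(r2) + r2_y(r2)
}
```

A call site has to supply all four accessors, for example sum_fields(&p, &q, |r: &WithA| r.x, |r: &WithA| r.y, |r: &WithB| r.x, |r: &WithB| r.y).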

Prerequisite: integers for labels

Starting with the next approach, we will require mapping labels (field names) to integers at compile time, to be used as indices.

Because these integers for labels will be used in record allocation and field accesses, it is possible that a label we see later in a program will cause different code generation for a record field access that we’ve already seen.

We have two options:

We can avoid this problem with a whole-program pass to collect all labels in the program.

This is trivial with a whole-program compiler as a front-end pass can store all labels seen in a component (library, module) somewhere and we can map those labels to integers before code generation.
We can have a link-time step to update record allocation and field access code with the integers for the labels.

In the rest of the post, labels will always get integers based on their lexicographical order and we will call these integers for labels just “labels”.

For example, if I have labels a, c, b, d in my program, their numbers will be 1, 3, 2, 4, respectively.
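
A minimal sketch of this numbering in Rust, assuming all labels of the program have already been collected into a slice. The function name number_labels is hypothetical.

```rust
use std::collections::BTreeMap;

/// Assigns each distinct label an integer based on lexicographic order,
/// starting from 1 as in the example in the text.
fn number_labels(labels: &[&str]) -> BTreeMap<String, usize> {
    let mut sorted: Vec<&str> = labels.to_vec();
    sorted.sort();
    sorted.dedup(); // a label may appear many times in a program
    sorted
        .into_iter()
        .enumerate()
        .map(|(i, label)| (label.to_string(), i + 1))
        .collect()
}
```

The order in which labels are seen does not matter, only the final set does, which is why this needs either a whole-program pass or a link-time step.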

(2) Per-record label-to-field-offset tables

With integers as labels we can add a table to every record (records with the same set of keys sharing the same table) mapping labels in the program to offsets in the record’s payload. For example, the table for a record with fields a and c when the program has labels a, b, c, d, looks like this:

[ 0, _, 1, _ ]

This table is indexed by the label and the value gives the offset in the record’s payload for the field. _ means the record does not have the field. In a well-typed program we won’t ever see a _ value being read from a table.

This approach is quite wasteful as every table will have as many entries as number of labels in the program, but we will compress these tables below to reasonable sizes.

We will call these tables “record offset tables” or “offset tables” in short. When compiling a record access we need to get the record’s offset table. For this we add an extra word (pointer) to record objects pointing to their offset tables. We then generate this code for a record field access:

record[record[OFFSET_TABLE_INDEX][label]]

OFFSET_TABLE_INDEX is the constant for where the offset table pointer is in record objects.

Offset tables are generated per record shape (set of labels), so the total number of tables shouldn’t be too large.

Since the _ entries won’t ever be used, we can shrink the tables by dropping trailing _ entries. In our example above with a record with a and c fields, the last _ entry can be omitted:

[ 0, _, 1 ]
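
Here is a sketch of per-shape offset tables in Rust. It assumes 0-based label IDs so that a label can index the table directly, and uses None for the _ entries. offset_table, Record, and read_field are names I made up for the example.

```rust
/// Builds the offset table for one record shape.
/// `num_labels` is the number of labels in the program; `shape` lists the
/// label IDs (0-based here) of the record's fields in payload order.
/// `None` plays the role of `_` in the text.
fn offset_table(num_labels: usize, shape: &[usize]) -> Vec<Option<usize>> {
    let mut table = vec![None; num_labels];
    for (offset, &label) in shape.iter().enumerate() {
        table[label] = Some(offset);
    }
    // Trailing `None` entries are never read, so drop them.
    while table.last() == Some(&None) {
        table.pop();
    }
    table
}

/// A record: a pointer to the shared offset table plus the payload.
struct Record<'a> {
    offsets: &'a [Option<usize>],
    payload: Vec<i64>,
}

/// `record.label` becomes two indexing operations.
fn read_field(record: &Record, label: usize) -> i64 {
    let offset = record.offsets[label].expect("`_` is never read in a well-typed program");
    record.payload[offset]
}
```

For the record with fields a and c and program labels a, b, c, d, offset_table(4, &[0, 2]) produces the three-entry table shown above.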

(2.1) Making the tables global

Because offset tables are per-shape, and the total number of record shapes in a program should be small, if we allocate a few bits in record object headers for the “shape index” of the record, this index can be used to index a global table mapping record shapes to their offset tables.

Generated code for record access expressions will look like:

record[RECORD_OFFSET_TABLES[getRecordShapeId(record)][label]]

getRecordShapeId will read the bits in the object header for the record shape id. Depending on the actual header layout, it will look something like:

int getRecordShapeId(Object* object) {
  return (object->header & RECORD_ID_MASK) >> HEADER_BITS;
}

With record shape IDs in headers and a global table mapping shape IDs to offset tables, we no longer need an extra word in record objects for the offset table pointer.
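
The same access path as a Rust sketch. The header layout here (shape ID in 12 bits above the low 8 bits) is purely an assumption for the example, and shape_field is a hypothetical name for the compiled form of a field access.

```rust
// Hypothetical header layout, assumed only for this sketch: the low 8 bits
// are used for other purposes and the next 12 bits hold the shape ID.
const SHAPE_ID_SHIFT: u32 = 8;
const SHAPE_ID_MASK: u64 = 0xFFF << SHAPE_ID_SHIFT;

/// Extracts the record shape ID from an object header word.
fn get_record_shape_id(header: u64) -> usize {
    ((header & SHAPE_ID_MASK) >> SHAPE_ID_SHIFT) as usize
}

/// Compiled form of `record.label`: index the global table with the shape
/// ID, then index that shape's offset table with the label.
fn shape_field(
    record_offset_tables: &[Vec<Option<usize>>],
    header: u64,
    payload: &[i64],
    label: usize,
) -> i64 {
    let table = &record_offset_tables[get_record_shape_id(header)];
    let offset = table[label].expect("`_` is never read in a well-typed program");
    payload[offset]
}
```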

Here’s an example of offset tables when we have labels a, b, x, y, and two records 0: {a, b} and 1: {x, y}:

RECORD_0_OFFSET_TABLE = [
  0, // label a
  1, // label b
  _, // label x
  _, // label y
];

RECORD_1_OFFSET_TABLE = [
  _, // label a
  _, // label b
  0, // label x
  1, // label y
];

RECORD_OFFSET_TABLES = [
  RECORD_0_OFFSET_TABLE, // record 0
  RECORD_1_OFFSET_TABLE, // record 1
];

As before, the offset table for record 0 can be shrunk as:

RECORD_0_OFFSET_TABLE = [
  0, // label a
  1, // label b
];

(2.2) Sharing label IDs and record IDs

Labels that are never used in the same record can be given the same ID.

In the example above, this allows us to have a single table for both records:

RECORD_0_1_OFFSET_TABLE = [
  0, // label a or x
  1, // label b or y
];

RECORD_OFFSET_TABLES = [
  RECORD_0_1_OFFSET_TABLE, // record 0
  RECORD_0_1_OFFSET_TABLE, // record 1
];

The problem of assigning IDs to labels is very similar to stack allocation when spilling during register allocation. We have a practically infinite supply of IDs (stack space), but we want to reuse the same ID for different labels as long as they’re never used in the same record (live at the same time).

After sharing label IDs, some of the shapes may be identical, as in our example. We can give those shapes the same ID and avoid redundant entries in the offset tables.

With this, our example with two records {a, b} and {x, y} compiles to just one offset table:

RECORD_0_1_OFFSET_TABLE = [
  0, // label a or x
  1, // label b or y
];

RECORD_OFFSET_TABLES = [
  RECORD_0_1_OFFSET_TABLE, // record 0 and 1
];
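
One simple way to do this assignment is a greedy pass: give each label the smallest ID not already taken by a label it shares a record with. This is only a sketch of the idea, not necessarily what a production compiler does, and assign_label_ids is a hypothetical name.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Greedily assigns IDs so that two labels get different IDs whenever
/// they occur together in some record shape.
fn assign_label_ids(shapes: &[Vec<&str>]) -> BTreeMap<String, usize> {
    let mut ids: BTreeMap<String, usize> = BTreeMap::new();
    for shape in shapes {
        for &label in shape {
            if ids.contains_key(label) {
                continue;
            }
            // IDs already taken by labels that share a record with `label`.
            let mut taken = BTreeSet::new();
            for other_shape in shapes.iter().filter(|s| s.contains(&label)) {
                for &other in other_shape {
                    if let Some(&id) = ids.get(other) {
                        taken.insert(id);
                    }
                }
            }
            // Smallest ID that does not clash.
            let id = (0usize..).find(|i| !taken.contains(i)).unwrap();
            ids.insert(label.to_string(), id);
        }
    }
    ids
}
```

For the shapes {a, b} and {x, y} this gives a and x the ID 0, and b and y the ID 1, matching the shared table above.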

(2.3) Flattening the table

Suppose we have these record shapes in a program:

{a, b, q}
{x, y, q}

The RECORD_OFFSET_TABLES table is currently an array of pointers, and indexing the offset table still requires pointer chasing.

To avoid pointer chasing we can flatten the table.

For our current program, the tables, without flattening, look like this:

RECORD_0_OFFSET_TABLE = [
  0, // label a
  1, // label b
  _, // label x
  _, // label y
  2, // label q
];

RECORD_1_OFFSET_TABLE = [
  _, // label a
  _, // label b
  0, // label x
  1, // label y
  2, // label q
];

RECORD_OFFSET_TABLES = [
  RECORD_0_OFFSET_TABLE,
  RECORD_1_OFFSET_TABLE,
];

We can flatten this as:

RECORD_LABEL_OFFSETS = [
  0, // record 0, label a
  1, // record 0, label b
  _, // record 0, label x
  _, // record 0, label y
  2, // record 0, label q

  _, // record 1, label a
  _, // record 1, label b
  0, // record 1, label x
  1, // record 1, label y
  2, // record 1, label q
];

Field indexing then becomes:

record[RECORD_LABEL_OFFSETS[(getRecordShapeId(record) * NUM_LABELS) + label]]

With this version we eliminate one layer of indirection.
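
The flattened, record-major lookup as a Rust sketch, for the two shapes {a, b, q} and {x, y, q} with labels numbered a, b, x, y, q from 0. The constant U stands in for _, and field_flat is a made-up name.

```rust
const NUM_LABELS: usize = 5; // a, b, x, y, q
const U: usize = usize::MAX; // stands for `_`, never read in a well-typed program

/// Record-major flattened table for shapes 0: {a, b, q} and 1: {x, y, q}.
const RECORD_LABEL_OFFSETS: [usize; 2 * NUM_LABELS] = [
    0, 1, U, U, 2, // record 0: a, b, x, y, q
    U, U, 0, 1, 2, // record 1: a, b, x, y, q
];

/// One table read, then one payload read.
fn field_flat(shape_id: usize, payload: &[i64], label: usize) -> i64 {
    payload[RECORD_LABEL_OFFSETS[shape_id * NUM_LABELS + label]]
}
```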

(2.4) Removing the constant factor

The idea here is not too important on its own, but it will enable further improvements.

The NUM_LABELS factor in field access code above can be eliminated by incrementing record shape IDs by NUM_LABELS instead of 1. In our example, instead of having record IDs 0 and 1, we will have 0 and 5 (incremented by the number of labels in the program).

Since there may be a large number of labels in a program and we may have only a few bits to store the record IDs, an alternative would be to convert the table to label-major order like this:

RECORD_LABEL_OFFSETS = [
  0, // label a, record 0
  _, // label a, record 1

  1, // label b, record 0
  _, // label b, record 1

  _, // label x, record 0
  0, // label x, record 1

  _, // label y, record 0
  1, // label y, record 1

  2, // label q, record 0
  2, // label q, record 1
];

With this table, indexing code becomes:

record[RECORD_LABEL_OFFSETS[(label * NUM_RECORDS) + getRecordShapeId(record)]]

We can then eliminate the NUM_RECORDS factor the same way, by incrementing label IDs by NUM_RECORDS instead of 1, and index with:

record[RECORD_LABEL_OFFSETS[label + getRecordShapeId(record)]]
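
A Rust sketch of the label-major variant with label IDs pre-multiplied by NUM_RECORDS, again for shapes 0: {a, b, q} and 1: {x, y, q}. The LABEL_* constants and field_label_major are hypothetical names.

```rust
const NUM_RECORDS: usize = 2;
const U: usize = usize::MAX; // stands for `_`

/// Label-major table: one row per label, one column per record shape.
const RECORD_LABEL_OFFSETS: [usize; 5 * NUM_RECORDS] = [
    0, U, // label a
    1, U, // label b
    U, 0, // label x
    U, 1, // label y
    2, 2, // label q
];

// Label IDs are incremented by NUM_RECORDS instead of 1, so no
// multiplication is needed at the access site.
const LABEL_A: usize = 0 * NUM_RECORDS;
const LABEL_B: usize = 1 * NUM_RECORDS;
const LABEL_X: usize = 2 * NUM_RECORDS;
const LABEL_Y: usize = 3 * NUM_RECORDS;
const LABEL_Q: usize = 4 * NUM_RECORDS;

/// Field access is a single addition followed by two reads.
fn field_label_major(shape_id: usize, payload: &[i64], label: usize) -> i64 {
    payload[RECORD_LABEL_OFFSETS[label + shape_id]]
}
```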

(2.5) Compacting the table further

Now that the table index of a label is label + shape_id and we have a single table, we can shift the entries in the table by decrementing label IDs.

For this it doesn’t matter whether we store in label-major or record-major order. Which one of these will generate a smaller table will probably depend on the program. As an example, suppose we store the table in label-major order, and we have these records in the program:

0: {x, y, z, t}
1: {x, y}
2: {z, t}

The table will look like:

[ 0, 0, _,   // label x
  1, 1, _,   // label y
  2, _, 0,   // label z
  3, _, 1 ]  // label t

Record IDs will be 0, 1, 2, and label IDs will be 0, 3, 6, 9.

We can use the unused slot for label x, record 2, by decrementing the label index for y by one. If we then do the same for z, the label IDs become 0, 2, 4, 7, and the table becomes:

[ 0, 0,      // label x
  1, 1,      // label y
  2, _, 0,   // label z
  3, _, 1 ]  // label t

This idea can be used to fill any gaps in previous label rows, as long as the used slots in a row fit into the gaps. For example, if we have a table like:

[ 0, _, _, 1,  // label x
  _, 0, 1, _,  // label y
  ... ]

We can decrement y’s ID to fit it into the row for label x:

[ 0, 0, 1, 1,  // label x and y, interleaved
  ... ]
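
A first-fit version of this compaction can be sketched in Rust as follows. This is my own minimal take on the idea, not the dart2wasm code: for each label row it picks the smallest label ID at which all of the row’s used slots land on free slots of the table. Row and compact are hypothetical names.

```rust
/// One row per label: `row[r]` is the label's offset in record `r`,
/// or `None` if record `r` does not have the label.
type Row = Vec<Option<usize>>;

/// Returns the label ID chosen for each row and the compacted table.
/// A field is then read with `payload[table[label_id + record_id]]`.
fn compact(rows: &[Row]) -> (Vec<usize>, Vec<Option<usize>>) {
    let mut table: Vec<Option<usize>> = Vec::new();
    let mut label_ids = Vec::with_capacity(rows.len());
    for row in rows {
        // Smallest base where every used slot of the row hits a free slot.
        // Slots past the end of the table count as free.
        let base = (0usize..)
            .find(|&base| {
                row.iter().enumerate().all(|(record, entry)| {
                    entry.is_none()
                        || table.get(base + record).map_or(true, |slot| slot.is_none())
                })
            })
            .unwrap();
        for (record, entry) in row.iter().enumerate() {
            if entry.is_some() {
                if table.len() <= base + record {
                    table.resize(base + record + 1, None);
                }
                table[base + record] = *entry;
            }
        }
        label_ids.push(base);
    }
    (label_ids, table)
}
```

On the x, y, z, t example above, first-fit picks label IDs 0, 2, 4, 5 and an eight-entry table, slightly tighter than the 0, 2, 4, 7 reached by only decrementing y and z, because t also fits into the gap left in z’s row. On the interleaved example it gives x and y the same ID.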

Conclusions

Collecting and numbering all labels in the program allows using a global table for mapping labels to offsets.

These offset tables can be made smaller by

Giving the same number to labels that don’t occur in the same record
Giving the same ID to records that become identical after the previous step
Tweaking label numbers so that rows without overlapping entries can be merged into a single row

The result is a very compact representation of record objects (no extra words in the header or unused space in the payload needed) and a fast polymorphic field access.

The offset table should also be small in practice, because different parts of the program will probably use disjoint sets of names, so many labels and records will share IDs. In the remaining cases, tweaking label IDs to compact the table should help.

References

I’ve learned about the global table approach and some of the optimizations from the Dart compiler, which implements virtual calls using a “global dispatch table” (GDT), indexed by classID + methodID in call sites. See “Introduction to Dart VM” for a description of how Dart AOT and JIT generate GDTs.

If you are interested in seeing some code, here is where we generate the GDT in dart2wasm (Dart’s Wasm backend). The outer loop finds a selector ID (label ID in our examples) for a row (list of records in our examples, list of classes in dart2wasm). The inner loop do { ... } while (!fits) starts from the first row with gaps, and tries to fit the current row into the gaps. In the worst case it skips all of the rows, in which case rest of the code appends the table with the new row.

Dart will soon have records, and for the dart2wasm implementation of records I’m thinking of using some of the ideas described in this post. Dart records do not support width subtyping (you can’t pass {x, y, z} where {x, y} is expected), but because of the dynamic type, we can have a dynamically typed record that we index.

Thanks to José Manuel Calderón Trilla for his feedback on a draft of this blog post.

Products and sums, named and anonymous

2021-04-10T00:00:00Z

I was recently thinking about why so many languages have tuples, which can be thought of as simple anonymous products (more on the definition of this below), but not something similar for sums. Both sum and product types are widely used, so it seems inconsistent to have anonymous products but not sums.

I recently tweeted about this and got helpful responses that made me realize that I got my definitions wrong. As I thought more about what “anonymous type” means, it became clear to me that it’s not just tuples or other types with special syntax instead of names. It’s more complicated than that.

So in this post I’d like to briefly talk about products and sums, and how names are used in type checking. I will then show a different way of type checking, and some examples from two widely used languages. Finally, I will argue that types are called “named” or “anonymous” depending on how they are checked.

Note that I’m not using any of these words as they are used in category theory or any other field of mathematics. These are mainly how I see them used in widely used PLs like Haskell, Rust, and OCaml, and in PL papers and books.

Products

A value of a product type contains zero or more fields with potentially different types. Some example product types are:

data Coordinate = Coordinate { x :: Int, y :: Int }: a product with two Int fields
data D = D Int String Float: a product with Int, String, and Float fields
data Empty = Empty: a product with no fields

Note that the way you access the fields does not matter. In the examples above, fields of a Coordinate value can be accessed with pattern matching, or with the generated functions x and y. In the second example, we can only access the fields with pattern matching.

What matters is: products contain zero or more fields. The fields can have different types.

Sums

A sum type specifies multiple “variants” (or “alternatives”), where each variant has a “name” (or “tag”, more on this later) and some number of fields.

A value of a sum type holds a name (or tag), and the fields of the variant with that name.

For example, if you have a parser for integers, you will want to return an integer when parsing succeeds, or an error message when something goes wrong. The sum type for the return value of your parse function would look like:

data ParseResult
  = Success Int
  | Fail String

Here, Success and Fail are names of the variants. Success variant has an Int field, and Fail variant has a String field.

A value of this type does not contain an Int and String at the same time. It’s either a Fail with a String field, or a Success with an Int field.

The way you access the fields is with pattern matching:

case parse_result of
   Success int -> ...
   Fail error_message -> ...

Names in type checking (nominal typing)

If I have two types, named T1 and T2, no matter how they are defined, they are considered different in Haskell, and most other widely used typed languages (Rust, Java, …). This is called “nominal” type checking, where differently named types are considered different, even if they are “structurally” the same. For example, data T1 = T Int and data T2 = T Int are structurally the same, but you can’t pass a value of type T2 to a function that expects T1.

What “structurally same” means is open to interpretation. We will come to this later.

In addition, all types have names¹, even types like tuples, which may look like they don’t have names the way our Coordinate or ParseResult do.

Tuples in most languages are just a bunch of product types, like the ones you can define yourself. They are often pre-defined for arities 0 to some number, and they have a special, “mixfix” syntax, with parentheses and commas to separate the fields. Other than that, they are no different than the ones you can define yourself.

You can see GHC’s definition of tuples here. In GHC, you can use the name directly if you don’t want the mixfix syntax, like (,) 1 2. So the name for a 2-ary tuple is (,) in Haskell, and it has a special syntax so you can write the more readable (1, 2) (or (Int, Int) in type context). Other than syntax, there’s nothing special about tuples.

So it’s clear that most languages don’t have anonymous types. All types have some kind of names, and two types are only “compatible” if the names match.

Before defining what anonymous types are, I would like to give two examples, from PureScript and OCaml, where types are not checked based on their names, but based on their “structure”.

Structural type checking for products

A record is a product type with named (or “labelled”) fields. Our Coordinate example is a record.

In PureScript, records can be defined without giving names to them. For example:

f :: { x :: Int, y :: Int } -> Int
f a = a.x + a.y

Here, f is a function that takes a record with two Int fields, named x and y, as an argument.

Here is a more interesting version of the same function:

f :: forall r . { x :: Int, y :: Int | r } -> Int
f a = a.x + a.y

This version takes a record with at least x :: Int and y :: Int fields, but it can have more fields. Using this version, this code type checks:

f { x: 1, y: 2, z: 3, t: 4 }

The r in this type is not too important. The important part is that, in PureScript, records are not type checked nominally. Indeed, in the example above, the type of the record with 4 fields is not defined anywhere, and no names are used for the record in the type signature of f.

You might think that the record braces and commas are similar to the tuple syntax, so the name could be something like {,}, maybe applied to x :: Int somehow (assuming there is a type-level representation of field names).

However, even if that’s the case, type checking of these types is quite different from type checking of tuples. We’ve already seen that we can pass a record with more fields. You can also reorder fields in the function type signature², or in the record expression, and it still works.

So type checking for PureScript records is quite different from type checking for Haskell tuples.

This kind of type checking where you look at the “structure” rather than just the names is called structural type checking.

Now let’s take a look at an example for sum types.

Structural type checking for sum types

OCaml has named sum types, just like Haskell’s. Here is the OCaml version of our ParseResult type:

type parse_result =
  | Success of int
  | Fail of string

Name of this type is parse_result (following OCaml naming conventions), and it is type checked exactly the same way it is type checked in Haskell.

A second way of defining sum types in OCaml, and without names, is with polymorphic variants. Here’s the polymorphic variant for the same type:

type parse_result = [ `Success of int | `Fail of string ]

Crucially, even though we use a similar syntax with the type keyword, this is a type synonym. The right-hand side of this definition is an anonymous sum with two variants, tagged `Success and `Fail, with int and string fields, respectively.

Now, suppose I have a parse result handler, which, in addition to the success and failure cases, handles some “other” case as well:

let f = function
  | `Success i -> Printf.printf "Parse result: %d\n" i
  | `Fail msg -> Printf.printf "Parse failed: %s\n" msg
  | `Other -> Printf.printf "Wat?\n"

Type of this function as inferred by the OCaml compiler is:

[< `Fail of string | `Other | `Success of int ] -> unit

What this type says is that the function accepts any polymorphic variant that has the tags Fail, Other, and Success (with the specified field types), or some subset of these tags. So if I have a value of type parse_result:

let x : parse_result = `Success 123

I can pass it to f, even though f’s argument type is not exactly parse_result. Here’s the full example, run in utop: (utop # part is the prompt, lines after ;; are utop outputs)

utop # type parse_result = [ `Success of int | `Fail of string ];;
type parse_result = [ `Fail of string | `Success of int ]

utop # let f = function
  | `Success i -> Printf.printf "Parse result: %d\n" i
  | `Fail msg -> Printf.printf "Parse failed: %s\n" msg
  | `Other -> Printf.printf "Wat?\n";;
val f : [< `Fail of string | `Other | `Success of int ] -> unit = <fun>

utop # let x : parse_result = `Success 123;;
val x : parse_result = `Success 123

utop # f x;;
Parse result: 123
- : unit = ()

Neat!

Similar to PureScript records, and unlike Haskell tuples, type checking for OCaml polymorphic variants is structural, not nominal.

Names -> nominal, ??? -> structural

Now that we have seen structural type checking as an alternative to name-based (nominal) type checking, and some examples, here is my attempt at defining anonymous types: If named types are type checked nominally, then the types that are structurally type checked are called “anonymous”.

In other words:

Nominally type checked types are named
Structurally type checked types are anonymous

According to this definition, Haskell and many other languages don’t have anonymous types. PureScript records are an example of anonymous products, and OCaml polymorphic variants are an example of anonymous sums.

Conclusions

Named types are checked nominally, anonymous types are checked structurally. According to this definition, Haskell, and many other languages, don’t have anonymous types, as all types are nominally checked.

Tuples are no exception: they have names, and are type checked nominally.

PureScript records and OCaml polymorphic variants are great examples of anonymous products and sums, respectively.

Thanks to @_gilmi and @madgen_ for their helpful comments on a draft of this blog post.

¹ With the exception of type synonyms. Type synonyms can be considered as simple macros for substituting types for names before type checking.
² In Haskell, reordering stuff at the type level is often done with type families (type-level functions). Types are still checked nominally, but by rearranging them before type checking you can often have something somewhat similar to structural checking.

Conditional compilation based on crate type

2020-12-24T00:00:00Z

Suppose you have a no_std crate that you want to use in two ways:

As a self-contained static library, to link with other (non-Rust) code
As a Rust library, to import from another crate to test it

(1) is the main use case for this library. (2) is because you want to test this library and you want to be able to use Rust’s std and other Rust libraries for testing.

The Rust crate type for (1) is staticlib. For (2) you need rlib. (documentation on crate types)

Here’s the problem. To be able to generate staticlib you need to implement a panic handler as otherwise the code won’t know how to panic¹. However, if you define a panic handler, you won’t be able to use your crate in other crates anymore as your panic handler will clash with the std panic handler.

Four files are needed to demonstrate this:

-- Cargo.toml for the library
[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"

[lib]
crate-type = ["staticlib", "rlib"]

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

-- lib.rs
#![no_std]

#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    loop {}
}

-- Cargo.toml for the importing crate
[package]
name = "nostd_bin"
version = "0.1.0"
authors = []
edition = "2018"

[dependencies]
nostd_lib = { path = "../nostd_lib" }

-- main.rs
extern crate nostd_lib;

fn main() {}

The library builds fine, but if you try to build nostd_bin you’ll get this error:

error: duplicate lang item in crate `nostd_lib` (which `nostd_bin` depends on): `panic_impl`.
  |
  = note: the lang item is first defined in crate `std` (which `nostd_bin` depends on)
  = note: first definition in `std` loaded from ...
  = note: second definition in `nostd_lib` loaded from ...

Which says you now have two panic handlers: one in std and one in your library.

If you remove the panic handler in the library then you won’t be able to build the library anymore:

error: `#[panic_handler]` function required, but not found

So you need some kind of conditional compilation, to generate panic handler only when generating staticlib. Unfortunately conditional compilation based on crate type is currently not possible. It is also not possible to specify target crate type when invoking cargo.

The least hacky way I could find to solve this (and without using anything other than just cargo build to build) is by having two Cargo.toml files.

Cargo really wants manifest files to be named Cargo.toml, so we put the files in different directories. In my case the top-level one is for staticlib and it looks like this:

[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"

[features]
default = ["panic_handler"]
panic_handler = []

[lib]
crate-type = ["staticlib"]

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

I also update lib.rs to only define the panic handler when the feature is enabled:

#[cfg(feature = "panic_handler")]
#[panic_handler]
fn panic(_: &core::panic::PanicInfo) -> ! {
    ...
}

Now I can build the library at the library’s top-level with just cargo build. Because the panic_handler feature is enabled by default in this Cargo.toml, the panic handler will be defined and the static library will build and work fine.

For the rlib I create a similar Cargo.toml in rlib directory:

[package]
name = "nostd_lib"
version = "0.1.0"
authors = []
edition = "2018"

[lib]
crate-type = ["rlib"]
path = "../src/lib.rs"

[profile.dev]
panic = "abort"

[profile.release]
panic = "abort"

The differences are: this one only generates rlib, doesn’t define the panic_handler feature, and specifies the library source path explicitly (as it’s not in the default location relative to this Cargo.toml). It’s fine to refer to a feature that you never define in Cargo.toml in your code, so lib.rs is still fine, and the panic handler will never be built when you build the crate with this Cargo.toml.

Now in the importing crate I use this Cargo.toml instead of the top-level one:

[dependencies]
nostd_lib = { path = "../nostd_lib/rlib" }

And it works fine. The downside is I have two Cargo.toml files now, but in my case that’s not a big deal, as my Cargo.toml is quite small and has no dependencies other than libc².

I hope this is helpful. If you know any better way to do conditional compilation based on crate types, or to solve the problem of generating usable staticlib and rlibs from a single no_std crate, let me know!

¹ You need a panic_handler even if you never panic in your crate (assuming that’s possible). For example, you can’t compile fn main() {} with no_std, panic=abort, and without a panic_handler: the compiler complains about the missing panic handler.
² If you’re working on a no_std crate I think you won’t be able to find a lot of libraries that you can use anyway.

8 years of Haskell

2020-06-30T00:00:00Z

21 Jun 2020 was my last day at Well-Typed and as a GHC maintainer/developer. On 22nd I joined the programming language team at DFINITY to work on the Motoko programming language.

Here’s the summary of my 8 years writing Haskell pretty much non-stop:

In 2012 I wrote my first Haskell program, which was a chat server. I was reading “Real World Haskell” and “Learn You a Haskell for Great Good!” at the time and applying what I learned on this project.
In the same year I implemented my first programming language in Haskell. I don’t remember much about this project, I think it may be just a few extensions over the excellent Haskell tutorial “Write Yourself a Scheme in 48 hours”.
Also in 2012 I made a few commits to the programming language Fay. This was my first contribution to an open source compiler not written by me.
In 2013 I worked on four PL implementations, two of which were implemented from scratch in Haskell: A Prolog implementation and a K Lambda interpreter.

The other two projects were: A multi-stage ML-like language written in OCaml, and K Framework (in Java).
In 2014 I was accepted to Google Summer of Code to work on adding stack traces to GHCJS. The project was successful, and I made 88 commits to GHCJS during this period.

This was my first introduction to GHC. I made only one commit to GHC during this time, but I started reading the RTS and code generator to be able to implement cost-centre stacks in GHCJS, which taught me a lot.
Also in 2014, I briefly worked at a startup where I wrote Haskell.
In 2015 I joined Indiana University to do a PhD in programming languages. In my first semester I worked on the paper “Efficient Communication and Collection with Compact Normal Forms” which was about a GHC extension. The paper was published the same year at ICFP.
In the same year I briefly worked on a torrent client in Haskell.
According to git logs, 2015 was the year where I started making some larger commits to GHC. I think I made a few dozen commits that year. What was happening in the background is that I was working on unboxed sums. At Haskell Implementors Workshop in 2015 my advisor gave a presentation on efficiency of data representation in Haskell. I don’t remember how the story developed, but I think we also talked to a few people at ICFP on how to improve the situation, and one of the ideas that came up was unboxed sums. IIRC I started working on it soon after returning from ICFP.

The first somewhat working version was implemented as a plugin, using lots of unsafe coercions under the hood. It was good enough to run some examples.
(In 2015, I also studied various metaprogramming and partial evaluation ideas quite extensively. If you look at my blog posts published in 2015 you’ll see a lot of related blog posts. There are also a few related git repositories in my Github page. I also gave a related talk at HIW 2015.)
Early 2016, I don’t remember what I was doing in too much detail. I remember taking an advanced OS class around that time and enjoying it very much. This was also the time where I started to realize that the tools I’m using (mostly GHC) are full of bugs, and very inefficient. I kept studying program transformation ideas, with the goal of making Haskell “fast”. I also started using C more, partly for the OS class, but also in my hobby projects. For example, the first commit of tiny was made in January 2016 and the code was in C.
In mid-2016 I left Bloomington for Cambridge, UK, for an internship at Microsoft Research with SPJ. We mainly worked on implementing unboxed sums properly in the compiler (instead of as a hacky plugin), but I also did a lot of GHC maintenance work there with supervision of SPJ.

Unboxed sums was merged during my time at MSR.

In the rest of the internship I did a lot of reading, did GHC maintenance, and biked around Cambridge.
Most importantly, during my time at MSR I realized that I’m no longer interested in academic research. I don’t enjoy writing papers. I don’t feel like pushing a field forward while most of the tools I use every day are badly broken, inefficient, usually both. I started having job interviews while I was in the UK. I visited two companies for interviews, one in London, another one in Cambridge.

I also emailed my advisor, saying that I don’t want to come back to Bloomington.
Job interviews went badly, and I was back at Indiana University. Rest of 2016 was pretty horrible. I was depressed. I had no interest in research. I still helped publishing a paper, but I did not enjoy the process.

I still spent my last semester somewhat productively. I took enough classes this semester to leave IU with a master’s degree, instead of empty handed (I was a PhD student, not master’s). I also had some good job interviews and met good people from the Haskell community.

By the end of 2016 I accepted a job offer and left IU with a master’s degree to write Haskell for a startup.
In 2017 I worked for this startup for a year. I wrote lots of networking and concurrent code, and learned a lot about these topics and exception handling in Haskell. Until then my Haskell experience was mainly in the context of compilers, so this was quite educational for me.

I left the company at the end of that year to join Well-Typed to work on GHC full-time.
My time at Well-Typed was great, but also full of challenges, mainly related to working remotely.

I worked on GHC between 30 and 40 hours a week (some weeks as little as 24 hours, but no less than that). A few weeks after I joined I started working on a new garbage collector with a colleague. When I joined the project there were only type definitions in header files, and almost no code. I implemented the first sequential prototype of the new collector. After that my colleague and I started collaborating more closely while implementing the concurrent version. We found many bugs in both the design and implementation, and sorted out many edge cases during this time. I thoroughly enjoyed working on this project, even though it was clearly the most challenging project I ever worked on.

After the garbage collector I kept working as a maintainer until I left the company on Sunday, Jun 21st, 2020. I made my last commit, to a merge request that I was working on, on the 21st.
On 22 Jun 2020 I joined DFINITY to work on the Motoko programming language, and this is where the story ends.

At the time of this writing I have 383 commits to GHC and I’m the 14th contributor with most commits. It feels bad to leave a project that I liked and contributed to so much, but it’s also the right thing to do. After the GC was merged I started spending my time less and less productively, for many reasons, and I had lost my motivation to improve Haskell-the-language and GHC. Perhaps I can write more about these in another post.

gdb breakpoints with conditions on backtrace

2020-04-25T00:00:00Z

Being able to specify conditions in gdb breakpoints is quite useful. For example, if I’m interested in mmap(NULL, ...) calls I can do

break mmap if addr == 0

and gdb doesn’t break on mmap when the addr == 0 condition doesn’t hold.

I’ve used this many times to great effect, but it’s not always sufficient. Sometimes I need to break not when a variable or argument has a specific value, but when the function is called (directly or indirectly) from another function. For example, when debugging a GHC RTS issue I sometimes want to inspect mmap calls made by the garbage collector.

As far as I know this is not possible using the standard break syntax, but gdb provides a Python API that allows setting breakpoints with conditions implemented in Python. Using this API it takes a few lines to implement this:

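class FrameBp(gdb.Breakpoint):
    """Breakpoint that only stops when `frame` is somewhere in the backtrace."""

    def __init__(self, spec, *args, frame=None, **kwargs):
        # Name of the function that must appear in the backtrace
        self.frame = frame
        super().__init__(spec, *args, **kwargs)

    def stop(self):
        # Start from the caller of the function we stopped in
        frame = gdb.selected_frame().older()

        while frame:
            if frame.name() == self.frame:
                return True

            frame = frame.older()

        # Reached the outermost frame without a match: don't stop
        return False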

When calling the constructor the first argument is the breakpoint specifier, which is basically the part after break ... in gdb’s break command. The frame argument is the function we look for before actually breaking. We only break if the function exists in the backtrace. Here’s an example use:

>>> python FrameBp("mmap", frame="GarbageCollect")
Breakpoint 1 at 0x7f3366243f00: file ../sysdeps/unix/sysv/linux/mmap64.c, line 44.

This will only break on mmap if the backtrace has GarbageCollect at some point. An example backtrace when the breakpoint is hit:

Breakpoint 1, __GI___mmap64 (addr=0x4200200000, len=1048576, prot=3, flags=50, fd=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:44
44        if (offset & MMAP_OFF_MASK)

>>> bt
#0  __GI___mmap64 (addr=0x4200200000, len=1048576, prot=3, flags=50, fd=-1, offset=0) at ../sysdeps/unix/sysv/linux/mmap64.c:44

...

#19 0x0000000003022c83 in GarbageCollect (collect_gen=0, do_heap_census=false, deadlock_detect=false, gc_type=0, cap=0x37ef500, idle_cap=0x0) at rts/sm/GC.c:449

...

With some effort you could probably turn this into a proper gdb command and run it without the python ... part, but so far this works well enough for me.

New blog post published on Well-Typed's blog

2020-03-25T00:00:00Z

I recently published a new post on Well-Typed’s blog: “The problem with adding functions to compact regions”.

It’s also shared on Twitter and /r/haskell. If you have any questions/comments feel free to ping me in any of these places, or add a comment below!

Knot-tying: two more examples, and an alternative

2020-02-27T00:00:00Z

In the previous post we’ve looked at a representation of expressions in a programming language, what the representation makes easy and where we have to use knot-tying.

In this post I’m going to give two more examples, using the same expression representation from the previous post, and then talk about how to implement our passes using a different representation, without knot-tying.

Example: attaching typing information to Ids

Previously we attached arity and unfolding information to Ids. Now suppose that our language is typed, and up to some point our transformations rely on typing information. Similar to arity and unfolding fields we add one more field to Id:

data Id = Id
  { ..
  , idType :: Maybe Type
  }

The Maybe part is because when we no longer need the types we want to be able to clear the type fields to make the AST smaller. While we have only one heap object per Id, in an average program there’s still a lot of different Ids, and Type representation can get quite large, so this is worthwhile. This makes the working set smaller, which causes less GC work and improves compiler performance.

In our cyclic AST representation the only way to implement this without losing sharing is with a full pass over the entire program, using knot-tying. The code is similar to the code in the previous post.
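As a rough sketch of what such a pass could look like (this is not the code from the previous post): the `Type` definition and the name `dropTypes` are made up for illustration, `idArity` is left out to keep things short, and identifier names are assumed to be unique.

```haskell
import qualified Data.Map as M
import Data.Maybe (fromMaybe)

-- Hypothetical type representation, only here to make the sketch compile
data Type = IntT | FunT Type Type

data Expr
  = IdE Id
  | IntE Int
  | Lam Id Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let Id Expr Expr

data Id = Id
  { idName :: String
  , idUnfolding :: Maybe Expr
  , idType :: Maybe Type
  }

-- | Clear the type field of every Id, rebuilding the whole AST so that a
-- binder and its occurrences keep sharing one Id.
dropTypes :: Expr -> Expr
dropTypes = go M.empty
  where
    go :: M.Map String Id -> Expr -> Expr
    go ids e = case e of
      IdE i ->
        -- Use the rebuilt binder when it is in scope; otherwise clear the
        -- type of this (free) occurrence directly
        IdE (fromMaybe i{ idType = Nothing } (M.lookup (idName i) ids))
      IntE{} -> e
      Lam arg body ->
        let arg' = arg{ idType = Nothing }
        in Lam arg' (go (M.insert (idName arg) arg' ids) body)
      App e1 e2 -> App (go ids e1) (go ids e2)
      IfE e1 e2 e3 -> IfE (go ids e1) (go ids e2) (go ids e3)
      Let bndr rhs body ->
        let
          -- The knot: bndr' mentions rhs', and rhs' is built with a map
          -- that already contains bndr'
          ids' = M.insert (idName bndr) bndr' ids
          rhs' = go ids' rhs
          bndr' = bndr{ idType = Nothing, idUnfolding = rhs' <$ idUnfolding bndr }
        in
          Let bndr' rhs' (go ids' body)
```

Note that every constructor is rebuilt even though only one field of each Id changes, which is the cost of keeping the sharing intact.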

Example: attaching unfoldings to Ids

Remember that in the previous post we represented the AST as:

data Expr
  = IdE Id
  | IntE Int
  | Lam Id Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let Id Expr Expr

data Id = Id
  { idName :: String
    -- ^ Unique name of the identifier
  , idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }

Suppose that in this representation I have a recursive definition like

let fac = \x . if x then x * fac (x - 1) else 1 in fac 5

For the occurrence of fac in the lambda body, I want to be able to use idUnfolding and get the definition of this lambda. So the lambda refers to the Id for fac, and fac refers to the lambda in its idUnfolding field, forming a cycle.

In this representation the only way to implement this is with knot-tying. An implementation that maintains a map from binders to their RHSs to update unfoldings of Ids in occurrence position does not work, because when we update an occurrence of the binder in its own RHS (i.e. in a recursive let) we end up invalidating the RHS that we’ve added to the map.

Here’s a knot-tying implementation that adds unfoldings (only the interesting bits):

addUnfoldings :: Expr -> Expr
addUnfoldings = go M.empty
  where
    go :: M.Map String Id -> Expr -> Expr
    go ids e = case e of

      IdE id ->
        IdE (fromMaybe id (M.lookup (idName id) ids))

      Let bndr rhs body ->
        let
          ids' = M.insert (idName bndr) bndr' ids
          rhs' = go ids' rhs
          bndr' = bndr{ idUnfolding = Just rhs' }
        in
          Let bndr' rhs' (go ids' body)

      ...

As before we tie the knot in the Let case and use it in the IdE case.

It’s also possible to initialize idUnfolding fields when parsing, using monadic knot-tying (MonadFix). Full code is shown at the end of this post, but the interesting bit is when parsing lets and Ids:

parseLet :: Parser Expr
parseLet = do
    _ <- string "let"
    id_name <- parseIdName
    _ <- char '='

    (id, rhs) <- mfix $ \ ~(id_, _rhs) -> do
      modify (Map.insert id_name id_)
      rhs <- parseExpr
      return (Id{ idName = id_name, idArity = 0, idUnfolding = Just rhs }, rhs)

    _ <- string "in"
    body <- parseExpr
    return (Let id rhs body)

parseId' :: Parser Id
parseId' = do
    name <- parseIdName
    id_map <- get
    let def = Id{ idName = name, idArity = 0, idUnfolding = Nothing }
    return (fromMaybe def (Map.lookup name id_map))

The idea is very similar. When parsing a let we add a thunk for the binder with correct unfolding to a map. The map is then used when parsing Ids in the RHS and body of the let.

An alternative

A well-known way of associating information with identifiers in a compiler is by using a “symbol table”. Instead of adding information about Ids directly in the Id fields, we maintain a table (or multiple tables) that map Ids to the relevant information. Here’s one way to do this in our language:

data Expr
  = IdE String
  ...

data IdInfo = IdInfo
  { idArity :: Int
    -- ^ Arity of a lambda. 0 for non-lambdas.
  , idUnfolding :: Maybe Expr
    -- ^ RHS of a binder, used for inlining
  }

type SymTbl = Map.Map String IdInfo

In this representation we have to refer to the table for idArity or idUnfolding. That’s slightly more work than the previous representation where we could simply use the fields of an Id, but a lot of other things become much simpler and more efficient.

Here’s dropUnusedBindings in this representation (only the interesting bits, full code is at the end of this post):

dropUnusedBindings :: Expr -> State SymTbl Expr
dropUnusedBindings =
    fmap snd . go Set.empty
  where
    go :: Set.Set String -> Expr -> State SymTbl (Set.Set String, Expr)
    go free_vars e0 = case e0 of

      Let bndr e1 e2 -> do
        (free2, e2') <- go free_vars e2
        if Set.member bndr free2 then do
          (free1, e1') <- go free_vars e1
          setIdArity bndr (countLambdas e1')
          return (Set.delete bndr (Set.union free1 free2), Let bndr e1' e2')
        else
          return (free2, e2')

      ...

Our pass is now stateful (updates the symbol table) and written in monadic style. Knot-tying is gone. We update the symbol table after processing a let RHS. Because Ids no longer have the arity information we don’t need to update anything other than the symbol table.
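Helpers like setIdArity and countLambdas are not shown in the snippet above. A minimal sketch of how they could be written follows. The names `emptyIdInfo` and `modifyIdInfo` are made up here, and inserting a default entry for an Id that is not yet in the table is an assumption, not necessarily what the full code does. It needs the `mtl` and `containers` packages.

```haskell
import Control.Monad.State (State, execState, gets, modify)
import qualified Data.Map as Map
import Data.Maybe (fromMaybe)

data Expr
  = IdE String
  | IntE Int
  | Lam String Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let String Expr Expr

data IdInfo = IdInfo
  { idArity :: Int
  , idUnfolding :: Maybe Expr
  }

type SymTbl = Map.Map String IdInfo

-- | Number of leading lambdas of an expression
countLambdas :: Expr -> Int
countLambdas (Lam _ body) = 1 + countLambdas body
countLambdas _ = 0

-- | Entry used for Ids that are not in the table yet (assumption)
emptyIdInfo :: IdInfo
emptyIdInfo = IdInfo{ idArity = 0, idUnfolding = Nothing }

-- | Update the entry of one Id, starting from the default if it is missing
modifyIdInfo :: String -> (IdInfo -> IdInfo) -> State SymTbl ()
modifyIdInfo name f =
    modify (Map.alter (Just . f . fromMaybe emptyIdInfo) name)

setIdArity :: String -> Int -> State SymTbl ()
setIdArity name arity = modifyIdInfo name (\info -> info{ idArity = arity })

setIdUnfolding :: String -> Expr -> State SymTbl ()
setIdUnfolding name rhs = modifyIdInfo name (\info -> info{ idUnfolding = Just rhs })

-- | Nothing if the Id is unknown or has no unfolding
getIdUnfolding :: String -> State SymTbl (Maybe Expr)
getIdUnfolding name = gets (\tbl -> Map.lookup name tbl >>= idUnfolding)

-- Example: record arity and unfolding for `f = \x . \y . 1`
exampleTbl :: SymTbl
exampleTbl = execState act Map.empty
  where
    rhs = Lam "x" (Lam "y" (IntE 1))
    act = do
      setIdArity "f" (countLambdas rhs)
      setIdUnfolding "f" rhs
```

Each update touches a single map entry, so no expression that mentions the Id has to be rebuilt.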

It’s now trivial to implement addUnfoldings:

addUnfoldings :: Expr -> State SymTbl ()
addUnfoldings e0 = case e0 of

    IdE{} ->
      return ()

    IntE{} ->
      return ()

    Lam arg body ->
      addUnfoldings body

    App e1 e2 -> do
      addUnfoldings e1
      addUnfoldings e2

    IfE e1 e2 e3 -> do
      addUnfoldings e1
      addUnfoldings e2
      addUnfoldings e3

    Let bndr e1 e2 -> do
      addUnfoldings e1
      addUnfoldings e2
      setIdUnfolding bndr e1

Doing it during parsing is also trivial, and shown in the full code at the end of this post. Dropping typing information when we no longer need it is simply

dropTypes :: State SymTbl ()
dropTypes = modify (Map.map (\id_info -> id_info{ idType = Nothing }))

We could also maintain a separate table for typing information, in which case all we would have to do is stop using that table.

Easy!

Final remarks

Cyclic AST representation in a purely functional language necessitates knot-tying and relies on lazy evaluation. A well-known alternative is using symbol tables. It works across languages (does not rely on lazy evaluation) and keeps the code simple.

Cyclic representations make using the information easier, while symbol tables make updating easier. Code for updating the information is shown above and in the previous post. For using the information, compare:

-- Get the information in a cyclic representation
... (idUnfolding id) ...

-- Get the information using a symbol table
unfolding <- getIdUnfolding id

To me the monadic version is not too bad in terms of verbosity or convenience, especially because Haskell makes state passing so easy.

Some of the problems with knot-tying are explained at the end of the previous post. What I did not mention in the previous post is the problems with efficiency, which are demonstrated better in this post.

In the “typing information” example, with the cyclic representation I need to copy the entire AST to update every single Id occurrence and binder. With the symbol table I need to update just the table, which is much smaller than the AST.
In the unfolding example, with the cyclic representation I again need to copy the entire AST or use MonadFix if I’m doing it in parsing. With a symbol table the pass does not update the AST, only updates the table. If I’m doing it in parsing then I simply add an entry to the table after parsing a let. (full code at the end of this post)

In use sites, getIdArity (a map lookup) does more work than idArity (just follows a pointer). While I don’t have any benchmarks on this, I doubt that this is bad enough to make cyclic representation and knot-tying preferable.

Examples in these two posts are inspired by GHC:

GHC keeps information about Ids in an Id field with type IdInfo.
IdInfo type holds information like arity and unfolding.
For type information Id has another field: varType.
The process of throwing away information that is no longer needed is called “zapping”. It happens in many places in GHC, one example is the tidying pass (prepares code for interface file generation) that zaps unfoldings.
Knot-tying is used in many places in the compiler, here’s an example where we use knot-tying to update IdInfos with code generator-generated information.

In the first post I mostly argued that knot-tying makes things more complicated, and in this post I showed that knot-tying is necessary because of the cyclic representation. If we want to do the same without knot-tying we either have to introduce mutable references (e.g. IORefs) in our AST (not shown in this post), or have to use a non-cyclic representation with symbol tables.
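The mutable reference option is not shown in this post. Purely as an illustration, here is a minimal sketch of what it might look like, assuming the unfolding field becomes an IORef and every Id is created once with a hypothetical `newId` helper, so that a binder and its occurrences share the same reference. `idArity` is left out for brevity.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

data Expr
  = IdE Id
  | IntE Int
  | Lam Id Expr
  | App Expr Expr
  | IfE Expr Expr Expr
  | Let Id Expr Expr

data Id = Id
  { idName :: String
  , idUnfolding :: IORef (Maybe Expr)
    -- ^ Mutable, shared by the binder and all of its occurrences
  }

-- | Hypothetical helper: allocate an Id with no unfolding yet
newId :: String -> IO Id
newId name = Id name <$> newIORef Nothing

-- | Write each let RHS into its binder's reference. The AST itself is not
-- rebuilt; occurrences see the update through the shared IORef.
addUnfoldings :: Expr -> IO ()
addUnfoldings e0 = case e0 of
    IdE{} -> return ()
    IntE{} -> return ()
    Lam _ body -> addUnfoldings body
    App e1 e2 -> addUnfoldings e1 >> addUnfoldings e2
    IfE e1 e2 e3 -> mapM_ addUnfoldings [e1, e2, e3]
    Let bndr rhs body -> do
      writeIORef (idUnfolding bndr) (Just rhs)
      addUnfoldings rhs
      addUnfoldings body

-- | For `let x = 1 in x`, check that the occurrence of x sees the unfolding
demo :: IO Bool
demo = do
    x <- newId "x"
    let prog = Let x (IntE 1) (IdE x)
    addUnfoldings prog
    case prog of
      Let _ _ (IdE occ) -> do
        unf <- readIORef (idUnfolding occ)
        return (case unf of
                  Just (IntE 1) -> True
                  _ -> False)
      _ -> return False
```

The cost of this design is that every pass that reads or writes unfoldings has to run in IO (or ST).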

Between these two representations, I think non-cyclic representation with symbol tables is a better choice.

Full code (knot-tying)
