osa1 github gitlab twitter cv rss

Subtyping and subsumption

October 21, 2024 - Tagged as: en, plt.

Subtyping is a relation between two types. It often comes with a typing rule called “subsumption”, which says that if type B is a subtype of type A (usually shown as B <: A), then a value of type B can be assumed to have type A.

The crucial part is that subsumption is implicit, the programmer doesn’t explicitly cast the value with type B to type A.

When we make an operation implicit in a language, we need to make sure that it is (1) safe (2) performant. Users will be doing it without realizing, and we don’t want to accidentally break things or make them slow.

Let’s consider how we can make subsumption safe and performant.

Safety of subsumption

Different languages give different safety guarantees. High-level languages often guarantee:

  1. Memory safety: a memory read or write shouldn’t cause undefined behavior.

    Examples: out-of-bounds array accesses should be caught, dangling pointers shouldn’t be allowed or dereferencing them should be caught in runtime.

  2. Type safety: static guarantees of the language’s type system should be uphold.

    Example: if I have a function f : A -> B and a value x : A after subsumption, f(x) shouldn’t fail in compile time or runtime.

There could be different safeties that the language guarantees. Some of those safeties may also be checked in runtime instead of compile time.

Whatever safeties the language guarantees, they must be preserved with subsumption.

From a programmer’s perspective however, these are not enough to make sure that the program will work as before when subsumption is used. If I can pass a value of type B where A is expected, I need to make sure B, when used as A, acts like A.

This is called “behavioral subtyping” (or “substitutability”), and it depends on not the types of A’s operations but the observable behaviors of A and its subtypes.

I don’t have a good real-world example of this, but you can imagine two types with the same public APIs that work differently. Since the public APIs are the same one can be made subtype of the other and (1) and (2) would still be satisfied, but doing that would cause bugs when one is accidentally passed as the other.

Performance of subsumption

Definition of “fast” or “performant” also depends on the language. A C++ programmer’s fast and Python programmer’s fast are often not the same.

However in general, heap allocation should be avoided.

Object-oriented languages (as defined in my previous post) without multiple inheritance can often implement subsumption of reference values as no-op, i.e. values of type B work as A in runtime without any changes or copying.

Multiple inheritance makes things more complicated, but a reference to an object can still be converted to a reference of one of its supertypes by just adjusting the pointer value.

With unboxed/value types, conceptually, the value needs to be copied as its supertype, but that operation is often no-op. Consider an unboxed record (x: Int, y: Int, z: Int) that we store in a variable a. In runtime, a actually holds multiple stack locations or registers. When we copy it as let b: (x: Int, y: Int) = a, we don’t have to allocate new stack locations for b.x and b.y, we just map those locations to the same locations as a.x and a.y. When we pass b to a function, we pass a.x and a.y.

Where copying becomes a requirement and prohibitive is when you have something like ReadOnlyList<(x: Int, y: Int, z: Int)> and want to upcast it to ReadOnlyList<(x: Int, y: Int)> (the records are unboxed). From the safety perspective this operation is fine, but you have to allocate a new list and copy all the values.

I think this is rarely a problem in practice though, because most generic types, like List<T>, end up being invariant in T anyway, because their API often uses T in both covariant and contravariant positions. So List<(x: Int, y: Int)> is not a supertype of List<(x: Int, y: Int, z: Int)> and subsumption does not apply.

No conclusions this time

In this short post I just wanted to give some definitions that I’m hoping to refer to in future posts.