Products and sums, named and anonymous

April 10, 2021 - Tagged as: en, plt, types.

I was recently thinking about why do so many languages have tuples, which can be thought of as simple anonymous products (more on the definition of this below), but not something similar for sums. Both sum and product types are widely used, so it seems inconsistent to have anonymous products but not sums.

I recently tweeted about this and got helpful responses that made me realize that I got my definitions wrong. As I think more about what “anonymous type” means it became clear to me that the it’s not just tuples or other types with special syntax, instead of names. It’s more complicated than that.

So in this post I’d like to briefly talk about products and sums, and how are names used in type checking. I will then show a different way of type checking, and some examples from two widely used languages. Finally, I will argue that types are called “named” or “anonymous” depending on how they are checked.

Note that I’m not using any of these words as they are used in category theory or any other field of mathematics. These are mainly how I see them used in widely used PLs like Haskell, Rust, and OCaml, and in PL papers and books.

Products

A value of a product type contains zero or more fields with potentially different types. Some example product types are:

• `data Coordinate = Coordinate { x :: Int, y :: Int }`: a product with two `Int` fields
• `data D = D Int String Float`: a product with `Int`, `String`, and `Float` fields
• `data Empty = Empty`: a product with no fields

Note that the way you access the fields does not matter. In the examples above, fields of a `Coordinate` value can be accessed with pattern matching, or with the generated functions `x` and `y`. In the second example, we can only access the fields with pattern matching.

What matters is: products contain zero or more fields. The fields can have different types.

Sums

A sum type specifies multiple “variants” (or “alternatives”), where each variant has a “name” (or “tag”, more on this later) and some number of fields.

A value of a sum type holds a name (or tag), and the fields of the variant with that name.

For example, if you have a parser for integers, you will want to return an integer when parsing succeeds, or an error message when something goes wrong. The sum type for the return value of your parse function would look like:

``````data ParseResult
= Success Int
| Fail String``````

Here, `Success` and `Fail` are names of the variants. `Success` variant has an `Int` field, and `Fail` variant has a `String` field.

A value of this type does not contain an `Int` and `String` at the same time. It’s either a `Fail` with a `String` field, or a `Success` with an `Int` field.

The way you access the fields is with pattern matching:

``````case parse_result of
Success int -> ...
Fail error_message -> ...``````

Names in type checking (nominal typing)

If I have two types, named `T1` and `T2`, no matter how they are defined, they are considered different in Haskell, and most other widely used typed languages (Rust, Java, …). This is called “nominal” type checking, where differently named types are considered different, even if they are “structurally” the same. For example, `data T1 = T Int` and `data T2 = T Int` are structurally the same, but you can’t apply a value of type `T2` to a function that expects `T1`.

What “structurally same” mean is open to interpretation. We will come to this later.

In addition, all types have names1, even types like tuples, which may look like they don’t have names, like our `Coordinate` or `ParseResult` have.

Tuples in most languages are just a bunch of product types, like the ones you can define yourself. They are often pre-defined for arities 0 to some number, and they have a special, “mixfix” syntax, with parentheses and commas to separate the fields. Other than that, they are no different than the ones you can define yourself.

You can see GHC’s definition of tuples here. In GHC, you can use the name directly if you don’t want the mixfix syntax, like `(,) 1 2`. So the name for an 2-ary tuple is `(,)` in Haskell, and it has a special syntax so you can write more readable `(1, 2)` (or `(Int, Int)` in type context). Other than syntax, there’s nothing special about tuples.

So it’s clear that most languages don’t have anonymous types. All types have some kind of names, and two types are only “compatible” if the names match.

Before defining what anonymous types are, I would like to give two examples, from PureScript and OCaml, where types are not checked based on their names, but based on their “structure”.

Structural type checking for products

A record is a product type with named (or “labelled”) fields. Our `Coordinate` example is a record.

In PureScript, records can be defined without giving names to them. For example:

``````f :: { x :: Int, y :: Int } -> Int
f a = a.x + a.y``````

Here, `f` is a function that takes a record with two `Int` fields, named `x` and `y`, as an argument.

Here is a more interesting version of the same function:

``````f :: forall r . { x :: Int, y :: Int | r } -> Int
f a = a.x + a.y``````

This version takes a record with at least `x :: Int` and `y :: Int` fields, but it can have more fields. Using this version, this code type checks:

``f { x: 1, y: 2, z: 3, t: 4 }``

The `r` in this type is not too important. Important part is, in PureScript, records are not type checked nominally. Indeed, in the example above, type of the record with 4 fields is not defined, and no names are used for the record in the type signature of `f`.

You might think that the record braces and commas are similar to the tuple syntax, so the name could be something like `{,}`, maybe applied to `x :: Int` somehow (assuming there is a type-level representation of field names).

However, even if that’s the case, type checking of these types are quite different than tuples. We’ve already seen that we can pass a record with more fields. You can also reorder fields in the function type signature2, or in the record expression, and it still works.

So type checking for PureScript is quite different than Haskell tuples.

This kind of type checking where you look at the “structure” rather than just the names is called structural type checking.

Now let’s take a look at an example for sum types.

Structural type checking for sum types

OCaml has named sum types, just like Haskell’s. Here is the OCaml version of our `ParseResult` type:

``````type parse_result =
| Success of int
| Fail of string``````

Name of this type is `parse_result` (following OCaml naming conventions), and it is type checked exactly the same way it is type checked in Haskell.

A second way of defining sum types in OCaml, and without names, is with polymorphic variants. Here’s the polymorphic variant for the same type:

``type parse_result = [ `Success of int | `Fail of string ]``

Crucially, even though we use a similar syntax with the `type` keyword, this is a type synonym. The right-hand side of this definition is an anonymous sum with two variants, tagged ``Success` and ``Fail`, with `int` and `string` fields, respectively.

Now, suppose I have a parse result handler, which, in addition to the success and failure cases, handles some “other” case as well:

``````let f = function
| `Success i -> Printf.printf "Parse result: %d\n" i
| `Fail msg -> Printf.printf "Parse failed: %s\n" msg
| `Other -> Printf.printf "Wat?\n"``````

Type of this function as inferred by the OCaml compiler is:

``[< `Fail of string | `Other | `Success of x ] -> unit``

What this type says is that the function accepts any polymorphic variant that has the tags `Fail`, `Other`, and `Success` (with the specified field types), or some subset of these tags. So if I have a value of type `parse_result`:

``let x : parse_result = `Success 123``

I can pass it to `f`, even though `f`’s argument type is not exactly `parse_result`. Here’s the full example, run in utop: (`utop #` part is the prompt, lines after `;;` are utop outputs)

``````utop # type parse_result = [ `Success of int | `Fail of string ];;
type parse_result = [ `Fail of string | `Success of int ]

utop # let f = function
| `Success i -> Printf.printf "Parse result: %d\n" i
| `Fail msg -> Printf.printf "Parse failed: %s\n" msg
| `Other -> Printf.printf "Wat?\n";;
val f : [< `Fail of string | `Other | `Success of int ] -> unit = <fun>

utop # let x : parse_result = `Success 123;;
val x : parse_result = `Success 123

utop # f x;;
Parse result: 123
- : unit = ()``````

Neat!

Similar to PureScript records, and unlike Haskell tuples, type checking for OCaml polymorhic records is structural, not nominal.

Names -> nominal, ??? -> structural

Now that we have seen structural type checking as an alternative to name-based (nominal) type checking, and some examples, here is my attempt at defining anonymous types: If named types are type checked nominally, then the types that are structurally type checked are called “anonymous”.

In other words:

• Nominally type checked types are named
• Structurally type checked types are anonymous

According to this definition, Haskell and many other languages don’t have anonymous types. PureScript records are an example to anonymous products, and OCaml polymorphic variants are an example to anonymous sums.

Conclusions

Named types are checked nominally, anonymous types are checked structurally. According to this definition, Haskell, and many other languages, don’t have anonymous types, as all types are nominally checked.

Tuples are no exception: they have names, and type checked nominally.

PureScript records and OCaml polymorphic variants are great examples to anonymous products and sums, respectively.