April 10, 2021 - Tagged as: en, plt, types.
I was recently thinking about why do so many languages have tuples, which can be thought of as simple anonymous products (more on the definition of this below), but not something similar for sums. Both sum and product types are widely used, so it seems inconsistent to have anonymous products but not sums.
I recently tweeted about this and got helpful responses that made me realize that I got my definitions wrong. As I think more about what “anonymous type” means it became clear to me that the it’s not just tuples or other types with special syntax, instead of names. It’s more complicated than that.
So in this post I’d like to briefly talk about products and sums, and how are names used in type checking. I will then show a different way of type checking, and some examples from two widely used languages. Finally, I will argue that types are called “named” or “anonymous” depending on how they are checked.
Note that I’m not using any of these words as they are used in category theory or any other field of mathematics. These are mainly how I see them used in widely used PLs like Haskell, Rust, and OCaml, and in PL papers and books.
A value of a product type contains zero or more fields with potentially different types. Some example product types are:
data Coordinate = Coordinate { x :: Int, y :: Int }
: a product with two Int
fieldsdata D = D Int String Float
: a product with Int
, String
, and Float
fieldsdata Empty = Empty
: a product with no fieldsNote that the way you access the fields does not matter. In the examples above, fields of a Coordinate
value can be accessed with pattern matching, or with the generated functions x
and y
. In the second example, we can only access the fields with pattern matching.
What matters is: products contain zero or more fields. The fields can have different types.
A sum type specifies multiple “variants” (or “alternatives”), where each variant has a “name” (or “tag”, more on this later) and some number of fields.
A value of a sum type holds a name (or tag), and the fields of the variant with that name.
For example, if you have a parser for integers, you will want to return an integer when parsing succeeds, or an error message when something goes wrong. The sum type for the return value of your parse function would look like:
data ParseResult
= Success Int
| Fail String
Here, Success
and Fail
are names of the variants. Success
variant has an Int
field, and Fail
variant has a String
field.
A value of this type does not contain an Int
and String
at the same time. It’s either a Fail
with a String
field, or a Success
with an Int
field.
The way you access the fields is with pattern matching:
case parse_result of
Success int -> ...
Fail error_message -> ...
If I have two types, named T1
and T2
, no matter how they are defined, they are considered different in Haskell, and most other widely used typed languages (Rust, Java, …). This is called “nominal” type checking, where differently named types are considered different, even if they are “structurally” the same. For example, data T1 = T Int
and data T2 = T Int
are structurally the same, but you can’t apply a value of type T2
to a function that expects T1
.
What “structurally same” mean is open to interpretation. We will come to this later.
In addition, all types have names1, even types like tuples, which may look like they don’t have names, like our Coordinate
or ParseResult
have.
Tuples in most languages are just a bunch of product types, like the ones you can define yourself. They are often pre-defined for arities 0 to some number, and they have a special, “mixfix” syntax, with parentheses and commas to separate the fields. Other than that, they are no different than the ones you can define yourself.
You can see GHC’s definition of tuples here. In GHC, you can use the name directly if you don’t want the mixfix syntax, like (,) 1 2
. So the name for an 2-ary tuple is (,)
in Haskell, and it has a special syntax so you can write more readable (1, 2)
(or (Int, Int)
in type context). Other than syntax, there’s nothing special about tuples.
So it’s clear that most languages don’t have anonymous types. All types have some kind of names, and two types are only “compatible” if the names match.
Before defining what anonymous types are, I would like to give two examples, from PureScript and OCaml, where types are not checked based on their names, but based on their “structure”.
A record is a product type with named (or “labelled”) fields. Our Coordinate
example is a record.
In PureScript, records can be defined without giving names to them. For example:
f :: { x :: Int, y :: Int } -> Int
f a = a.x + a.y
Here, f
is a function that takes a record with two Int
fields, named x
and y
, as an argument.
Here is a more interesting version of the same function:
f :: forall r . { x :: Int, y :: Int | r } -> Int
f a = a.x + a.y
This version takes a record with at least x :: Int
and y :: Int
fields, but it can have more fields. Using this version, this code type checks:
f { x: 1, y: 2, z: 3, t: 4 }
The r
in this type is not too important. Important part is, in PureScript, records are not type checked nominally. Indeed, in the example above, type of the record with 4 fields is not defined, and no names are used for the record in the type signature of f
.
You might think that the record braces and commas are similar to the tuple syntax, so the name could be something like {,}
, maybe applied to x :: Int
somehow (assuming there is a type-level representation of field names).
However, even if that’s the case, type checking of these types are quite different than tuples. We’ve already seen that we can pass a record with more fields. You can also reorder fields in the function type signature2, or in the record expression, and it still works.
So type checking for PureScript is quite different than Haskell tuples.
This kind of type checking where you look at the “structure” rather than just the names is called structural type checking.
Now let’s take a look at an example for sum types.
OCaml has named sum types, just like Haskell’s. Here is the OCaml version of our ParseResult
type:
type parse_result =
of int
| Success of string | Fail
Name of this type is parse_result
(following OCaml naming conventions), and it is type checked exactly the same way it is type checked in Haskell.
A second way of defining sum types in OCaml, and without names, is with polymorphic variants. Here’s the polymorphic variant for the same type:
type parse_result = [ `Success of int | `Fail of string ]
Crucially, even though we use a similar syntax with the type
keyword, this is a type synonym. The right-hand side of this definition is an anonymous sum with two variants, tagged `Success
and `Fail
, with int
and string
fields, respectively.
Now, suppose I have a parse result handler, which, in addition to the success and failure cases, handles some “other” case as well:
let f = function
Printf.printf "Parse result: %d\n" i
| `Success i -> Printf.printf "Parse failed: %s\n" msg
| `Fail msg -> Printf.printf "Wat?\n" | `Other ->
Type of this function as inferred by the OCaml compiler is:
of string | `Other | `Success of x ] -> unit [< `Fail
What this type says is that the function accepts any polymorphic variant that has the tags Fail
, Other
, and Success
(with the specified field types), or some subset of these tags. So if I have a value of type parse_result
:
let x : parse_result = `Success 123
I can pass it to f
, even though f
’s argument type is not exactly parse_result
. Here’s the full example, run in utop: (utop #
part is the prompt, lines after ;;
are utop outputs)
type parse_result = [ `Success of int | `Fail of string ];;
utop # type parse_result = [ `Fail of string | `Success of int ]
let f = function
utop # Printf.printf "Parse result: %d\n" i
| `Success i -> Printf.printf "Parse failed: %s\n" msg
| `Fail msg -> Printf.printf "Wat?\n";;
| `Other -> val f : [< `Fail of string | `Other | `Success of int ] -> unit = <fun>
let x : parse_result = `Success 123;;
utop # val x : parse_result = `Success 123
utop # f x;;123
Parse result: unit = () - :
Neat!
Similar to PureScript records, and unlike Haskell tuples, type checking for OCaml polymorhic records is structural, not nominal.
Now that we have seen structural type checking as an alternative to name-based (nominal) type checking, and some examples, here is my attempt at defining anonymous types: If named types are type checked nominally, then the types that are structurally type checked are called “anonymous”.
In other words:
According to this definition, Haskell and many other languages don’t have anonymous types. PureScript records are an example to anonymous products, and OCaml polymorphic variants are an example to anonymous sums.
Named types are checked nominally, anonymous types are checked structurally. According to this definition, Haskell, and many other languages, don’t have anonymous types, as all types are nominally checked.
Tuples are no exception: they have names, and type checked nominally.
PureScript records and OCaml polymorphic variants are great examples to anonymous products and sums, respectively.
Thanks to @_gilmi and @madgen_ for their helpful comments on a draft of this blog post.
With the exception of type synonyms. Type synonyms can be considered as simple macros for substituting types for names before type checking.↩︎
In Haskell, reordering stuff at the type level is often done with type families (type-level functions). Types are still checked nominally, but by rearranging them before type checking you can often have something somewhat similar to structural checking.↩︎