July 22, 2016 - Tagged as: en, haskell, ghc.
The unboxed sums patch that implements unlifted, unboxed sum types (as described in this Wiki page) was merged yesterday, and a /r/haskell discussion emerged shortly after. As the implementor, I tried to answer questions there, but to keep answers more organized I wanted to write a blog post about it.
The reason I’m not writing this on the Wiki page is that this post is about the current plans and status of the feature. The Wiki page may be updated in the future as the feature evolves, and it may be edited by others. This page reflects the status as of today, future plans, and my own ideas.
This feature is designed to complement the similar feature for product types (tuples), called “unboxed tuples”. The syntax is thus chosen to reflect this idea. Instead of the commas in the unboxed tuple syntax, we used bars (similar to how bars are used in sum type declarations). The syntax looks bad for several reasons:
The type argument of an alternative has to be a single type. If we want multiple types in an alternative, we have to use an unboxed tuple. For example, the unboxed sum version of the type data T = T1 Int | T2 String Bool is (# Int | (# String, Bool #) #). That’s a lot of parens and hashes.
Similarly, for nullary alternatives (alternatives/constructors with no arguments) we have to use empty unboxed tuples. So a bool-like type looks like (# (# #) | (# #) #).
Data constructors use the same syntax, except we have to put spaces between bars. For example, if you have a type with 10 alternatives, you write something like (# | | | | value | | | | | #). Spaces between bars are optional in the type syntax, but mandatory in the term syntax, because otherwise we’d have to steal some existing syntax. For example, (# ||| a #) can be parsed either as a singleton unboxed tuple of Control.Arrow.||| applied to an argument, or as an unboxed sum with 4 alternatives.
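To make the points above concrete, here is a minimal sketch of the type and term syntax. It assumes a GHC that ships the UnboxedSums extension (8.2 or later). The names UT, UBool, toUnboxed, fromUnboxed, toUBool, fromUBool, pickThird and whichAlt are made up for illustration and are not part of any library.

```haskell
{-# LANGUAGE UnboxedSums, UnboxedTuples #-}

-- The boxed type from the text.
data T = T1 Int | T2 String Bool
  deriving (Eq, Show)

-- Its unboxed counterpart. `UT` is a hypothetical synonym.
type UT = (# Int | (# String, Bool #) #)

-- In terms, the position of the value among the bars selects the alternative.
toUnboxed :: T -> UT
toUnboxed (T1 n)   = (# n | #)
toUnboxed (T2 s b) = (# | (# s, b #) #)

-- Patterns use the same bar syntax as terms.
fromUnboxed :: UT -> T
fromUnboxed (# n | #)          = T1 n
fromUnboxed (# | (# s, b #) #) = T2 s b

-- A bool-like type: each nullary alternative carries an empty unboxed tuple.
type UBool = (# (# #) | (# #) #)

toUBool :: Bool -> UBool
toUBool False = (# (# #) | #)
toUBool True  = (# | (# #) #)

fromUBool :: UBool -> Bool
fromUBool (# _ | #) = False
fromUBool (# | _ #) = True

-- Four alternatives: two bars before the value, one after, so this is
-- the third alternative. The spaces between the bars are required here.
pickThird :: Char -> (# Int | Bool | Char | String #)
pickThird c = (# | | c | #)

whichAlt :: (# Int | Bool | Char | String #) -> Int
whichAlt (# _ | | | #) = 1
whichAlt (# | _ | | #) = 2
whichAlt (# | | _ | #) = 3
whichAlt (# | | | _ #) = 4
```

The conversions are written as functions because GHC does not allow top-level bindings of unlifted types. Loading this in GHCi, fromUnboxed (toUnboxed (T2 "a" True)) should give back the original value.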
Note that the original Wiki page for unboxed sums included a “design questions” section that discussed some alternative syntax (see this version). Nobody made any progress to flesh out the ideas, and I updated the Wiki page to reflect the implementation. So it was known that the syntax is not good, but it just wasn’t a major concern.
The answer to the second question is also an answer to this question.
We’re not expecting users to use this type extensively. It’ll mostly be used by the compiler, for optimizations. In fact, we could have skipped the front-end syntax entirely, and it’d be OK for the most part. If you haven’t used unboxed tuples before, you probably won’t be using unboxed sums.
The only place you may want to use this syntax is when you’re writing a high-performance library or program, and you have a sum type that’s used strictly and can take advantage of removing a level of indirection.
A detailed answer would take too long, but here’s a summary:
Constructed product result (CPR) analysis can now be used for returning sums efficiently. Note that this feature was left as “future work” in the paper (which is from 2004; see section 3.2). The high-level idea is that if a function returns a value that it constructs, then instead of boxing the components of the value and returning a boxed object, it can just return the components. When the function result is directly scrutinized (i.e. by a case expression), this usually reduces allocations. In other cases, it moves the allocation from the callee to the call site, which in turn leads to stack allocation in some cases (when the object doesn’t escape from the scope).
For product types, unboxed tuples are used for returning the value without heap allocation. For sum types, we use unboxed sums.
The result of strictness (or “demand”) analysis can now be used to pass sums efficiently. As a result, worker/wrapper transformations can now be done for functions that take sum arguments. See this Wiki page for demand analysis and this 2014 paper.
{-# UNPACK #-} pragmas now work on sum types, using unboxed sums under the hood.
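As a small illustration of the last point, this is what asking for an unpacked sum-typed field looks like in source code. Shape, Labeled and area are hypothetical names. Whether the field really ends up unpacked depends on the GHC version and optimization flags; as far as I know, a GHC that cannot unpack the field warns and keeps it boxed, so the program means the same either way.

```haskell
-- A small sum type with strict fields.
data Shape
  = Circle !Double
  | Rect !Double !Double

-- The pragma asks GHC to store the Shape inline in Labeled instead of
-- behind a pointer. This is a request, not a guarantee.
data Labeled = Labeled {-# UNPACK #-} !Shape !Int

area :: Labeled -> Double
area (Labeled (Circle r) _) = pi * r * r
area (Labeled (Rect w h) _) = w * h
```

Nothing in this code mentions unboxed sums; the representation change happens entirely inside the compiler.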
Note that none of these need a concrete syntax for unboxed sums.
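The compiler does not need the source syntax for these optimizations, but the syntax does let us write out by hand roughly what the first two amount to. The following is a sketch only, not actual GHC output: safeDiv, safeDivW, size and sizeW are hypothetical names, and the real transformations happen on Core and also unbox the Int components.

```haskell
{-# LANGUAGE UnboxedSums, UnboxedTuples #-}

-- (1) Returning a sum. The worker returns the components of the result
-- in an unboxed sum instead of allocating a Maybe.
safeDivW :: Int -> Int -> (# (# #) | Int #)
safeDivW _ 0 = (# (# #) | #)
safeDivW n d = (# | n `div` d #)

-- The wrapper rebuilds the boxed value. Once it is inlined at a call site
-- that immediately scrutinizes the result, the Maybe need not be allocated.
safeDiv :: Int -> Int -> Maybe Int
safeDiv n d = case safeDivW n d of
  (# _ | #) -> Nothing
  (# | q #) -> Just q
{-# INLINE safeDiv #-}

-- (2) Taking a sum that is used strictly. The worker receives the
-- alternative and its payload directly.
sizeW :: (# Int | String #) -> Int
sizeW (# n | #) = n + 1
sizeW (# | s #) = length s

-- The wrapper takes the boxed argument apart once and calls the worker.
size :: Either Int String -> Int
size (Left n)  = sizeW (# n | #)
size (Right s) = sizeW (# | s #)
{-# INLINE size #-}
```

Callers keep using safeDiv and size with ordinary Maybe and Either values; only the workers see the unboxed representation.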
Hopefully this clarifies some questions and concerns, especially about the syntax. We have plenty of time until the first RC for 8.2 (mid-February 2017), so it’s certainly possible to improve the syntax, and I’ll be working on that part once I’m done with the optimizations.