osa1

On -XStrict

November 16, 2015 - Tagged as: en, haskell.

-XStrict has landed in HEAD a couple of days ago, and judged from the upvotes it seems like /r/haskell was quite excited about it.

In the thread I tried to answer questions about -XStricts effects on programs’ semantics. Does it make the language effectively call-by-value? Do I still have bottoms in my values? Do I lose infinite lists(streams)? In this post I’ll try to give a more organized answer, with some answers to the questions asked in the Reddit thread.

Let’s think about how to create a thunk in Haskell:

Create a let-binding. The RHS of let-binding is a thunk until actually use it.
Create a where-binding. This is just a syntactic sugar for a let-binding, so I won’t consider this as a different case.
Pass an argument to a function or a data constructor. The argument will only be evaluated when it’s actually “used”.

Here I deliberately don’t define what I mean by “used”, because it’ll complicate the discussion a lot.

Now, with -XStrict, we have a bang pattern in every binder. This means that:

Let-bindings are now strict. E.g. if we have something like:
```
let a = ... in <body>
```
a is now evaluated before <body>, and so we can be sure that it won’t be bottom in <body>. Note however that this isn’t to say that a can’t contain bottoms. Here I’m just saying that a can’t be bottom in <body>.
Function arguments are now strict. In a function like this:
```
f a b = <body>
```
a and b can’t be bottom in <body>.
Data constructor arguments(fields) are now strict. If we have this:
```
data List a = Nil | Cons a (List a)
```
The two fields of Cons are now strict. Now, this case may look a bit tricky at first. What we really say here is that once a List is constructed, it can’t contain bottoms. We can still do something like undefined :: List Int, but the program immediately fails, instead of running until you try to pattern match on that undefined value like in the lazy case. This follows from the first two rules. Keep reading for more details.

When all these combined, it means that our programs are evaluated just like how they would be in a call-by-value language. For example, if we have:

let a = ...  in <body>

We can be sure that a won’t be bottom in <body>, and it can’t contain bottoms too! This follows from all the rules above. The first rule only says that a can’t be bottom itself. It doesn’t say anything about the fields(subexpressions) of a. Third rule says that it can’t contain bottom fields.

When we make all the binders and fields strict, including all the modules in all the dependencies(base etc.), we guarantee that we our programs evaluate like in a call-by-value language.

Now, call-by-value, call-by-name(and it’s efficient implementation call-by-need) etc. are really about how to evaluate a function application. In our case since we have a strictness annotation in all the function arguments, arguments will be evaluated before being passed to the function. Which really means evaluating the function application in call-by-value semantics.

Below are some questions and answers that I answered in the Reddit thread and in some follow-up threads.

What about standard list, tuple etc. types?

Unless we compile modules that define those using -XStrict, they’ll stay non-strict. For the standard types, we need base compiled with -XStrict. In practice this will probably never happen. But I think we can have another base, say, base-strict, which is the same base, except compiled with -XStrict. In this case depending on which one we’re using our lists, tuples etc. become strict or lazy.

What about monadic code?

Monadic code is really not special in any sense. When talking about the semantics we should see through the syntactic sugar. Say we have this:

do a <- m1
   b <- m2 a
   return (a + b)

This is really just a syntactic sugar for:

m1 >>= (\a -> m2 a >>= (\b -> return (a + b)))

With -XStrict, function arguments a and b and all the arguments of >>= and return will be strict. This makes the whole code strict. In this code this means that m1 will be evaluated before m2 a is evaluated, and returns return value will be strict etc.

What about list comprehensions and infinite lists(streams)?

A list comprehension like [1..] is again a syntactic sugar. It’s expanded form is enumFrom 1. enumFrom’s type is Enum a => a -> [a]. Let’s say we’re using Enum Int here. Since the instance is defined in base, and lists are also defined in base, this code will still work. However, if we compile base with -XStrict, this code loops because the standard list type will become strict.

In practice we would probably define strict and lazy lists separately to have the laziness when we need.

What about higher-order functions?

Since we don’t distinguish strict and lazy functions in type level, when we have a higher-order functions it may seem like we’d loose the guarantees. But this is not the case, at least not in general. Suppose we have this:

map :: (a -> b) -> [a] -> [b]
map _ []       = []
map f (x : xs) = f x : map f xs

This is compiled with -XStrict. Now suppose we pass a function f which is lazy in it’s argument. Since (:) is strict in this code, we’ll still evaluate the f x before returning. Our guarantee that the list won’t be bottom and won’t have bottom still holds.

See also the paper “Types are calling conventions”.

Haskell’s denotational semantics says that lifted types have bottoms

This is true. Even if we have this strict type:

data List a = Nil | Cons !a !(List a)

We can construct bottom values of this type, using, for example, undefined :: List Int or (let x = x in x) :: Blah.

However, if you think about how this value will be evaluated you’ll realize that this is exactly like how it would be evaluated in a call-by-value language. For example, if we try to bind it to a value:

let a = undefined :: List Int in <body>

This code will fail before <body> is run. In <body> a can’t be bottom and can’t contain bottoms.

(See also this ghc-devs thread about adding user defined unlifted data types to GHC. With this we could eliminate all the bottoms in some user-defined types.)

But Haskell will still generate thunks in the RTS level?

This is exactly right. -XStrict is really a very simple compiler pass that adds strictness annotations to every binder and field. We don’t have any changes in the GHC RTS to take advantage of additional strictness.

In other words, operational semantics of the language and implementation of this operational semantics in GHC RTS is still the same. We just do a program transformation to generate a program that evaluates like it would in a call-by-value language.

This means that if we have this program:

let a = undefined :: List Int in <body>

A thunk is still constructed in the runtime, but it’s evaluated before the <body> is evaluated. So this code fails before <body> is evaluated, and if we evaluate <body> it means that a is not bottom and doesn’t have bottoms.

This is another potential improvement over the -XStrict. For more details and some optimizations see this ghc-devs thread.