November 16, 2015 - Tagged as: en, haskell.
-XStrict
has landed in HEAD a couple of days ago, and judged from the upvotes it seems like /r/haskell was quite excited about it.
In the thread I tried to answer questions about -XStrict
s effects on programs’ semantics. Does it make the language effectively call-by-value? Do I still have bottoms in my values? Do I lose infinite lists(streams)? In this post I’ll try to give a more organized answer, with some answers to the questions asked in the Reddit thread.
Let’s think about how to create a thunk in Haskell:
Create a let-binding. The RHS of let-binding is a thunk until actually use it.
Create a where-binding. This is just a syntactic sugar for a let-binding, so I won’t consider this as a different case.
Pass an argument to a function or a data constructor. The argument will only be evaluated when it’s actually “used”.
Here I deliberately don’t define what I mean by “used”, because it’ll complicate the discussion a lot.
Now, with -XStrict
, we have a bang pattern in every binder. This means that:
Let-bindings are now strict. E.g. if we have something like:
let a = ... in <body>
a
is now evaluated before <body>
, and so we can be sure that it won’t be bottom in <body>
. Note however that this isn’t to say that a
can’t contain bottoms. Here I’m just saying that a
can’t be bottom in <body>
.
Function arguments are now strict. In a function like this:
= <body> f a b
a
and b
can’t be bottom in <body>
.
Data constructor arguments(fields) are now strict. If we have this:
data List a = Nil | Cons a (List a)
The two fields of Cons
are now strict. Now, this case may look a bit tricky at first. What we really say here is that once a List
is constructed, it can’t contain bottoms. We can still do something like undefined :: List Int
, but the program immediately fails, instead of running until you try to pattern match on that undefined
value like in the lazy case. This follows from the first two rules. Keep reading for more details.
When all these combined, it means that our programs are evaluated just like how they would be in a call-by-value language. For example, if we have:
let a = ... in <body>
We can be sure that a
won’t be bottom in <body>
, and it can’t contain bottoms too! This follows from all the rules above. The first rule only says that a
can’t be bottom itself. It doesn’t say anything about the fields(subexpressions) of a
. Third rule says that it can’t contain bottom fields.
When we make all the binders and fields strict, including all the modules in all the dependencies(base
etc.), we guarantee that we our programs evaluate like in a call-by-value language.
Now, call-by-value, call-by-name(and it’s efficient implementation call-by-need) etc. are really about how to evaluate a function application. In our case since we have a strictness annotation in all the function arguments, arguments will be evaluated before being passed to the function. Which really means evaluating the function application in call-by-value semantics.
Below are some questions and answers that I answered in the Reddit thread and in some follow-up threads.
Unless we compile modules that define those using -XStrict
, they’ll stay non-strict. For the standard types, we need base
compiled with -XStrict
. In practice this will probably never happen. But I think we can have another base, say, base-strict
, which is the same base
, except compiled with -XStrict
. In this case depending on which one we’re using our lists, tuples etc. become strict or lazy.
Monadic code is really not special in any sense. When talking about the semantics we should see through the syntactic sugar. Say we have this:
do a <- m1
<- m2 a
b return (a + b)
This is really just a syntactic sugar for:
>>= (\a -> m2 a >>= (\b -> return (a + b))) m1
With -XStrict
, function arguments a
and b
and all the arguments of >>=
and return
will be strict. This makes the whole code strict. In this code this means that m1
will be evaluated before m2 a
is evaluated, and return
s return value will be strict etc.
A list comprehension like [1..]
is again a syntactic sugar. It’s expanded form is enumFrom 1
. enumFrom
’s type is Enum a => a -> [a]
. Let’s say we’re using Enum Int
here. Since the instance is defined in base
, and lists are also defined in base
, this code will still work. However, if we compile base
with -XStrict
, this code loops because the standard list type will become strict.
In practice we would probably define strict and lazy lists separately to have the laziness when we need.
Since we don’t distinguish strict and lazy functions in type level, when we have a higher-order functions it may seem like we’d loose the guarantees. But this is not the case, at least not in general. Suppose we have this:
map :: (a -> b) -> [a] -> [b]
map _ [] = []
map f (x : xs) = f x : map f xs
This is compiled with -XStrict
. Now suppose we pass a function f
which is lazy in it’s argument. Since (:)
is strict in this code, we’ll still evaluate the f x
before returning. Our guarantee that the list won’t be bottom and won’t have bottom still holds.
See also the paper “Types are calling conventions”.
This is true. Even if we have this strict type:
data List a = Nil | Cons !a !(List a)
We can construct bottom values of this type, using, for example, undefined :: List Int
or (let x = x in x) :: Blah
.
However, if you think about how this value will be evaluated you’ll realize that this is exactly like how it would be evaluated in a call-by-value language. For example, if we try to bind it to a value:
let a = undefined :: List Int in <body>
This code will fail before <body>
is run. In <body>
a
can’t be bottom and can’t contain bottoms.
(See also this ghc-devs thread about adding user defined unlifted data types to GHC. With this we could eliminate all the bottoms in some user-defined types.)
This is exactly right. -XStrict
is really a very simple compiler pass that adds strictness annotations to every binder and field. We don’t have any changes in the GHC RTS to take advantage of additional strictness.
In other words, operational semantics of the language and implementation of this operational semantics in GHC RTS is still the same. We just do a program transformation to generate a program that evaluates like it would in a call-by-value language.
This means that if we have this program:
let a = undefined :: List Int in <body>
A thunk is still constructed in the runtime, but it’s evaluated before the <body>
is evaluated. So this code fails before <body>
is evaluated, and if we evaluate <body>
it means that a
is not bottom and doesn’t have bottoms.
This is another potential improvement over the -XStrict
. For more details and some optimizations see this ghc-devs thread.