osa1 feed

On matching bang patterns

June 27, 2016 - Tagged as: en, haskell.

I thought a bang pattern is all about seq. That may actually be true, but when that seq is happening may not be obvious. Even after ~5 years of Haskell I was initially very confused by this, and in fact at first I thought it was a bug:

data T = A | B | C

fn5 :: Int -> [T] -> Int
fn5  i []       = i
fn5  i (A : ts) = fn5 (i + 1) ts
fn5 !i (B : ts) = fn5 (i + 2) ts
fn5  i (C : ts) = fn5 0 ts

The question is, given these definitions, what does this evaluate to, and why:

fn5 undefined [C]

This question is basically a BangPatterns question. The key point is that a bang pattern first evaluates the value to match, then looks at the pattern. This is from GHC 8.0.1 user manual, section 9.28.1:

Matching an expression e against a pattern !p is done by first evaluating e (to WHNF) and then matching the result against p.

My initial thought was that this example would not crash because the pattern i always matches, and since second argument is only matched by last case of this definition, which doesn’t evaluate i, i would not get evaluated.

Or in other words, I thought all this bang pattern does is to add a seq, to the RHS:

fn5 i (B : ts) = i `seq` fn5 (i + 2) ts

which is not what it really does!

Before bang patterns, I think this pattern was considered as the standard way of forcing a function argument (mostly used for accumulator arguments):

f acc _ | acc `seq` False = undefined
f acc arg = ...

The guard in first equation always fails, but it forces the acc by the time it fails. While this looks bad1, it compiles to nice code after a case-of-case transformation, and it evaluates acc as first thing to do whenever it’s applied to two arguments.

Now, with BangPatterns, we get to do this:

f !acc arg = ...

Which is good when we have one equation only, but when we have multiple equations like in fn5 above, we need add bang patterns to every equation, or we risk having bugs2.

So in short, we don’t have a good solution for saying a function is strict on some arguments, without risking bugs (by trying to add minimum number of bangs) or adding a lot of them.

  1. I don’t like seeing undefineds like this, because I reserve them for code that’s not yet implemented but needs to be implemented. Using undefined as a proxy is also not OK these days, as we have visible type applications in GHC 8.0.1 and Proxy in base since base-4.7.

  2. I don’t mean semantic bugs, rather, unexpected memory usage etc. caused by not forcing thunks on time.