# On matching bang patterns

June 27, 2016 - Tagged as: en, haskell.

I thought a bang pattern is all about `seq`. That may actually be true, but when that `seq` is happening may not be obvious. Even after ~5 years of Haskell I was initially very confused by this, and in fact at first I thought it was a bug:

``````data T = A | B | C

fn5 :: Int -> [T] -> Int
fn5  i []       = i
fn5  i (A : ts) = fn5 (i + 1) ts
fn5 !i (B : ts) = fn5 (i + 2) ts
fn5  i (C : ts) = fn5 0 ts``````

The question is, given these definitions, what does this evaluate to, and why:

``fn5 undefined [C]``

This question is basically a `BangPatterns` question. The key point is that a bang pattern first evaluates the value to match, then looks at the pattern. This is from GHC 8.0.1 user manual, section 9.28.1:

Matching an expression e against a pattern !p is done by first evaluating e (to WHNF) and then matching the result against p.

My initial thought was that this example would not crash because the pattern `i` always matches, and since second argument is only matched by last case of this definition, which doesn’t evaluate `i`, `i` would not get evaluated.

Or in other words, I thought all this bang pattern does is to add a `seq`, to the RHS:

``fn5 i (B : ts) = i `seq` fn5 (i + 2) ts``

which is not what it really does!

Before bang patterns, I think this pattern was considered as the standard way of forcing a function argument (mostly used for accumulator arguments):

``````f acc _ | acc `seq` False = undefined
f acc arg = ...``````

The guard in first equation always fails, but it forces the `acc` by the time it fails. While this looks bad1, it compiles to nice code after a case-of-case transformation, and it evaluates `acc` as first thing to do whenever it’s applied to two arguments.

Now, with `BangPatterns`, we get to do this:

``f !acc arg = ...``

Which is good when we have one equation only, but when we have multiple equations like in `fn5` above, we need add bang patterns to every equation, or we risk having bugs2.

So in short, we don’t have a good solution for saying a function is strict on some arguments, without risking bugs (by trying to add minimum number of bangs) or adding a lot of them.

1. I don’t like seeing `undefined`s like this, because I reserve them for code that’s not yet implemented but needs to be implemented. Using `undefined` as a proxy is also not OK these days, as we have visible type applications in GHC 8.0.1 and `Proxy` in base since `base-4.7`.

2. I don’t mean semantic bugs, rather, unexpected memory usage etc. caused by not forcing thunks on time.