osa1

Some GHC extensions I use

July 31, 2012 - Tagged as: haskell, en.

GHC is a huge compiler. On GHC 7.4.1, I can see 85 optional language extensions¹ Some of them are just adding some syntactic sugar(NamedFieldPuns, RecordWildCards), and some of them are extensions to type system(GADTs, Rank2Types). As a new Haskeller, I only know and use a small subset of this features, and in this post I’ll write about it.

NamedFieldPuns and RecordWildCards

This is some really simple syntactic sugar for pattern matching some names in records. Normally you’d do:

someFun :: T -> (Int, Int)
someFun T{n2=n2, n3=n3} = (n2, n3)

But instead of assigning some names to members of a record with same names, with help of NamedFieldPuns, you could just write:

someFun :: T -> (Int, Int)
someFun T{n3, n2} = (n2, n3)

It’s that simple. RecordWildCards doing something similar. Instead of writing someFun T{n1,n2,n3}, you just write someFun T{..}.

These are purely syntactic, so they can be used for any place that you can use pattern matching or destructuring(ie. let .. in .. or where clauses). Really helpful when working on data constructors that have 4+ members.

Most interesting use of RecordWildCards I’ve seen was this implementation of localised module imports(I think this implementation can also be used for first-class modules).

OverloadedStrings

This really helps when working on Text data. When you are passing a function a Text data, instead of using pack "some string", you can just use "some string" and compiler resolves the type of the string, adds required code etc.

Basically it provides a way to create instances of class IsString a where fromString :: String -> a using string syntax. It’s mostly used with Data.Text, ByteString.Text etc.

GADTs

This is by far my favorite type system extension. This is mostly because I’m working on an interpreter, and GADTs are great for representing syntax trees.

To understand advantages of GADTs, first let’s think about how we can represent syntax tree for a simple language. Let some part of our sytax be(in a BNF-like format):

<var>   = String
<abst>  = "(" "λ" "(" {<var> [" "]}* ")" <term> ")"
<app>   = "(" <term> {<term> [" "]}* ")"
<term>  = <var> | <abst> | <app>

So, variables, lambda abstractions, and applications. Just like in untyped lambda calculus. Now let’s try to encode this format in Haskell:

data Term = Var String
          | Abst [??] Term
          | App Term Term

What should we write in place of ?? ? Writing Term is obviously wrong, since then we could be ill-formed data(for instance, think about a syntax like that: (λ ((λ (a) ..)) ...) we used lambda abstraction in place of lambda parameter names).

If the term var just consists of a String, then we can directly use String as a first member of Abst constructor, but think term var as a really complex constructor. I’m trying to make examples as clear and simple as possible.

One way to fix that is to use “smart constructors”. What this really means is that we can hide some data constructors in our module(ie. write exported names in module explicitly) and export some constructor function in place of them. With the help of a trick we can give users a more controlled way of creating data(in our case this means that we can prevent caller from creating ill-formed syntax tree).

First, let’s see the data type trick:

data T1
data T2
data SomeData a = Data1 String | Data2 Int
  deriving Show

This is interesting because data types T1 and T2 don’t have any constructors. So there is literally no way to create data in that types. Second interesting point in SomeData a is, the a variable is not used by data constructors(Data1 and Data2). This code still works:

ghci> :t Data1 "ok"
Data1 "ok" :: SomeData a
ghci> :t Data2 123
Data2 123 :: SomeData a
ghci> Data1 "ok" :: SomeData Int
Data1 "ok"
ghci> Data1 "ok" :: SomeData Char
Data1 "ok"

If we could find a way to distinguish types of data created with Data1 and Data2, we’re done. This is where smart constructors take place:

data1 :: String -> SomeData T1
data1 = Data1
data2 :: Int -> SomeData T2
data2 = Data2

We’re telling to compiler that data created with data1 constructor will be the type of SomeData T1. Now if we hide Data1 and Data2 data constructors and export data1 and data2 functions, there is no way for user create ill-formed data types like Data1 "ok" :: SomeData SomeUnrelatedType like we done in past example.

ghci> data1 "ok" :: SomeData Int

<interactive>:48:1:
    Couldn't match expected type `Int' with actual type `T1'
    Expected type: SomeData Int
      Actual type: SomeData T1
    In the return type of a call of `data1'
    In the expression: data1 "ok" :: SomeData Int

Now we can encode our syntax in Haskell like this:

data Var
data Abst
data App
data AnyTerm
data Term a = Var String
            | Abst [Term Var] (Term AnyTerm)
            | App (Term AnyTerm) (Term AnyTerm)
var :: String -> Term Var
var = Var
abst :: [Term Var] -> Term AnyTerm -> Term Abst
abst = Abst
app :: Term AnyTerm -> Term AnyTerm -> Term App
app = App
anyTerm = id

Now I’m omitting some parts without explaining since this post is already long enough. But the point should be clear by now. Let’s see this in action:

ghci> abst [(abst [var "p1"] (anyTerm (var "p1")))] (anyTerm (var "somevar"))

<interactive>:63:8:
    Couldn't match expected type `Var' with actual type `Abst'
    Expected type: Term Var
      Actual type: Term Abst
    In the return type of a call of `abst'
    In the expression: (abst [var "p1"] (anyTerm (var "p1")))

Nice! Just like what we wanted. GADTs help for creating smart “data constructors” so that we don’t have to hide data constructors and export some smart constructor functions. Here’s the same code with GADTs:

data Var
data Abst
data App
data Term a where
    Var :: String -> Term Var
    Abst :: [Term Var] -> AnyTerm -> Term Abst
    App :: AnyTerm -> AnyTerm -> Term App
data AnyTerm where
    AnyTerm ::Term a -> AnyTerm

This code should be clear. All we do is to specify constructor’s return type.

This is really great improvement. Think of a more complex syntax tree, with a lot more constructors. The parser already checks for invalid syntax, so actually there’s no way to create ill-formed syntax tree since parser rejects the code.

If you don’t use GADTs or phantom types, you have to check for ill-formed syntax in your evaluator(or reducer or compiler or whatever) even if there’s no way for parser to generate ill-formed syntax tree. For instance, you’d have to check for Var [(App ..)] ... in your eval :: Env -> Term -> Val function.

The worst part of GADTs is that you can’t use deriving clause in data type declaration(can anyone explain why?). In my case, that means I have to write some big amounts of code just to be able to print out the data for debugging purposes. For instance, I had to write some repetitive code like this:

instance Show (Expr a) where
    ...
    show (Lambda params ret body) = "(lambda (" ++ unwords (map show params) ++ ") : " ++ show ret ++ " " ++ unwords (map show body) ++ ")"
    show (If ifE thenE elseE) = "(" ++ intercalate "," [show ifE, show thenE, show elseE] ++ ")"
    -- same repetitive code for each data constructor, it's really pain when the syntax grows bigger and when you frequently make some changes on it.

I also had to write same amount of code just to be able to run tests on my parser(ie. I need to compare hand-written syntax tree with parser generated syntax tree).

instance Eq (Expr a) where
    ...
    Lambda p1 b1 == Lambda p2 b2 = p1 == p2 && b1 == b2
    If i1 t1 e1 == If i2 t2 e2 = i1 == i2 && t1 == t2 && e1 == e2
    ...
    _ == _ = False

Some GHC extensions I use

NamedFieldPuns and RecordWildCards

OverloadedStrings

GADTs

Further reading: