July 31, 2012 - Tagged as: haskell, en.
GHC is a huge compiler. On GHC 7.4.1, I can see 85 optional language extensions1 Some of them are just adding some syntactic sugar(NamedFieldPuns
, RecordWildCards
), and some of them are extensions to type system(GADTs
, Rank2Types
). As a new Haskeller, I only know and use a small subset of this features, and in this post I’ll write about it.
This is some really simple syntactic sugar for pattern matching some names in records. Normally you’d do:
someFun :: T -> (Int, Int)
T{n2=n2, n3=n3} = (n2, n3) someFun
But instead of assigning some names to members of a record with same names, with help of NamedFieldPuns
, you could just write:
someFun :: T -> (Int, Int)
T{n3, n2} = (n2, n3) someFun
It’s that simple. RecordWildCards
doing something similar. Instead of writing someFun T{n1,n2,n3}
, you just write someFun T{..}
.
These are purely syntactic, so they can be used for any place that you can use pattern matching or destructuring(ie. let .. in ..
or where
clauses). Really helpful when working on data constructors that have 4+ members.
Most interesting use of RecordWildCards
I’ve seen was this implementation of localised module imports(I think this implementation can also be used for first-class modules).
This really helps when working on Text
data. When you are passing a function a Text
data, instead of using pack "some string"
, you can just use "some string"
and compiler resolves the type of the string, adds required code etc.
Basically it provides a way to create instances of class IsString a where fromString :: String -> a
using string syntax. It’s mostly used with Data.Text
, ByteString.Text
etc.
This is by far my favorite type system extension. This is mostly because I’m working on an interpreter, and GADTs are great for representing syntax trees.
To understand advantages of GADTs, first let’s think about how we can represent syntax tree for a simple language. Let some part of our sytax be(in a BNF-like format):
<var> = String
<abst> = "(" "λ" "(" {<var> [" "]}* ")" <term> ")"
<app> = "(" <term> {<term> [" "]}* ")"
<term> = <var> | <abst> | <app>
So, variables, lambda abstractions, and applications. Just like in untyped lambda calculus. Now let’s try to encode this format in Haskell:
data Term = Var String
| Abst [??] Term
| App Term Term
What should we write in place of ??
? Writing Term
is obviously wrong, since then we could be ill-formed data(for instance, think about a syntax like that: (λ ((λ (a) ..)) ...)
we used lambda abstraction in place of lambda parameter names).
If the term var
just consists of a String, then we can directly use String as a first member of Abst
constructor, but think term var
as a really complex constructor. I’m trying to make examples as clear and simple as possible.
One way to fix that is to use “smart constructors”. What this really means is that we can hide some data constructors in our module(ie. write exported names in module explicitly) and export some constructor function in place of them. With the help of a trick we can give users a more controlled way of creating data(in our case this means that we can prevent caller from creating ill-formed syntax tree).
First, let’s see the data type trick:
data T1
data T2
data SomeData a = Data1 String | Data2 Int
deriving Show
This is interesting because data types T1 and T2 don’t have any constructors. So there is literally no way to create data in that types. Second interesting point in SomeData a
is, the a
variable is not used by data constructors(Data1
and Data2
). This code still works:
ghci> :t Data1 "ok"
Data1 "ok" :: SomeData a
ghci> :t Data2 123
Data2 123 :: SomeData a
ghci> Data1 "ok" :: SomeData Int
Data1 "ok"
ghci> Data1 "ok" :: SomeData Char
Data1 "ok"
If we could find a way to distinguish types of data created with Data1
and Data2
, we’re done. This is where smart constructors take place:
data1 :: String -> SomeData T1
= Data1
data1 data2 :: Int -> SomeData T2
= Data2 data2
We’re telling to compiler that data created with data1
constructor will be the type of SomeData T1
. Now if we hide Data1
and Data2
data constructors and export data1
and data2
functions, there is no way for user create ill-formed data types like Data1 "ok" :: SomeData SomeUnrelatedType
like we done in past example.
ghci> data1 "ok" :: SomeData Int
<interactive>:48:1:
Couldn't match expected type `Int' with actual type `T1'
Expected type: SomeData Int
Actual type: SomeData T1
In the return type of a call of `data1'
In the expression: data1 "ok" :: SomeData Int
Now we can encode our syntax in Haskell like this:
data Var
data Abst
data App
data AnyTerm
data Term a = Var String
| Abst [Term Var] (Term AnyTerm)
| App (Term AnyTerm) (Term AnyTerm)
var :: String -> Term Var
= Var
var abst :: [Term Var] -> Term AnyTerm -> Term Abst
= Abst
abst app :: Term AnyTerm -> Term AnyTerm -> Term App
= App
app = id anyTerm
Now I’m omitting some parts without explaining since this post is already long enough. But the point should be clear by now. Let’s see this in action:
ghci> abst [(abst [var "p1"] (anyTerm (var "p1")))] (anyTerm (var "somevar"))
<interactive>:63:8:
Couldn't match expected type `Var' with actual type `Abst'
Expected type: Term Var
Actual type: Term Abst
In the return type of a call of `abst'
In the expression: (abst [var "p1"] (anyTerm (var "p1")))
Nice! Just like what we wanted. GADTs help for creating smart “data constructors” so that we don’t have to hide data constructors and export some smart constructor functions. Here’s the same code with GADTs:
data Var
data Abst
data App
data Term a where
Var :: String -> Term Var
Abst :: [Term Var] -> AnyTerm -> Term Abst
App :: AnyTerm -> AnyTerm -> Term App
data AnyTerm where
AnyTerm ::Term a -> AnyTerm
This code should be clear. All we do is to specify constructor’s return type.
This is really great improvement. Think of a more complex syntax tree, with a lot more constructors. The parser already checks for invalid syntax, so actually there’s no way to create ill-formed syntax tree since parser rejects the code.
If you don’t use GADTs or phantom types, you have to check for ill-formed syntax in your evaluator(or reducer or compiler or whatever) even if there’s no way for parser to generate ill-formed syntax tree. For instance, you’d have to check for Var [(App ..)] ...
in your eval :: Env -> Term -> Val
function.
The worst part of GADTs is that you can’t use deriving
clause in data type declaration(can anyone explain why?). In my case, that means I have to write some big amounts of code just to be able to print out the data for debugging purposes. For instance, I had to write some repetitive code like this:
instance Show (Expr a) where
...
show (Lambda params ret body) = "(lambda (" ++ unwords (map show params) ++ ") : " ++ show ret ++ " " ++ unwords (map show body) ++ ")"
show (If ifE thenE elseE) = "(" ++ intercalate "," [show ifE, show thenE, show elseE] ++ ")"
-- same repetitive code for each data constructor, it's really pain when the syntax grows bigger and when you frequently make some changes on it.
I also had to write same amount of code just to be able to run tests on my parser(ie. I need to compare hand-written syntax tree with parser generated syntax tree).
instance Eq (Expr a) where
...
Lambda p1 b1 == Lambda p2 b2 = p1 == p2 && b1 == b2
If i1 t1 e1 == If i2 t2 e2 = i1 == i2 && t1 == t2 && e1 == e2
...
== _ = False _
To list all extensions, I run GHCi, type :set -X
and then run auto-complete, it says “Display all 167 possibilities?” on GHC 7.4.1, 82 of them are starting with -XNo
.↩︎