February 27, 2020 - Tagged as: en, haskell, ghc.
In the previous post we’ve looked at a representation of expressions in a programming language, what the representation makes easy and where we have to use knot-tying.
In this post I’m going to give two more examples, using the same expression representation from the previous post, and then talk about how to implement our passes using a different representation, without knot-tying.
Previously we attached arity and unfolding information to Id
s. Now suppose that our language is typed, and up to some point our transformations rely on typing information. Similar to arity and unfolding fields we add one more field to Id
:
data Id = Id
..
{ idType :: Maybe Type
, }
The Maybe
part is because when we no longer need the types we want to be able to clear the type fields to make the AST smaller. While we have only one heap object per Id
, in an average program there’s still a lot of different Id
s, and Type
representation can get quite large, so this is worthwhile. This makes the working set smaller, which causes less GC work and improves compiler performance.
In our cyclic AST representation the only way to implement this without losing sharing is with a full-pass over the entire program, using knot-tying. The code is similar to the ones in the previous post.
Remember that in the previous post we represented the AST as:
data Expr
= IdE Id
| IntE Int
| Lam Id Expr
| App Expr Expr
| IfE Expr Expr Expr
| Let Id Expr Expr
data Id = Id
idName :: String
{-- ^ Unique name of the identifier
idArity :: Int
,-- ^ Arity of a lambda. 0 for non-lambdas.
idUnfolding :: Maybe Expr
,-- ^ RHS of a binder, used for inlining
}
In this representation if I have a recursive definition like
let fac = \x . if x then x * fac (x - 1) else 1 in fac 5
In fac
used in lambda body I want to be able to do idUnfolding
and get the definition of this lambda. So the lambda refers to the Id
for fac
, and fac
refers to the lambda in its idUnfolding
field, forming a cycle.
In this representation only way to implement this is with knot-tying. An implementation that maintains a map from binders to their RHSs to update unfoldings of Id
s in occurrence position does not work, because when we update an occurrence of the binder in its own RHS (i.e. in a recursive let
) we end up invalidating the RHS
that we’ve added to the map.
Here’s a knot-tying implementation that adds unfoldings (only the interesting bits):
addUnfoldings :: Expr -> Expr
= go M.empty
addUnfoldings where
go :: M.Map String Id -> Expr -> Expr
= case e of
go ids e
IdE id ->
IdE (fromMaybe id (M.lookup (idName id) ids))
Let bndr rhs body ->
let
= M.insert (idName bndr) bndr' ids
ids' = go ids' rhs
rhs' = bndr{ idUnfolding = Just rhs' }
bndr' in
Let bndr{ idUnfolding = Just rhs' } rhs' (go ids' body)
...
As before we tie the knot in let
case and use it in Id
case.
It’s also possible to initialize idUnfolding
fields when parsing, using monadic knot-tying (MonadFix). Full code is shown at the end of this post, but the interesting bit is when parsing let
s and Id
s:
parseLet :: Parser Expr
= do
parseLet <- string "let"
_ <- parseIdName
id_name <- char '='
_
id, rhs) <- mfix $ \ ~(id_, _rhs) -> do
(
modify (Map.insert id_name id_)<- parseExpr
rhs return (Id{ idName = id_name, idArity = 0, idUnfolding = Just rhs }, rhs)
<- string "in"
_ <- parseExpr
body return (Let id rhs body)
parseId' :: Parser Id
= do
parseId' <- parseIdName
name <- get
id_map let def = Id{ idName = name, idArity = 0, idUnfolding = Nothing }
return (fromMaybe def (Map.lookup name id_map))
The idea is very similar. When parsing a let
we add a thunk for the binder with correct unfolding to a map. The map is then used when parsing Id
s in the RHS and body of the let
.
A well-known way of associating information with identifiers in a compiler is by using a “symbol table”. Instead of adding information about Id
s directly in the Id
fields, we maintain a table (or multiple tables) that map Id
s to the relevant information. Here’s one way to do this in our language:
data Expr
= IdE String
...
data IdInfo = IdInfo
idArity :: Int
{-- ^ Arity of a lambda. 0 for non-lambdas.
idUnfolding :: Maybe Expr
,-- ^ RHS of a binder, used for inlining
}
type SymTbl = Map.Map String IdInfo
In this representation we have to refer to the table for idArity
or idUnfolding
. That’s slightly more work than the previous representation where we could simply use the fields of an Id
, but a lot of other things become much simpler and efficient.
Here’s dropUnusedBindings
in this representation (only the interesting bits, full code is at the end of this post):
dropUnusedBindings :: Expr -> State SymTbl Expr
=
dropUnusedBindings fmap snd . go Set.empty
where
go :: Set.Set String -> Expr -> State SymTbl (Set.Set String, Expr)
= case e0 of
go free_vars e0
Let bndr e1 e2 -> do
<- go free_vars e2
(free2, e2') if Set.member bndr free2 then do
<- go free_vars e1
(free1, e1')
setIdArity bndr (countLambdas e1')return (Set.delete bndr (Set.union free1 free2), Let bndr e1' e2')
else
return (free2, e2')
...
Our pass is now stateful (updates the symbol table) and written in monadic style. Knot-tying is gone. We update the symbol table after processing a let
RHS. Because Id
s no longer have the arity information we don’t need to update anything other than the symbol table.
It’s now trivial to implement addUnfoldings
:
addUnfoldings :: Expr -> State SymTbl ()
= case e0 of
addUnfoldings e0
IdE{} ->
return ()
IntE{} ->
return ()
Lam arg body ->
addUnfoldings body
App e1 e2 -> do
addUnfoldings e1
addUnfoldings e2
IfE e1 e2 e3 -> do
addUnfoldings e1
addUnfoldings e2
addUnfoldings e3
Let bndr e1 e2 -> do
addUnfoldings e1
addUnfoldings e2 setIdUnfolding bndr e1
Doing it during parsing is also trivial, and shown in the full code at the end of this post. Updating typing information when we no longer need them is simply
dropTypes :: State SymTbl ()
= modify (Map.map (\id_info -> id_info{ idType = Nothing })) dropTypes
We could also maintain a separate table for typing information, in which case all we had to do would be to stop using that table.
Easy!
Cyclic AST representation in a purely functional language necessitates knot-tying and relies on lazy evaluation. A well-known alternative is using symbol tables. It works across languages (does not rely on lazy evaluation) and keeps the code simple.
Cyclic representations make using the information easier, while symbol tables make updating easier. Code for updating the information is shown above and the previous post. For using the information, compare:
-- Get the information in a cyclic representation
... (idUnfolding id) ...
-- Get the information using a symbol table
arity <- getIdUnfolding id
To me the monadic version is not too bad in terms of verbosity or convenience, especially because Haskell makes state passing so easy.
Some of the problems with knot-tying is as explained at the end of the previous post. What I did not mention in the previous post is the problems with efficiency, which are demonstrated better in this post.
In the “typing information” example, with the cyclic representation I need to copy the entire AST to update every single Id
occurrence and binder. With the symbol table I need to update just the table, which is much smaller than the AST.
In the unfolding example, with the cyclic representation I again need to copy the entire AST or use MonadFix
if I’m doing it in parsing. With a symbol table the pass does not update the AST, only updates the table. If I’m doing it in parsing then I simply add an entry to the table after parsing a let
. (full code at the end of this post)
In use sites, getIdArity
(a map lookup) does more work than idArity
(just follows a pointer). While I don’t have any benchmarks on this, I doubt that this is bad enough to make cyclic representation and knot-tying preferable.
Examples in these two posts are inspired by GHC:
Id
s in an Id
field with type IdInfo
.IdInfo
type holds information like arity and unfolding.Id
has another field: varType
.IdInfo
s with code generator-generated information.In the first post I mostly argued that knot-tying makes things more complicated, and in this post I showed that knot-tying is necessary because of the cyclic representation. If we want to do the same without knot-tying we either have to introduce mutable references (e.g. IORef
s) in our AST (not shown in this post), or have to use a non-cyclic representation with symbol tables.
Between these two representations, I think non-cyclic representation with symbol tables is a better choice.
Full code (knot-tying)
-- Tried with GHC 8.6.4
{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}
import Data.List
import Data.Maybe
import Prelude hiding (id)
-- mtl-2.2
import Control.Monad.State
-- containers-0.6
import qualified Data.Map as Map
import qualified Data.Set as Set
-- megaparsec-7.0
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char
-- pretty-show-1.10
import Text.Show.Pretty
data Expr
= IdE Id
| IntE Int
| Lam Id Expr
| App Expr Expr
| IfE Expr Expr Expr
| Let Id Expr Expr
deriving (Show)
data Id = Id
idName :: String
{-- ^ Unique name of the identifier
idArity :: Int
,-- ^ Arity of a lambda. 0 for non-lambdas.
idUnfolding :: Maybe Expr
,-- ^ RHS of a binder, used for inlining
}
instance Show Id where
show (Id name arity _) = "(Id " ++ show name ++ " " ++ show arity ++ ")"
--------------------------------------------------------------------------------
-- Initializing unfolding fields in parse time via MonadFix
type IdMap = Map.Map String Id
type Parser = ParsecT String String (State IdMap)
parseExpr :: Parser Expr
= do
parseExpr <- some $
exprs $
choice map (\p -> p <* space)
[ parseParens, parseIf, parseLam, parseInt,
parseLet, try parseId ]return (foldl1' App exprs)
parseParens, parseIf, parseLam, parseInt, parseId :: Parser Expr
parseLet,
= do
parseParens <- char '('
_
space<- parseExpr
expr <- char ')'
_ return expr
= do
parseIf <- string "if"
_
space<- parseExpr
condE
<- string "then"
_
space<- parseExpr
thenE <- string "else"
_
space<- parseExpr
elseE return (IfE condE thenE elseE)
= do
parseLam <- char '\\'
_
spaceid <- parseId'
space<- char '.'
_
space<- parseExpr
body return (Lam id body)
= do
parseInt <- some digitChar
chars return (IntE (read chars))
= do
parseLet <- string "let"
_
space<- parseIdName
id_name
space<- char '='
_
space
id, rhs) <- mfix $ \ ~(id_, _rhs) -> do
(
modify (Map.insert id_name id_)<- parseExpr
rhs return (Id{ idName = id_name, idArity = 0, idUnfolding = Just rhs }, rhs)
<- string "in"
_
space<- parseExpr
body return (Let id rhs body)
= IdE <$> parseId'
parseId
kws :: Set.Set String
= Set.fromList ["if", "then", "else", "let", "in"]
kws
parseIdName :: Parser String
= do
parseIdName <- some letterChar
name not (Set.member name kws))
guard (return name
parseId' :: Parser Id
= do
parseId' <- parseIdName
name <- get
id_map let def = Id{ idName = name, idArity = 0, idUnfolding = Nothing }
return (fromMaybe def (Map.lookup name id_map))
testPgm :: String -> Expr
=
testPgm pgm case evalState (runParserT parseExpr "" pgm) Map.empty of
Left (err_bundle :: ParseErrorBundle String String) ->
error (errorBundlePretty err_bundle)
Right expr ->
expr
instance ShowErrorComponent [Char] where
= x
showErrorComponent x
--------------------------------------------------------------------------------
-- Initializing unfoldings with knot-tying
addUnfoldings :: Expr -> Expr
= go Map.empty
addUnfoldings where
go :: Map.Map String Id -> Expr -> Expr
= case e of
go ids e
-- Interesting bits ------------------------------------------------------
IdE id ->
IdE (fromMaybe id (Map.lookup (idName id) ids))
Let bndr rhs body ->
let
= Map.insert (idName bndr) bndr' ids
ids' = go ids' rhs
rhs' = bndr{ idUnfolding = Just rhs' }
bndr' in
Let bndr{ idUnfolding = Just rhs' } rhs' (go ids' body)
--------------------------------------------------------------------------
IntE{} ->
e
Lam arg body ->
Lam arg (go ids body)
App e1 e2 ->
App (go ids e1) (go ids e2)
IfE e1 e2 e3 ->
IfE (go ids e1) (go ids e2) (go ids e3)
Full code (symbol table)
-- Tried with GHC 8.6.4
{-# OPTIONS_GHC -Wall #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE FlexibleInstances #-}
import Data.List
import Data.Maybe
import Prelude hiding (id)
-- mtl-2.2
import Control.Monad.State
-- containers-0.6
import qualified Data.Map as Map
import qualified Data.Set as Set
-- megaparsec-7.0
import Text.Megaparsec hiding (State)
import Text.Megaparsec.Char
-- pretty-show-1.10
import Text.Show.Pretty
import Debug.Trace
data Expr
= IdE String
| IntE Int
| Lam String Expr
| App Expr Expr
| IfE Expr Expr Expr
| Let String Expr Expr
deriving (Show)
data IdInfo = IdInfo
idArity :: Int
{-- ^ Arity of a lambda. 0 for non-lambdas.
idUnfolding :: Maybe Expr
,-- ^ RHS of a binder, used for inlining
idType :: Maybe Type
,-- ^ Type of the id.
}
data Type = Type -- Assume a large type
instance Show IdInfo where
show (IdInfo arity _ _) = "(IdInfo " ++ show arity ++ ")"
type SymTbl = Map.Map String IdInfo
getIdInfo :: String -> State SymTbl (Maybe IdInfo)
id =
getIdInfo id <$> get
Map.lookup
setIdArity :: String -> Int -> State SymTbl ()
id arity = modify (Map.alter alter id)
setIdArity where
Nothing =
alter Just IdInfo{ idArity = arity, idUnfolding = Nothing, idType = Nothing }
Just id_info) =
alter (Just id_info{ idArity = arity }
setIdUnfolding :: String -> Expr -> State SymTbl ()
id unfolding = modify (Map.alter alter id)
setIdUnfolding where
Nothing =
alter Just IdInfo{ idUnfolding = Just unfolding, idArity = 0, idType = Nothing }
Just id_info) =
alter (Just id_info{ idUnfolding = Just unfolding }
countLambdas :: Expr -> Int
Lam _ rhs) = 1 + countLambdas rhs
countLambdas (= 0
countLambdas _
dropUnusedBindings :: Expr -> State SymTbl Expr
=
dropUnusedBindings fmap snd . go Set.empty
where
go :: Set.Set String -> Expr -> State SymTbl (Set.Set String, Expr)
= case e0 of
go free_vars e0
IdE id ->
return (Set.insert id free_vars, e0)
IntE{} ->
return (free_vars, e0)
Lam arg body -> do
<- go free_vars body
(free_vars', body') return (Set.delete arg free_vars', Lam arg body')
App e1 e2 -> do
<- go free_vars e1
(free1, e1') <- go free_vars e2
(free2, e2') return (Set.union free1 free2, App e1' e2')
IfE e1 e2 e3 -> do
<- go free_vars e1
(free1, e1') <- go free_vars e2
(free2, e2') <- go free_vars e3
(free3, e3') return (Set.unions [free1, free2, free3], IfE e1' e2' e3')
Let bndr e1 e2 -> do
<- go free_vars e2
(free2, e2') if Set.member bndr free2 then do
<- go free_vars e1
(free1, e1') return ())
trace (ppShow e1') (
setIdArity bndr (countLambdas e1')return (Set.delete bndr (Set.union free1 free2), Let bndr e1' e2')
else
return (free2, e2')
addUnfoldings :: Expr -> State SymTbl ()
= case e0 of
addUnfoldings e0
IdE{} ->
return ()
IntE{} ->
return ()
Lam _ body ->
addUnfoldings body
App e1 e2 -> do
addUnfoldings e1
addUnfoldings e2
IfE e1 e2 e3 -> do
addUnfoldings e1
addUnfoldings e2
addUnfoldings e3
Let bndr e1 e2 -> do
addUnfoldings e1
addUnfoldings e2
setIdUnfolding bndr e1
dropTypes :: State SymTbl ()
= modify (Map.map (\id_info -> id_info{ idType = Nothing }))
dropTypes
pgm :: Expr
= Let "fac" rhs body
pgm where
= Lam "x" (IfE (IdE "x") (App (App (IdE "*") (IdE "x"))
rhs App (IdE "fac")
(App (App (IdE "-") (IdE "x")) (IntE 1))))
(IntE 1))
(= App (IdE "fac") (IntE 5)
body
--------------------------------------------------------------------------------
-- Initializing unfolding fields in parse time, the boring way
type Parser = ParsecT String String (State SymTbl)
parseExpr :: Parser Expr
= do
parseExpr <- some $
exprs $
choice map (\p -> p <* space)
[ parseParens, parseIf, parseLam, parseInt,
parseLet, try parseId ]return (foldl1' App exprs)
parseParens, parseIf, parseLam, parseInt, parseId :: Parser Expr
parseLet,
= do
parseParens <- char '('
_
space<- parseExpr
expr <- char ')'
_ return expr
= do
parseIf <- string "if"
_
space<- parseExpr
condE
<- string "then"
_
space<- parseExpr
thenE <- string "else"
_
space<- parseExpr
elseE return (IfE condE thenE elseE)
= do
parseLam <- char '\\'
_
spaceid <- parseId'
space<- char '.'
_
space<- parseExpr
body return (Lam id body)
= do
parseInt <- some digitChar
chars return (IntE (read chars))
= do
parseLet <- string "let"
_
spaceid <- parseId'
space<- char '='
_
space<- parseExpr
rhs <- string "in"
_
space<- parseExpr
body id rhs)
lift (setIdUnfolding return (Let id rhs body)
= IdE <$> parseId'
parseId
kws :: Set.Set String
= Set.fromList ["if", "then", "else", "let", "in"]
kws
parseId' :: Parser String
= do
parseId' <- some letterChar
name not (Set.member name kws))
guard (return name
testPgm :: String -> Expr
=
testPgm pgm case evalState (runParserT parseExpr "" pgm) Map.empty of
Left (err_bundle :: ParseErrorBundle String String) ->
error (errorBundlePretty err_bundle)
Right expr ->
expr
instance ShowErrorComponent [Char] where
= x showErrorComponent x