osa1

IORef and STRef under the hood

July 25, 2016 - Tagged as: en, haskell.

In this post we’ll take a look at internals of GHC’s mutable variables, and how they’re used by IORef and STRef. The code is copied from GHC, with minor changes for clarity.

λ> :m + Data.IORef
λ> :info IORef
newtype IORef a
  = GHC.IORef.IORef (GHC.STRef.STRef GHC.Prim.RealWorld a)
        -- Defined in ‘GHC.IORef’
instance Eq (IORef a) -- Defined in ‘GHC.IORef’

GHC.IORef is defined in libraries/base/GHC/IORef.hs:

-- | A mutable variable in the 'IO' monad
newtype IORef a = IORef (STRef RealWorld a)

We’ll look at 3 operations: read, write, and atomic modify.

-- | Read the value of an 'IORef'
readIORef   :: IORef a -> IO a
readIORef  (IORef var) = stToIO (readSTRef var)

-- | Write a new value into an 'IORef'
writeIORef  :: IORef a -> a -> IO ()
writeIORef (IORef var) v = stToIO (writeSTRef var v)

atomicModifyIORef :: IORef a -> (a -> (a,b)) -> IO b
atomicModifyIORef (IORef (STRef r#)) f = IO $ \s -> atomicModifyMutVar# r# f s

STRef is defined in libraries/base/GHC/STRef.hs:

-- | A value of type `STRef s a` is a mutable variable in state thread `s`,
-- containing a value of type `a`
data STRef s a = STRef (MutVar# s a)

-- | Read the value of an 'STRef'
readSTRef :: STRef s a -> ST s a
readSTRef (STRef var#) = ST $ \s1# -> readMutVar# var# s1#

-- | Write a new value into an 'STRef'
writeSTRef :: STRef s a -> a -> ST s ()
writeSTRef (STRef var#) val = ST $ \s1# ->
    case writeMutVar# var# val s1# of
      s2# -> (# s2#, () #)

Note that there’s no atomicModifySTRef, because that only makes sense in IO context. So atomicModifyIORef directly calls the primop.

In summary:

IORef: MutVar#, wrapped with STRef. When we unpack an IORef in data constructor fields, internally we store a MutVar#.
writeIORef, writeSTRef: writeMutVar#
readIORef, readSTRef: readMutVar#
atomicModifyIORef: atomicModifyMutVar#

readMutVar#

Primop definition:

primop  ReadMutVarOp "readMutVar#" GenPrimOp
   MutVar# s a -> State# s -> (# State# s, a #)
   {Read contents of {\tt MutVar\#}. Result is not yet evaluated.}
   with
   has_side_effects = True
   can_fail         = True

Code generation is handled by StgCmmPrim.emitPrimOp:

emitPrimOp :: DynFlags
           -> [LocalReg]        -- where to put the results
           -> PrimOp            -- the op
           -> [CmmExpr]         -- arguments
           -> FCode ()

...

emitPrimOp dflags [res] ReadMutVarOp [mutv]
   = emitAssign (CmmLocal res) (cmmLoadIndexW dflags mutv (fixedHdrSizeW dflags) (gcWord dflags))

This is just relative addressing, base is the MutVar# we’re reading, and we skip the closure header to read the contents.

writeMutVar#

Primop definition:

primop  WriteMutVarOp "writeMutVar#"  GenPrimOp
   MutVar# s a -> a -> State# s -> State# s
   {Write contents of {\tt MutVar\#}.}
   with
   has_side_effects = True
   code_size        = { primOpCodeSizeForeignCall }
                         -- for the write barrier
   can_fail         = True

Code generation is again implemented in emitPrimOp:

emitPrimOp dflags [] WriteMutVarOp [mutv,var]
   = do emitStore (cmmOffsetW dflags mutv (fixedHdrSizeW dflags)) var
        emitCCall [{-no results-}]
                  (CmmLit (CmmLabel mkDirty_MUT_VAR_Label))
                  [(CmmReg (CmmGlobal BaseReg), AddrHint), (mutv,AddrHint)]

This is more involved than readMutVar#. First, we write the variable to the MutVar# in the first line (emitStore). Then, we call a function, specified by the variable mkDirty_MUT_VAR_Label, passing two arguments: a global called BaseReg, and the MutVar#. mkDirty_MUT_VAR_Label just holds the name of this function: (defined in rts/sm/Storage.c)

/*
   This is the write barrier for MUT_VARs, a.k.a. IORefs.  A
   MUT_VAR_CLEAN object is not on the mutable list; a MUT_VAR_DIRTY
   is.  When written to, a MUT_VAR_CLEAN turns into a MUT_VAR_DIRTY
   and is put on the mutable list.
*/
void dirty_MUT_VAR(StgRegTable *reg, StgClosure *p)
{
    Capability *cap = regTableToCapability(reg);
    if (p->header.info == &stg_MUT_VAR_CLEAN_info) {
        p->header.info = &stg_MUT_VAR_DIRTY_info;
        recordClosureMutated(cap,p);
    }
}

Remember that for the first argument we passed something called BaseReg, and for the second argument we passed the MutVar#.

This function gets a Capability from the register table, and if the MutVar# is “clean”, it marks it as “dirty” and records in the capability that it’s now mutated.

Capability lacks documentation, but it’s not too important, so we just skip that and look at recordClsoureMutated.

void recordClosureMutated(Capability *cap, StgClosure *p)
{
    bdescr *bd = Bdescr((StgPtr)p);
    if (bd->gen_no != 0) recordMutableCap(p,cap,bd->gen_no);
}

p is our MutVar# here. bdescr stands for “block descriptor”. GHC RTS allocates memory in blocks, and every block belongs to a generation. First generation is special in that if a MutVar# is in the first generation, it can’t point to a younger generation, as the first generation is already the youngest generation. This is from includes/rts/storage/GC.h:

- generation 0 is the allocation area.  It is given a fixed set of blocks
  during initialisation, and these blocks normally stay in G0S0.  In parallel
  execution, each Capability has its own nursery.

This code basically checks if the MutVar# belongs to first generation (generation 0). If that’s not the case, we record the MutVar# in the generation’s “mut list”:

void recordMutableCap(const StgClosure *p, Capability *cap, uint32_t gen)
{
    bdescr* bd = cap->mut_lists[gen];
    if (bd->free >= bd->start + BLOCK_SIZE_W) {
        bdescr *new_bd = allocBlockOnNode_lock(cap->node);
        new_bd->link = bd;
        bd = new_bd;
        cap->mut_lists[gen] = bd;
    }
    *bd->free++ = (StgWord)p;
}

Garbage collector then checks that list when collecting younger generations, to avoid collecting young objects kept alive by older generations (i.e. pointers from older generations to younger generations, see scavenge_capability_mut_lists in rts/sm/Scav.c).

We saw in dirty_MUT_VAR that the MutVar# is marked as “dirty” when it’s mutated. When is it marked as “clean” again?

When a MutVar# is copied during GC, the object pointed by it is also copied to the same generation, and then the MutVar# becomes clean again, because it no longer points to a younger generation. This is the related code:

static void
scavenge_block(bdescr *bd)
{
    ...
    case MUT_VAR_DIRTY:
        gct->eager_promotion = rtsFalse;
        evacuate(&((StgMutVar *)p)->var);
        gct->eager_promotion = saved_eager_promotion;
        if (gct->failed_to_evac) {
            ((StgClosure *)q)->header.info = &stg_MUT_VAR_DIRTY_info;
        } else {
            ((StgClosure *)q)->header.info = &stg_MUT_VAR_CLEAN_info;
        }
        p += sizeofW(StgMutVar);
        break;
    ...
}

atomicModifyMutVar#

Primop definition:

primop  AtomicModifyMutVarOp "atomicModifyMutVar#" GenPrimOp
   MutVar# s a -> (a -> b) -> State# s -> (# State# s, c #)
   with
   out_of_line      = True
   has_side_effects = True
   can_fail         = True

out_of_line = True basically tells code generator that this primop is implemented as a function. Code generator then just generates a function call:

cgOpApp :: StgOp        -- The op
        -> [StgArg]     -- Arguments
        -> Type         -- Result type (always an unboxed tuple)
        -> FCode ReturnKind

...

cgOpApp (StgPrimOp primop) args res_ty = do
    dflags <- getDynFlags
    cmm_args <- getNonVoidArgAmodes args
    case shouldInlinePrimOp dflags primop cmm_args of
        Nothing -> do  -- out-of-line
          let fun = CmmLit (CmmLabel (mkRtsPrimOpLabel primop))
          emitCall (NativeNodeCall, NativeReturn) fun cmm_args
        ...

The primop is implemented in Cmm, in rts/PrimOps.cmm. The code is a mess, but here’s the important part:

stg_atomicModifyMutVarzh ( gcptr mv, gcptr f )
{
  ...
  retry:
    x = StgMutVar_var(mv);
    StgThunk_payload(z,1) = x;
#ifdef THREADED_RTS
    (h) = ccall cas(mv + SIZEOF_StgHeader + OFFSET_StgMutVar_var, x, y);
    if (h != x) { goto retry; }
#else
    StgMutVar_var(mv) = y;
#endif

    if (GET_INFO(mv) == stg_MUT_VAR_CLEAN_info) {
        ccall dirty_MUT_VAR(BaseReg "ptr", mv "ptr");
    }

    return (r);
}

It’s basically a compare-and-swap loop, and in the end it marks the MutVar# as “dirty”, using the same dirty_MUT_VAR function used by the code generated for writeMutVar#.

The `MutVar#` struct

As the last thing, we look at the definition of MutVar#: (in includes/rts/storage/Closures.h)

typedef struct {
    StgHeader   header;
    StgClosure *var;
} StgMutVar;

Nothing interesting here. See this Wiki page for GHC’s heap object layout. In our case, payload contains a single closure.

This concludes our MutVar# (which is used under the hood for IORef and STRef) tour. I guess lessons here are:

readIORef is fast, but writeIORef is one function call in the best case. In the worst case, it does an expensive allocation (this is not just a heap pointer bump). If you have a tight loop with some state variables, prefer parameter passing instead.
Unpacking an IORef in a data constructor field does not really make the constructor mutable. Instead, it inlines the MutVar#, which has a mutable pointer field.

If you think about it a little bit, you may realize that optimizing (2) is actually quite tricky. Imagine having something like this:

data D = D { f1 :: {-# UNPACK #-} !(IORef Int)
           , f2 :: Int
           }

bumpf1 :: D -> IO ()
bumpf1 (D f1 _) = modifyIORef f1 (+ 1)

bumpf2 :: D -> D
bumpf2 (D f1 f2) = D f1 (f2 + 1)

You’d expect this to print True:

do ref <- newIORef 0
   let d1 = D ref 0
       d2 = bumpD2 d1
   bumpf1 d1
   rv1 <- readIORef (f1 d1)
   rv2 <- readIORef (f1 d2)
   print (rv1 == rv2)

If D becomes mutable after the UNPACK, this code doesn’t work anymore, because we lose sharing after the functional update in line bumpD2 d1.

See also this discussion for how other compilers improve this.

IORef and STRef under the hood

readMutVar#

writeMutVar#

atomicModifyMutVar#

The MutVar# struct

The `MutVar#` struct