osa1 feed

How I solved the Synacor Challenge

June 19, 2016 - Tagged as: en, rust.

It took 4 attempts at various coffee shops in Cambridge/UK (where I’m spending this summer – more on this later) but I finally solved the Synacor Challenge. Here’s how I did it.

WARNING: This post spoils everything! Do not read further if you’re interested in solving it yourself. It’s a lot of fun, so I highly recommend that.

First attempt

Initial VM implementation took about an hour. It worked well until I was eaten by a grue. At that point I realized that I need a save/load system.

The game looks like a classic text-based adventure where you interact by typing commands. So I thought saving the commands should be enough for “save game”. To load, VM would just load the input from the save file to its input buffer (which is used when in instruction is executed). Indeed it works nicely.

The game is very easy until you make it to the “office”. IIRC, you collect 6 out of 8 codes until that point. At the office you learn about the teleporter, which is the first real challenge in the game…

Second attempt

At this point I implemented about 20 debugger commands, for things like stepping, breaking at an instruction, breaking at an address, breaking on a specific register read, setting the instruction pointer etc. Just by using these commands I was able to teleport to the right place. However, the code I found there did not really work. It turns out the code is generated dynamically, based on the value in R8. Which is kind of expected, otherwise we could find all the codes just by looking at strings in the binary.

So at this point I realize I need a disassembler…

Third attempt

The disassembler was trivial to implement. I implement it as a part of VM because 1) in theory the program can generate code in runtime (although I don’t think this is the case in practice) 2) code and data is in the same address space and instruction boundaries are not clearly known. I guess I could do something like: Start with address 0, at each jump disassemble the jump target etc. but I’m not sure if that works as some jump targets are generated dynamically.

Anyway, the debugger has a disassembler now, and I start disassembling functions.

When I step instruction by instruction after using the teleporter, I see that it’s checking R8, if it’s not 0, then calling a function which is the “confimation process” that’s supposed to take 1 billion years. The function has very complex control flow so I try to avoid actually debugging it.

I look at the code that this function returns to. It’s checking if R1 (which has the return value) is 6. So I think, why not just set R1 6 and return from the function? Indeed, it works, but not really how I expected. I already knew that the code I’m searching is generated dynamically, but it’s actually generated using R8, and only when R1 is 6. So as it turns out, I need to guess a value of R8 that makes the validation function return 6. Just making the function return 6 doesn’t really work.

However, I said “it kinda worked”. Because the teleporter actually teleports me to the right place. It’s just that the generated code is not valid.

Fourth attempt

Before disassembling the verification function, I decide to solve the rest of the challenge. I realize that we’re now in a maze, 4x4. Each tile in the maze has either a number, or an operation (+, -, *). There’s an orb in the south-west corner, and there’s a door in the north-east corner. “30” is written on the door, and “22” is printed on the Orb. It’s easy to see what’s going on. Two things to realize are that every time we return to the first tile things get reset, and the goal tile can be visited only once (the orb disappears on visit).

I implement a program that does breadth-first search on this state space, see maze.rs. It then prints directions. When we follow those directions we find the final code and the challenge is completed.

However, since I by-passed the previous challenge, I need to solve that now.

This is the most fun part, it involves lots of debugging, and some programming. Here’s the disassembly of the verification function:

                                         -- fn(reg0 = 4, reg1 = 1, reg7 = our value) {
[6027] Jt [Reg(0), Num(6035)]            --   if reg0 == 0 {
[6030] Add [Reg(0), Reg(1), Num(1)]      --     reg0 = reg1 + 1;
[6034] Ret []                            --     return; }
[6035] Jt [Reg(1), Num(6048)]            --   if reg1 == 0 {
[6038] Add [Reg(0), Reg(0), Num(32767)]  --     reg0 -= 1;
[6042] Set [Reg(1), Reg(7)]              --     reg1 = reg7;
[6045] Call [Num(6027)] ------ loop      --     fn();
[6047] Ret []                            --     return; }
[6048] Push [Reg(0)]                     --   push(reg0);
[6050] Add [Reg(1), Reg(1), Num(32767)]  --   reg1 -= 1;
[6054] Call [Num(6027)] ------ loop      --   fn();
[6056] Set [Reg(1), Reg(0)]              --   reg1 = reg0;
[6059] Pop [Reg(0)]                      --   reg0 = pop();
[6061] Add [Reg(0), Reg(0), Num(32767)]  --   reg0 -= 1;
[6065] Call [Num(6027)] ------ loop      --   fn();
[6067] Ret [] -- end of function at 6027 --   return;
                                         -- }

I added next to each instruction how they would look like in a C-like language. It’s hard to understand what’s going on. So to experiment with it I implement it in Rust. Since registers are shared in each call, I use shared state in my initial implementation:

#[inline]
fn add(i1 : u16, i2 : u16) -> u16 {
    (i1 + i2) % 32768
}

#[derive(Debug)]
struct FnState {
    reg0  : u16,
    reg1  : u16,
    reg7  : u16,
    stack : Vec<u16>,
}

impl FnState {
    fn init(reg0 : u16, reg1 : u16, reg7 : u16) -> FnState {
        FnState {
            reg0: reg0,
            reg1: reg1,
            reg7: reg7,
            stack: Vec::new(),
        }
    }

    fn f(&mut self) {
        if self.reg0 == 0 {
            self.reg0 = add(self.reg1, 1);
            return;
        }
        if self.reg1 == 0 {
            // self.reg0 = add(self.reg0, 32767);
            self.reg0 -= 1;
            self.reg1 = self.reg7;
            self.f();
            return;
        }
        self.stack.push(self.reg0);
        // self.reg1 = add(self.reg1, 32767);
        self.reg1 -= 1;
        self.f();
        self.reg1 = self.reg0;
        self.reg0 = self.stack.pop().unwrap();
        // self.reg0 = add(self.reg0, 32767);
        self.reg0 -= 1;
        self.f();
        return;
    }
}

Now, this function grows really fast. Even for very small inputs it takes minutes to compute. I try to think some well-known functions that grow very fast. Ackermann comes to my mind and I check the Wiki page. Indeed, this looks quite similar, but the third argument makes it different than Ackermann. In any case, it doesn’t really matter for the solution.

So the problem is coming up with a reg7 in this code so that f(4, 1, reg7) returns 6. For that I need to implement a search but this is basically impossible with this slow function. I start simplifying the function a little bit. My first attempt:

impl FnState {
    fn f(&mut self) {
        // When reg0 hits zero, restart it from reg1 + 1
        if self.reg0 == 0 {
            self.reg0 = add(self.reg1, 1);
            return;
        }

        // When reg1 hits zero, decrement reg0, restart reg1 from reg7
        if self.reg1 == 0 {
            self.reg0 -= 1;
            self.reg1 = self.reg7;
            self.f();
            return;
        }

        let save_reg0 = self.reg0;

        self.reg1 -= 1;
        self.f();
        self.reg1 = self.reg0;

        self.reg0 = save_reg0;
        self.reg0 -= 1;

        self.f();
        return;
    }
}

This version doesn’t use an explicit stack, instead uses a temporary in the call frame. This works because in the original version each push corresponds to a pop done in the same call frame.

It’s still too complicated. I keep simplifying.

fn f(mut reg0 : u16, mut reg1 : u16, mut reg7 : u16) -> (u16, u16) {
    if reg0 == 0 {
        reg0 = add(reg1, 1);
        return (reg0, reg1);
    }

    if reg1 == 0 {
        reg0 -= 1;
        reg1 = reg7;
        return f(reg0, reg1, reg7);
    }

    let save_reg0 = reg0;
    reg1 -= 1;

    let (reg0_, reg1_) = f(reg0, reg1, reg7);
    reg0 = reg0_;
    reg1 = reg1_;

    reg1 = reg0;

    reg0 = save_reg0;
    reg0 -= 1;

    return f(reg0, reg1, reg7);
}

This version doesn’t have any shared mutable state. At this point I realize that it may be possible to remove internal mutable state too:

fn f(reg0 : u16, reg1 : u16, reg7 : u16) -> (u16, u16) {
    if reg0 == 0 {
        return (add(reg1, 1), reg1);
    }

    if reg1 == 0 {
        return f(reg0 - 1, reg7, reg7);
    }

    let (reg1, _) = f(reg0, reg1 - 1, reg7);

    return f(reg0 - 1, reg1, reg7);
}

Now, this is a function I can read and understand. One advantage of this version is that this could be easily memoized:

fn f_memo(reg0 : u16, reg1 : u16, reg7 : u16, memo : &mut HashMap<(u16, u16), (u16, u16)>) -> (u16, u16) {
    if let Some(ret) = memo.get(&(reg0, reg1)) {
        return *ret;
    }

    if reg0 == 0 {
        let ret = (add(reg1, 1), reg1);
        memo.insert((reg0, reg1), ret);
        return ret;
    }

    if reg1 == 0 {
        let ret = f_memo(reg0 - 1, reg7, reg7, memo);
        memo.insert((reg0, reg1), ret);
        return ret;
    }

    // careful there
    let (reg1_new, _) = f_memo(reg0, reg1 - 1, reg7, memo);

    let ret = f_memo(reg0 - 1, reg1_new, reg7, memo);
    memo.insert((reg0, reg1), ret);
    return ret;
}

This version is super fast when compared to the original version. I feel like I can just search the whole space in a few hours. I start the search and as it searches through the search space I start wondering about how to further improve it. I think, why not split search space into pieces and search in parallel?

fn search(start_range : u16, end_range : u16) -> Option<u16> {
    for i in start_range .. end_range {
        // println!("i = {}", i);
        let mut tbl = HashMap::new();
        let ret = f_memo(4, 1, i, &mut tbl);
        if ret.0 == 6 {
            println!("Found an answer: {:?} {}", ret, i);
            return Some(i);
        }
    }

    None
}

fn search_in_parallel(n : i32) {
    let increment = u16::MAX / (n as u16);
    let mut threads = Vec::with_capacity(n as usize);

    for i in 0 .. n {
        let range_start = increment * (i as u16);
        let range_end   = increment * ((i as u16) + 1) + 1;
        threads.push(thread::Builder::new().stack_size(1000000000).spawn(move || {
            search(range_start, range_end)
        }).unwrap());
    }

    for thread in threads {
        thread.join();
    }
}

fn main() {
    search_in_parallel(8);
}

This parallel search takes a couple of seconds until it prints:

Found an answer: (6, 5) 25734

So 25734 returns 6! This is the R8 value we were looking for.

I modify my VM to return when this function is called (I know it lives in address 6027 so I just check instruction pointer in the main loop) and drop to debugger prompt. In the debugger, I set R1 (return value register) 6, and set R8 25734, and continue running the program.

It works perfectly, with a working code printed to the screen!

Code and conclusion

Overall I enjoyed this a lot. My favorite part was definitely debugging the verification function. I don’t really enjoy text adventures but that’s really a very small part of the challenge, so it wasn’t a big deal.

My code is on Github. I didn’t organize the code at all, so you can see my inline notes, logs, and commented out code with their improved/changed versions, and have a feeling of how I developed my solutions. In the repo you’ll also see the original binary and challenge spec. I pushed those in case the original challenge page disappears in the future.

The Rust compiler I used was rustc 1.11.0-nightly (0554abac6 2016-06-10).

If you know similar challenges let me know in the comment section below. One challenge that looks similar is ICFP’06 programming contest “The Cult of the Bound Variable”, which I always wanted to solve but never really got a chance. Maybe I should try it next.