osa1 github about atom

lcl -- Lua Container Library and The One Data Structure to Rule Them All

June 8, 2013 - Tagged as: en, lua.

Note: Sorry for organization of this post. We were talking about this stuff over #lua IRC channel for an hour, and before that I had worked on various bugs and I’m incredibly tired now.

I’ve been working with an experimental Lua library lately. It had started as a learning exercise, and after first iteration, it turned out to be an experiment about Lua’s internals, dynamic linking, and data structures.

Lua Container Lib(lcl or liblcl for short) is a Lua library to use C++ STL containers from within Lua. It’s compiled to a single .so(or .dll) and it can be loaded from Lua 5.1.5, 5.2.2, LuaJIT 2.0.2 and Love2D with package.loadlib standard function. It provides some STL containers with an object-oriented interface.

You can see the source here. There isn’t any tutorials yet, I think it’s simple enough to learn it from examples, see tests/ folder.

For now, it only contains set and deque containers, but it should be very easy to add more. Also, not all operations on sets and deques are supported yet. Again, this should also be very easy to add. Current code base should have all kinds of code to implement more containers/operations on containers by just looking for others’ implementations. Pull requests are welcome!

tests/ folder also have a simple benchmark. Output of benchmark is in files benchmark.output and benchmark.output_luajit. I think most people find the output interesting. Before explaining what’s going on in that benchmark, here are some stuff I learned while developing this library:

Lessons learned

Lua has some rules about the use of C API, but when you don’t follow the rules, you don’t immediately get caught. For instance, you should return the number of elements placed to the stack in your C functions. I had a bug in my code and one of my functions was returning 1 even though it returns with an empty stack. Nevertheless, the library worked fine until I tried it with LuaJIT.

LuaJIT is more picky about that rules and it fails in strange ways. Sometimes my program was failing with strange memory allocation errors, but program was still running. In the best case, I was getting a segmentation fault.

After several hours of debugging and some help from mailing list(see my mail here) I could solve it. Best helper was the LUA_USE_APICHECK debug macro. It’s added to Lua in somewhere between Lua 5.1.5 and 5.2.2. When you compile Lua with LUA_USE_APICHECK defined, Lua makes some assertions in code to make sure stack is in correct state. You should always develop C libraries to Lua with this enabled.

Now, as for benchmarks; two things can be seen immediately from benchmarks: 1) All operations on STL containers are slower 2) LuaJIT is awesome.

Let’s first start from second point. LuaJIT is awesome. It’s best thing happened to Lua. It’s binary compatible with Lua 5.1.5, which means with minimal effort, you can gain some real performance benefits. You can see the difference by comparing benchmark.output and benchmark.output_luajit files. All I had to do was to run program with LuaJIT instead of Lua.

Now, as for STL containers .. Before starting this project, I was considering having a better performance for specialized data structures, ie. STL deques should be faster than Lua tables used as deques. As can be seen from benchmarks, that’s not the case. I made a simplest possible deque implementation possible in Lua(you can see it here) and it’s still faster than STL deque(with minor difference).

I think there are several reasons for that.

There is no way to get a Lua value out of Lua interpreter. Lua C API deliberately avoid this because this may lead to memory leaks or memory corruptions.

You can only have a reference to a Lua value, and in that case, that value has to be written in some table. luaL_ref creates a reference and writes it to a Lua table, and then return that reference(as an int). Generally, you would use global register at LUA_REGISTRYINDEX to save Lua values.

This implies that you cannot have a container with insertion faster than Lua table insertion. Because every insertion also have to insert to a Lua table. This table is generally the global register at index LUA_REGISTRYINDEX. For example, when I add 1000000 elements to a set, all those elements is also added to the register.

You can see that STL deque insertions time is almost the same as Lua implementation’s. The reason for this is that even though STL deque insertion is O(1) with a minimal constant factor, you have to insert to a Lua table like explained above.

In case of set data structure: STL’s set implementation is generally a kind of tree, and elements are stored in sorted order(this makes possible to use STL sets as heaps like I did in dynload.lua example). This causes extra O(log N) function calls for comparisons, where N is number of elements in tree. ie. when a new element added, it’s place is determined by comparing it with elements at each level and then moving down to next level in tree. In case of Lua tables, all insertions are amortized O(1) and no comparison functions are called.

These are my explanations to reasons of why STL operations are slower.

Still, I don’t think adding C/C++ containers in Lua is completely pointless. Significant memory savings may be possible with C/C++ containers. For example, 32 flags can be held in a 32bit integer in C/C++, but to do this in Lua, you need to use a double for every flag, and a table. A double is 8 bytes in my 64bit machine. And with 32 flags it costs you 32*8 = 256 bytes. In C/C++ you can have it with only 4 bytes.

There may be also performance advantages for really complex algorithms, but I don’t have a particular example in mind.