Fast(er)(est) Table Inserts in LuaJIT

What’s a fast way to insert a large number of elements into a Lua table? What’s the fastest way? And what’s faster than that? There’s a lot of discussion and advice floating around regarding such a primitive topic, so it’s time to dig into some implementation details. We’ll look briefly at a few idiomatic approaches, and discuss the compilation, results, and drawbacks of each.

Quick background, as well as a few points of scope. Lua tables function as both arrays and hash tables; we’ll focus only on the former today, as the discussion keys on incremental inserts. We’re also going to stick to LuaJIT: being able to examine trace outputs will be useful in fleshing out an understanding of each insertion approach, and newer primitives like table.new and table.clear let us reduce unnecessary performance noise in our analysis.

Consider an example in which we need to add 10^7-ish elements to a table. We’ll assume that space for the table has been pre-allocated, and for our purposes the inserted elements don’t matter; what we care about is the overhead of managing the table and adding elements to it. We’re essentially treating our table like a stack, so there are two options for pushing an element:

  • table.insert(t, [i,] a): Insert an element a into table t at position i (if the position isn’t provided, the element is appended to the end of the array- this is the behavior we’re going with).
  • t[#t + 1] = a: Assign the value a to the position #t + 1 in the table, where #t equals the length of the table. Note that “length” has a specific meaning here: it’s a border of the array part, and it’s only well-defined when the array has no holes.
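As a quick concrete check, both forms append to the end of the sequence:

```lua
local t = {}

-- Library call: with no position argument, appends at the end.
table.insert(t, "a")

-- Manual form: look up the current length, assign one past it.
t[#t + 1] = "b"

-- t[1] == "a", t[2] == "b", #t == 2
```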

I often see style and performance guides encouraging use of the latter, the justification being that the former incurs the overhead of a function call, and that the latter offers increased readability. While neither of these is a false statement, I’ve heard (read) on multiple occasions that t[#t + 1] is significantly faster than table.insert, which always struck me as odd. Putting this to the test in a micro benchmark is easy:

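A sketch of what that harness might look like (the names and iteration count are illustrative; the original pre-sized the table with table.new and timed via the FFI, while a plain {} and os.clock keep this sketch portable):

```lua
-- Hedged sketch of the micro benchmark. Globals are cached as upvalues
-- so the loops stay tight; each strategy gets its own fresh table.
local insert = table.insert
local clock  = os.clock

local N = 1e6 -- illustrative iteration count

local function bench(label, fn)
  local t = {}
  local start = clock()
  fn(t)
  print(string.format("%-14s %.4f s", label, clock() - start))
  return t
end

local t1 = bench("table.insert", function(t)
  for _ = 1, N do
    insert(t, "x")
  end
end)

local t2 = bench("t[#t + 1]", function(t)
  for _ = 1, N do
    t[#t + 1] = "x"
  end
end)
```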
Straightforward. Globals are cached in upvalues, microsecond precision is provided via the FFI, the table is pre-sized; standard stuff. There’s no statistically significant difference in performance between the two:

And in looking at the generated traces, there’s really no difference; in the generated loop there’s actually no difference at all (remember that the loops are incredibly tight because we’re assigning the same static value in every iteration).

In both cases, lj_tab_len is called on every iteration. This is where the vast majority of our runtime comes from, as the function runs in linear time. If we can remove this overhead, we can save a great deal of needless execution time. The idiomatic approach in OpenResty applications has been to use a separate counter, kept in the same lexical scope as the table, to obviate the need to look up the table length. This approach is easy enough to implement in simple environments, and we can benchmark it as such:
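A sketch of the separate-counter variant, under the same illustrative assumptions as above (the counter simply lives in the same lexical scope as the table):

```lua
local clock = os.clock

local N = 1e6 -- illustrative iteration count

local t = {}
local count = 0 -- index tracker kept alongside the table

local start = clock()
for _ = 1, N do
  count = count + 1 -- no length lookup: we already know the next slot
  t[count] = "x"
end
print(string.format("counter: %.4f s", clock() - start))
```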

This approach is orders of magnitude more performant than the linear lookup solutions:

So, 12 milliseconds to run through the whole table (one million iterations) when using a separately tracked index. The native solutions could run through two orders of magnitude fewer iterations in the same timeframe. This is definitely a win for large table insert loops (though it’s probably not worth it until you get past a few hundred elements).

I can hear you whining now: “But adding that extra tracking variable is a pain. I have a lot of tables to track!” Sure. Since Lua tables can hold hash-type elements in addition to array elements, we can track the index in the table itself:
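Concretely, that might look like the following (the __cnt key and the push helper are illustrative names, not part of any library):

```lua
-- Track the next insertion index as a hash-part key on the table itself.
local t = { __cnt = 0 }

local function push(t, v) -- illustrative helper
  local n = t.__cnt + 1
  t[n] = v
  t.__cnt = n
end

push(t, "a")
push(t, "b")
-- t[1] == "a", t[2] == "b", t.__cnt == 2
```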

This is about half as fast as tracking the index in a separate variable, running through one million elements in 20 milliseconds, but the convenience may be worth it. The extra-lazy can even obviate the need to predefine t.__cnt:
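A sketch of the lazy variant, with the counter initialized on first use (again, __cnt and push are illustrative names):

```lua
local t = {} -- no predefined counter

local function push(t, v) -- illustrative helper
  local n = (t.__cnt or 0) + 1 -- lazily initialize the counter
  t[n] = v
  t.__cnt = n
end

push(t, "a")
push(t, "b")
-- t.__cnt == 2
```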

A tiny bit more overhead here, of course. The biggest win is the simplicity, with the biggest drawback being that using hash-type keys alongside our tracked array becomes a bit tricky, such as in the context of pairs (because we need to remember to ignore our meta key).

Hats off to John Graham-Cumming, who wrote some notes on this approach using Lua 5.2 a few years ago.

3 thoughts on “Fast(er)(est) Table Inserts in LuaJIT”

  1. The operation to check the length of the table is O(log n) on the average case. Still, changing the whole operation from O(n log n) to O(n) brings evident gains. 🙂

    1. Hey Hisham!

      Thanks for the insight! Can you help me understand how O(log n) performance for table length checks results in the linear growth pattern we see here? Did I improperly design something in my micro benchmark? Btw, have fun at LuaConf, bummed I couldn’t make it 🙂

  2. Out of curiosity, have you thought of running the benchmarks with JIT turned off or even maybe with PUC Lua? I wonder if the “function call overhead” argument stands a better chance. It is, after all, not always possible to write a JIT-able loop; and in such cases it might be worth knowing if there is indeed an implementation that outperforms the other!

    I might give it a try some day…

