Hashtables
StahlOS contains an implementation of hashtables based on Google's Swiss table hashtables, using as reference "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step". It also implements the "cache-line-aware groups" optimization described at 47:25 of "Designing a Fast, Efficient, Cache-friendly Hash Table, Step by Step," since it only supports 8-byte entries.
It's implemented in AArch64 assembly and follows the typical Procedure Call Standard, so it can be tested (and used) in userspace. The implementation (in kernel/src/common/hashtable/table.s
) is hopefully quite readable, so it should be treated as part of this documentation as well, and is the primary reference for most of the algorithmic details of the hashtables.
Collisions
The hashtables index hash groups (kinda sorta like buckets) with the low bits of the hash, and use the high 8 bits to find elements within the bucket. The number of hash groups is always a power of two, simplifying the implementation. Resizing occurs after a load factor of 15/16 (93.75%) is reached.
>>> def birthday_problem(hash_domain, num_entries):
... """Returns the odds that no two entries in a group of `num_entries`
... share the same random `hash_domain`-bit hash."""
... acc = 1
... for i in range(num_entries):
... acc *= ((hash_domain - i) / hash_domain)
... return acc
>>> def collision_chance(num_entries):
... log2_num_groups = ceil(log2(num_entries / 7 * 16 / 15))
... hash_domain = 2 ** 8 * 2 ** log2_num_groups
... return birthday_problem(hash_domain, num_entries)
>>> plt.title("probability of no collisions")
... plt.gca(xlabel="number of entries", ylabel="probability")
... plt.plot(range(1, 500), [(collision_chance(n), 0.5) for n in range(1, 500)])
... plt.show()
Since we do open addressing, the chance of a collision might actually be higher. TODO: Analyze this.
Bitwise Tricks
Estimate log2_initial_group_count
from Initial Capacity
hashtable_init
takes log2_initial_group_count
(which is efficient to compute on) rather than an initial capacity, which is what most programmers expect. We provide an efficient approximation (which avoids the cost of a log
and a division):
>>> def clz(x):
... size = x.size()
... expr = BitVecVal(size, size)
... for i in reversed(range(x.size())):
... j = size - i - 1
... bit = Extract(j, j, x)
... expr = If(bit == 1, i, expr)
... return expr
>>> def estimate_log2_initial_group_count(initial_cap):
... return 64 - clz((initial_cap * 3) >> 4)
>>> def estimate_log2_initial_group_count_correct(initial_cap):
... log2_initial_group_count = estimate_log2_initial_group_count(initial_cap)
... initial_group_count = BitVecVal(1, 64) << log2_initial_group_count
... capacity = initial_group_count * 7 * 15 / 16
... return And(ULE(initial_cap, capacity),
... If(initial_cap < 7, ULT(capacity, initial_cap + 8),
... ULT(capacity, initial_cap * 4)))
>>> n = BitVec("n", 64)
>>> solve(Not(estimate_log2_initial_group_count_correct(n)), ULE(1, n), ULT(n, 2 ** 56))
no solution
As Z3 shows (by failing to find a counter-example), for any initial capacity between 1 and 256, the approximation in estimate_log2_initial_group_count
is correct, and not too wasteful.
Find Free Slot
During insertion, we want to be able to find a free slot in the hash group, given the metadata byte.
>>> cs = []
>>> def clz(x):
... size = x.size()
... expr = size
... for i in reversed(range(x.size())):
... j = size - i - 1
... bit = Extract(j, j, x)
... expr = If(bit == 1, i, expr)
... return expr
>>> def index(l, n, canary):
... cs.append(n >= 0)
... cs.append(n < len(l))
... expr = canary
... for i in range(len(l)):
... expr = If(i == n, l[i], expr)
... return expr
>>> def find_free_slot(n):
... return 63 - clz(~(n | 0xffffffffffffff01))
>>> def find_free_slot_correct(n):
... p = [Bool(f"p{i}") for i in range(8)]
... cs.append(Not(p[0]))
... for i in range(1, 8):
... cs.append(p[i] == (Extract(i, i, n) == 1))
... cs.append(Extract(63, 8, n) == 0)
... i = Int("i")
... cs.append(i == find_free_slot(n))
... return If(i == 0,
... reduce(And, (Not(p[i]) for i in range(1, 8))),
... Not(index(p, i, False)))
>>> n = BitVec("n", 64)
>>> solve(Not(find_free_slot_correct(n)), *cs)
no solution
Since Z3 couldn't find examples of incorrectness, find_free_slot
must be a correct implementation.