Garbage Collector

Currently, the garbage collector is an implementation of Cheney's semispace collector. Right now, it's non-generational, but this is very likely to change in the future.

Memory Layout

Allocations' sizes are a multiple of 8 bytes, and must be at least 8 bytes.

Each allocation has an extra 8-byte dword of metadata before the allocation itself. The high 32 bits act as flags:

     addr-4           addr-3          addr-2          addr-1
LSB          MSB LSB         MSB LSB         MSB LSB          MSB
+---------------+---------------+-+-+-----------+---------------+
|   Reserved    |    Reserved   |D|M|  Reserved |    Reserved   |
+---------------+---------------+-+-+-----------+---------------+
  • D: All Data. This flag causes the low 32 bits to be interpreted differently. See below for details.
  • M: Moved. This flag declares that the value has been moved to the to-space, and the first dword of data is the forwarding pointer.

All reserved bits must be 0.

The low 32 bits are interpreted as either one or two length fields, depending on the value of the D flag.

If the D flag is zero, the low 16 bits (starting at addr-8) are interpreted as the length of the allocation, measured in 8-byte dwords. The next 16 bits (starting at addr-6) are interpreted as the number of data dwords in the allocation.

If the D flag is one, the low 32 bits (starting at addr-8) are interpreted as the length of the allocation in dwords, and it is assumed that all of these are data.

Allocation

Allocation is very simple. A "bump pointer" allocator is used, which keeps track of the next free address. This value is added to with each allocation.

If this would go outside the bounds of the heap, instead a collection will be triggered. For this reason, allocation must always occur at a GC safepoint.

Collection

The roots of the garbage collector are found in the x16, x17, x18, and x19 registers, as well as the Forth data stack. From each of these values, a recursive function runs, taking the value and returning a new value for it to have.

  • If the high bit is zero, the same value is returned.
  • If the high bit is one, the value is treated as a pointer to the start of an allocation, whose M flag is checked.
    • If the M flag is one, the first dword is read, and its value is returned.
    • If the M flag is zero, a new allocation is made in the to-space with the same lengths and flags, and the data is then copied over. The M flag is then set on the original allocation, and the address of the new allocation is written to the first dword of the original allocation.

Once this has finished, this same process runs directly over values in the to-space. Any data values as specified by the header are skipped.

After this, the from-space is freed.