As you can see, the LLL subproject is still a draft :(
Feel free to contribute!
You may also want to look at what the Review subproject said about implementations of other systems.
The goal of this subproject is to define and implement a low-level language that fulfills all the requirements for serving as the basis of the Tunes system, including self-extending into the full Tunes HLL, etc.
Infix pointers (pointers that do not point at some globally constant offset from the beginning of an allocation unit) greatly complicate the GC. They should be forbidden whenever possible.
"C" like infix pointers can still be simulated with a segment-offset
pair, with an aligned pointer and an offset inside the segment.
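For illustration only (the sketches on this page are in C, since the LLL itself is not defined yet, and all names in them are made up), such a simulated infix pointer could be a pair like this:

    /* Sketch: a C-like infix pointer simulated as an aligned base pointer
     * plus a byte offset; the GC then only ever traces and updates 'base'. */
    #include <stdint.h>

    typedef struct {
        void    *base;    /* aligned pointer to the start of the allocation unit */
        uint32_t offset;  /* byte offset of the referenced field inside the unit */
    } infix_ref;

    /* Dereference as a raw byte address; only 'base' is visible to the GC. */
    static inline void *infix_deref(infix_ref r) {
        return (char *)r.base + r.offset;
    }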
The GC may still have to accept infix pointers for code return addresses, though, or else the calling convention may become grossly inefficient.
I propose that code "segments" should not cross, say, 4K or 8K boundaries, so that finding the right code segment is just a matter of checking the short list of segments within the block obtained by masking off the low bits of the return address.
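A possible lookup, again as a hedged C sketch (the 4K block size, the segments_in_block table and all identifiers are assumptions, not part of any defined interface):

    #include <stdint.h>
    #include <stddef.h>

    #define BLOCK_BITS 12u                      /* 4K blocks; 8K would use 13 */
    #define BLOCK_MASK (~(((uintptr_t)1 << BLOCK_BITS) - 1))

    typedef struct code_segment {
        uintptr_t start, end;                   /* [start, end), within one block */
        struct code_segment *next;              /* next segment in the same block */
    } code_segment;

    /* Assumed lookup from a masked block address to its (short) segment list,
     * e.g. a hash table keyed on the block address.                          */
    extern code_segment *segments_in_block(uintptr_t block_addr);

    static code_segment *find_code_segment(uintptr_t return_addr) {
        code_segment *s = segments_in_block(return_addr & BLOCK_MASK);
        for (; s != NULL; s = s->next)
            if (return_addr >= s->start && return_addr < s->end)
                return s;
        return NULL;                            /* not a known code address */
    }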
How do we differentiate pointers from numbers, etc.?
Structural differentiation is powerful, but it may slow the GC considerably unless descriptors are simple (for most objects, just an integer giving the length of a pointer array), and it forbids dynamic differentiation, i.e. mixing integers and pointers in the same array (e.g. a simple stack), etc.
That's why we'll use a simple bit pattern to differentiate integers (raw data) from pointers (structured data), and different kinds of pointers from each other (that's a BIg Bunch Of Pages, i.e. BIBOP, kind of GC).
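In C-sketch terms (the particular convention shown, low bit set for integers, is only one of the two candidates discussed just below):

    #include <stdint.h>

    typedef uintptr_t word;                     /* one tagged machine word */

    /* Low-bit tag test: here a set low bit marks raw integers and a cleared
     * one marks pointers; the opposite convention is tested the same way.  */
    static inline int is_integer(word w) { return (w & 1u) != 0; }
    static inline int is_pointer(word w) { return (w & 1u) == 0; }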
We must choose between integers having the low bit set or the low bit cleared.
Having it set for integers (and thus cleared for pointers) may allow faster pointer access on RISC machines, but it slows down integer arithmetic. Having the bit set for pointers allows easier arithmetic, but it forces the use of an offset on all memory accesses.
The big question is: shall integers be stripped of their low bit, which would simplify overflow-testing code to nothing and make the implementation portable, but make pointer arithmetic and the mixing of true 32-bit integers with 31-bit ones a little harder? Or shall they be stripped of their overflow bit, which makes integer overflows generate GC-readjustable pointers rather than providing flat modulo arithmetic, but allows easy pointer arithmetic and easy mixing of 31-bit integers with 32-bit ones?
We shall implement both ways, and compare actual execution-time and code-space measurements!
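For concreteness, here is a hedged C sketch of both candidate representations (31-bit values, made-up names; negative values and the pointer side of option B are omitted for brevity):

    #include <stdint.h>

    typedef uint32_t word;                      /* one tagged 32-bit cell */

    /* Option A: tag in the low bit (set for integers), value stored shifted. */
    #define A_INT_TAG 1u
    static inline word    a_make_int(int32_t n) { return ((uint32_t)n << 1) | A_INT_TAG; }
    static inline int32_t a_int_val(word w)     { return (int32_t)w >> 1; }  /* arithmetic shift assumed */
    /* Re-tagging add, (a - tag) + b: at machine level, a 32-bit signed overflow
     * on this final add is exactly a 31-bit overflow of the untagged sum, so
     * the hardware overflow flag does all the overflow testing.              */
    static inline word    a_add(word a, word b) { return (a - A_INT_TAG) + b; }

    /* Option B: tag in the high ("overflow") bit (set for pointers), integer
     * values stored unshifted in the low 31 bits.                            */
    #define B_PTR_TAG 0x80000000u
    static inline word b_make_int(uint32_t n) { return n & ~B_PTR_TAG; }
    /* Plain add: nothing to shift or re-tag, so mixing 31-bit and 32-bit
     * integers and doing pointer arithmetic stays easy...                    */
    static inline word b_add(word a, word b)  { return a + b; }
    /* ...but a carry into bit 31 makes the result look like a pointer, which
     * the GC must then recognise and readjust.                               */
    static inline int  b_overflowed(word w)   { return (w & B_PTR_TAG) != 0; }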
A high-level page directory is used to determine the GC type of an object according to the page it lies in. It is a multi-level hashed structure that may evolve with the GC code, so that the type of an object can be found quickly: typically a mix of arrays and balanced binary trees that recognize bit patterns in addresses.
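A minimal C sketch of such a directory, assuming a 32-bit address space, 4K pages and a flat array at each level (a balanced tree could replace the second level for sparse regions; all names are illustrative):

    #include <stdint.h>
    #include <stddef.h>

    #define PAGE_BITS   12u                     /* 4K pages                   */
    #define LEVEL1_BITS 10u                     /* top 10 bits of the address */
    #define LEVEL2_BITS (32u - LEVEL1_BITS - PAGE_BITS)

    typedef struct gc_type gc_type;             /* descriptor, sketched below */

    typedef struct {
        gc_type *page_type[1u << LEVEL2_BITS];  /* one GC type per page       */
    } page_dir_l2;

    static page_dir_l2 *page_dir_l1[1u << LEVEL1_BITS];

    static gc_type *gc_type_of(uint32_t addr) {
        page_dir_l2 *l2 = page_dir_l1[addr >> (PAGE_BITS + LEVEL2_BITS)];
        if (l2 == NULL)
            return NULL;                        /* unmapped region            */
        return l2->page_type[(addr >> PAGE_BITS) & ((1u << LEVEL2_BITS) - 1)];
    }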
The GC type of an object, as determined by its address, gives us routines to update the object during a GC, to destroy the object when it is no longer reachable, etc.
The GC type of a page chunk allows us to track down the beginning of the individual objects pointed to on the page (in case infix pointers are used). It also gives us the policy to follow when swapping out the page: copying the page to disk, sending it over the network, compressing it in memory for possible later actual swap-out, etc.
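Put together, the per-page GC type could be a descriptor along these lines (a hedged sketch; the routine set and the swap policies are just the ones named above):

    #include <stddef.h>

    typedef enum { SWAP_TO_DISK, SWAP_TO_NETWORK, COMPRESS_IN_MEMORY } swap_policy;

    struct gc_type {
        /* update every pointer inside the object during a GC            */
        void  (*update)(void *object);
        /* destroy the object once it is no longer reachable             */
        void  (*destroy)(void *object);
        /* find the start of the object containing an (infix) address    */
        void *(*object_base)(void *address);
        /* what to do with a page of this type when it must leave memory */
        swap_policy on_swap_out;
    };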
Persistence
Be careful with distributed persistence: always remember previous states until all transactions using them are finished and confirmed.
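One way to picture this rule, as a C sketch (version-list maintenance is omitted and all names are made up):

    #include <stdlib.h>

    typedef struct state_version {
        void    *snapshot;                 /* persisted image of a previous state     */
        unsigned pending_transactions;     /* unconfirmed transactions still using it */
        struct state_version *newer;       /* more recent version, or NULL if current */
    } state_version;

    /* Called when a transaction that used version 'v' finishes and is confirmed;
     * the old state may only be dropped once nobody can still depend on it and a
     * newer state exists (unlinking from the version list omitted for brevity).  */
    static void transaction_done(state_version *v) {
        if (--v->pending_transactions == 0 && v->newer != NULL) {
            free(v->snapshot);
            free(v);
        }
    }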
Use the m4 preprocessor to produce source files for all target languages (including C and i386 assembly) from the same meta-source.
Modules have an install field explaining how to restore/resume the object from the state log as retrieved from the persistent store.
In general, this will be a call to a standard trusted high-level module. However, it can be low-level code in the very first bootstrapping modules...
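As a rough C sketch of that module descriptor (the field names and the state_log type are assumptions):

    #include <stddef.h>

    typedef struct {
        const void *data;                  /* raw state-log bytes from the persistent store */
        size_t      length;
    } state_log;

    typedef struct module {
        const char *name;
        /* restore/resume the module's object from its logged state; usually a
         * call into a trusted high-level module, but raw low-level code in the
         * very first bootstrapping modules.                                    */
        void *(*install)(const state_log *log);
    } module;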