The Tunes LLL Subproject


Contents of the LLL Subproject

Here are the current contents of this LLL project:
  • Firstly, the goals of the subproject
  • Second, the requirements for this LLL.
  • Then, a first sketch of the LLL semantics
  • A list of difficulties about the LLL
  • Some implementation ideas for the LLL
  • A list of modules to implement.
  • There are two sub-subprojects below the LLL: the i386 or the OTOP subproject.

    As you see, the LLL subproject is still in a draft process :(
    Feel free to contribute !

    You may also want to
    look at the what Review subproject said about implementations of other systems,
    go back to the main Tunes page, or
    jump back to the Tunes Subproject page.


    Goals for the Low Level Language

    The goal of this Subproject is a define and implement a low-level language that will fulfill all the requirement to be used as a basis for the Tunes system, including self-extending into the full Tunes HLL, etc.


    LLL Requirements

    • see the HLL Requirements...
    • portability accross different architecture
    • local (real-time ?) and distributed GC for higher-order persistent objects
    • annotations (typing is done through them)


    LLL Semantics

    The LLL system is centered around the object/annotation paradigm

    Low-level objects

    • objects can be uniquely identified in some way.
    • Typically, objects in the local address space are identified by their 32 bit CPU address (perhaps with an implementation-dependent offset)
    • objects outside it are accessed using descriptors/handles inside it, so need not be considered as special objects wrt the LL memory system.
    • objects of a same address space must trust each other anyway, so let them have a cooperative GC and multithreading.
    • communication between address spaces happens directly through space-dependent drivers.

    Annotations

    • There are standard mechanisms for annotations and their resolution
    • Every annotation may have its own implementation, be it a hashtable of object to value association, or an array of values, or some executable code.
    • lazy evaluation and futures, typing, UI interface, scoping are done through annotations.
    • for example, an object's concrete type (relative to the GC mechanism) may be determined from bits in its address, whether statically or dynamically.


    LLL Difficulties

  • See HLL difficulties...
  • Having some efficient hardware-independent stuff.

  • Garbage Collecting
    • Infix pointers (that do not point at some globally constant offset to the beginning of an allocation unit) greatly complicate the GC. They should be forbidden whenever possible.
    • "C" like infix pointers can still be simulated with a segment-offset pair, with an aligned pointer and an offset inside the segment.
    • The GC may have to accept infix pointers as for code return addresses, or else the calling convention may become grossly unefficient
      I propose that code "segments" should not cross, say 4K or 8K boundaries, so that finding the right code segment is just a matter of checking the list of segments in the segment obtained by masking the right bits.
    • How to differentiate pointers from numbers, etc ?
      structural differentiation is powerful, but may slow the GC considerably, unless descriptors are simple (can be just an integer for length of a pointer array, for most objects), and forbids dynamic differentiation, mixing integers and pointers in an array (e.g. simple stack), etc. That's why we'll use a simple bit pattern to differentiate integers (raw data) from pointers (structured data), and different kind of pointers from each other (that's a BIg Bunch Of Pages kind of GC).
    • we must choose between integers having low bit set or low bit cleared. Having it set (and thus having bit cleared for pointers) may allow faster pointer access on RISC machines, but slows any arithmetics. Having bit set for pointers allow easier arithmetics, but forces the use of an offset for all memory accesses.
    • The big question is: will integers be stripped of their low bit, which would simplify overflow testing code to naught, and make the implementation portable, but make a little harder doing pointer arithmetics and mixing of true integers with 31 bit ones. Or stripping them from their overflow bit, which makes integer overflows to generate GC-readjustable pointers, rather than providing flat modulo arithmetics, but allows easy pointer arithmetics and mixing of 31-bit integers and 32-bit ones ?
    • We shall implement both ways, and compare actual execution time and code space measurements !!!
    • A high-level page directory is used to determine the GC type of objects according to the page it is in. It is a multi-level hashed structure that may evolve with the GC code, so that it may allow to find quickly the type of objects. Typically a mix between arrays and balanced binary trees to recognize bit patterns.
    • The GC type of an object, as determined by its address gives us routines to update the object during a GC, to destroy the object when it is not accessed anymore, etc.
    • The GC type of a page chunk allows us to track down the beginning of individual objects pointed to on the page (in case infix pointers are used), also gives us the policy to follow when swapping out the page (which may be copying the page to disk, sending it to the network, or compressing it to memory for possible further actual swapping out of memory, etc).
  • Persistence
    • Be careful with distributed persistence: always remember previous states until all transactions using it are finished and confirmed.


    Implementation Ideas

  • modules have an install field explaining how to restore/resume the object from the state log as gotten from persistent store. In general, this will be a call to a standard trusted high-level module. However, this can be low-level code in the very first bootstrapping modules...


    Modules to implement

  • generic memory management modules
    • A generic generational garbage collector with full support for persistence (checkpointing and restarting of objects).
    • A generic module for inter-heap (perhaps distributed) synchronized persistent garbage collector
    • A generic module for back-reference unsynchronized persistent conservative garbage collector.
    • generic support for chunks of executable binary
    • generic full support for GOOSE modules.
    • mechanisms for the migration manager (from the Migration subproject) to use: swapping to disk, to a compressed RAM zone, to another host, etc.

  • low-level management of user interface hardware
    • A console adapter for text-mode
    • a text-mode adapter for serial consoles (perhaps just use the ncurses package)
    • A screen windowing multiplexer for text-mode screens
    • A generic output-synchronizing input multiplexer
    • A standard combination of the previous two.
    • A fast graphic library to attract game programmers

  • File formats (I guess these should presumably go to the HLL stdlib...)
    • A partition manager, that multiplexes hard disks according to standard partitioning methods.
    • Support for various existing file systems: MS-DOS FAT FS (and WindowsNT or Linux UMSDOS extensions), Linux EXT2 FS, etc.
    • Graphic file formats: GIF, JPEG, PCX, TIFF, etc
    • Various Audio file formats for sound samples


    To Do on this page

  • Find all LLL requirements.
  • Find all implementational problems.
  • Make first proposals.
  • Divide into subfiles.
  • Define separately semantics & representation
  • Find out how multiple architectures can be made interoperable...
  • Find some nice implementation for annotations...
  • Separate annotation mechanisms from annotation resolving policies...
  • Open a "GC" subproject ?


    Back to the Tunes Home page or Subprojects page


    Page Maintainer:
    Faré -- rideau@clipper.ens.fr