The Tunes Migration Subproject


Contents of the Migration Subproject

Here are the current contents of the Migration subproject:


Goals for the Migration subproject

The goal of this subproject is to find, study, and implement existing or new techniques that allow a computing system to take full advantage of distribution, through the concept of dynamic migration of fine-grained objects.

Migration is the operation by which an object is abstracted from its current context and applied to another context. The system shall perform such operations automatically, dispatching objects across the distributed system so as to improve performance by reducing overall wasted resources (including idle time, administration, migration itself, and running unoptimized code), while respecting the human user's constraints (response time, total computation time).
Migration encompasses dynamically balancing load over a pool of resource providers, providing orthogonal persistence, upgrading modules, managing users who log in from changing places in a computer network, detaching and reattaching objects to different I/O devices, terminals, and the like, etc.
All migration scheduling will be under full user control, be it direct, indirect, remote, purely potential, or frozen to some compile-time behavior.
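
As an illustration of migration as abstraction from one context followed by application to another, here is a minimal sketch in Python; all names are invented for this illustration and are not part of any actual Tunes interface:

    import pickle

    class Migratable:
        """An object that separates its portable state from its local context."""
        def __init__(self, state, context):
            self.state = state        # portable, semantic content
            self.context = context    # local bindings: devices, files, neighbors

        def abstract(self):
            """Detach from the current context; return a portable representation."""
            return pickle.dumps(self.state)

        @classmethod
        def apply_to(cls, portable, new_context):
            """Re-instantiate the object within a new context."""
            return cls(pickle.loads(portable), new_context)

    # Migrating from node A's context to node B's:
    obj = Migratable({"counter": 42}, context={"node": "A"})
    wire = obj.abstract()                            # leaves node A
    obj2 = Migratable.apply_to(wire, {"node": "B"})  # rebound on node B
    assert obj2.state == {"counter": 42}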

Migration is a very important feature of the Tunes system, one that no other known system provides at all (except in its most primitive, non-parametrizable form: swapping raw chunks of memory between RAM and hard disk).



A conceptual and implementational comparison between Migration and more traditional concepts

The limits of sequential processing and the development of networks of computers encourage the emergence of parallel and distributed computing.
At the same time, with the permanent evolution of hardware architectures and the consequent hardware heterogeneity becoming an acknowledged fact, techniques for building portable applications are gaining popularity (sometimes to the point of creating crazes).
Among all the techniques whose exploration has just begun, Migration is a concept that appears as a natural solution to adaptive distribution, and introduces a new point of view on portability: fine-grained computer objects are dynamically "migrated" from computer to computer, so as to optimize computer resource usage.

The simplest traditional technique to break the limits of sequential computing is symmetric multiprocessing: multiple high-end processors, pooled in the same (very expensive) computer, are dispatched coarse-grained processes to execute, the computing model for each process being the traditional sequential model. Actually, this expensive and technologically limited hardware technique is just an expedient hack to prolong the life of the traditional centralized "client/server" model, by allowing the server to manage more requests at once. But it cannot take advantage of the growing multi-computer networks it relies upon. It can be considered the stubborn brute-force approach to the problem of parallel computing.
Then comes traditional distributed computing, which is a software implementation of the previous approach, where coarse-grained processes are dispatched over a pool of logically equivalent processors in a computer network. Much better than the above method (which it encompasses, except for the ease of dynamic load-balancing), traditional distributed computing does take advantage of existing and growing computer networks. With some refinement, and the choice of a compromise emulated virtual processor, it can even run on a heterogeneous network of computers.
Traditional parallel computing rather relies on hardware clusters of similar processors to run a single program; thus, contrary to traditional symmetric multiprocessing or distributed computing, the parallelism is now fine-grained. The problem with it is the high cost of parallel hardware, and its complete lack of adaptivity: different problems each formalize into a different network geometry, which no single piece of parallel hardware can provide at reasonable cost. Thus, unless some particular class of similar problems is in mind, specialized parallel hardware costs too much to be worth it, notwithstanding the evolution of computer hardware, which will quickly make such an expensive machine obsolete.
Of course, this traditional parallel computing can in turn be emulated in software, as above, on a homogeneous or heterogeneous network of computers; but then, dispatching techniques that were successfully used for coarse-grained computing prove to have too large an overhead for fine-grained computing, all the more when the computer network is heterogeneous in architecture, speed, transmission latency, reliability, and security.
If we restate this last problem as that of fine-grained distributed computing in a dynamically evolving network, we obtain migration as a possible solution. If we insist just a bit on the (not yet traditional) aspects of persistence and dynamism in such distributed computing, migration becomes a natural solution: computer objects are migrated within the network, so as to find a place where resources are sufficient, while respecting constraints of availability to the I/O nodes that request the object, and unavailability to insecure I/O nodes.

How are portability techniques involved in such an approach to distributed computing? Firstly, because fine-grained objects are constantly moved across the network, there should be some unified way to transmit them. Unified means that any computer can ultimately talk to any other computer with which it may want to exchange objects; it does not mean that there will be one single global low-level encoding used by all computers. Actually, wherever in the network there is a subnetwork (which may reduce to a single computer) where objects tend to stay as much as or more than they tend to leave, it is useful to translate these objects into some encoding better adapted to the local subnetwork.
Well-known techniques for a global encoding, such as bytecode interpreters or virtual machines, widely used since early LISP systems, are a good way to exchange code as well as data. Dynamic compilation (which has recently proven an efficient technique as people developed emulators for old hardware standards) can give performance to such systems; and nothing prevents architecture-dependent binaries from being provided in a fine-grained way, as is already widely done in migrationless coarse-grained contexts.
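
As a minimal illustration of such a global encoding, here is a toy stack-based bytecode interpreter in Python (opcodes and layout invented for this sketch): the same byte sequence evaluates identically on any host that runs the interpreter.

    PUSH, ADD, MUL, HALT = range(4)   # invented opcodes

    def run(code):
        """Evaluate a bytecode sequence on a simple operand stack."""
        stack, pc = [], 0
        while True:
            op = code[pc]; pc += 1
            if op == PUSH:
                stack.append(code[pc]); pc += 1
            elif op == ADD:
                b, a = stack.pop(), stack.pop(); stack.append(a + b)
            elif op == MUL:
                b, a = stack.pop(), stack.pop(); stack.append(a * b)
            elif op == HALT:
                return stack.pop()

    # (2 + 3) * 4, encoded once, runnable wherever the interpreter exists:
    print(run([PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT]))   # prints 20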
However, such low-level techniques tend to tie computations to some particular computing model, and make migration difficult or inefficient across significantly different architectures, where standard word sizes (8/16/20/32/64/128/however-many bits, notwithstanding software encoding conventions), memory geometry (flat, segmented, banked), CPU design (register-based, stack-based, or whatever else), breadth of CPU memory (number of registers, depth of stacks), or instruction-set richness (CISC, RISC, VLIW, ZISC, or MISC) differ.
And even if some architecture family may currently be considered "fairly standard" (e.g. RISCish 32-bit architectures in the 1990's), we know that any technology is due to be made obsolete by newer ones; so unless we want to be tied to some architecture, or to lose the possibility of slowly upgrading hardware while keeping portability, we must not base portability on low-level standards. That is, portability should be not only across space, but also across time.
This is why, when portability is really meant, and not just efficient portability within some particular computing model, high-level semantic-based object encodings should be used, not low-level execution-based object encodings.
This means that portable objects must be much nearer to parsed source code than to compiled binary code. They can be interpreted at run time, or dynamically compiled, and all combinations of the previous techniques still apply, as long as they do not hamper this enhanced kind of portability.
Actually, all these techniques are particular cases of meta-programmed dynamic evaluation, which consists in evaluating an object consistently with its high-level semantics, while taking into account hints and tactics provided with the object, which help optimize it in some common possible contexts, providing enough information to skip the analysis and trial processes that make traditional coarse-grained optimization ill-suited to fine-grained distributed computing.
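
A possible shape for such a high-level encoding is sketched below: objects travel as annotated abstract syntax rather than as machine code, and the hints may be used or ignored by the receiver without changing the semantics. All names here are illustrative, not the actual Tunes HLL representation.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        op: str                                     # e.g. "add", "mul", "const"
        args: tuple = ()
        hints: dict = field(default_factory=dict)   # e.g. {"unroll": 4}: advice only

    def evaluate(node):
        """Direct interpretation: honors the semantics, may ignore the hints."""
        if node.op == "const":
            return node.args[0]
        vals = [evaluate(a) for a in node.args]
        return {"add": sum, "mul": lambda v: v[0] * v[1]}[node.op](vals)

    expr = Node("mul", (Node("add", (Node("const", (2,)), Node("const", (3,)))),
                        Node("const", (4,))),
                hints={"strength_reduce": True})    # a tactic a compiler may use
    print(evaluate(expr))   # 20, whatever the host chooses to do with the hints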

Now, what additional complexity does Migration involve, as compared to previously considered techniques?
Migration poses, among distribution techniques, an implementational problem quite comparable to the one garbage collection (more simply known as GC) posed among memory allocators. Actually, it appears that GC is a particular case of Migration, where only the memory resource is considered.
Because some objects may have to be migrated further, the system must be aware of any particular encoding used for these objects, so that it can recode them into an encoding suited to their new location after migration.
And the solutions to this additional problem are also quite similar to those used for garbage collection: the system keeps track of a migration-wise "type" for each object, so that it can walk through the structure of objects and do whatever pruning and grafting it needs to do.
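
The following sketch (with an invented descriptor format) shows how such migration-wise types could drive the traversal: per-type descriptors tell the system where the references inside an object lie, so that it can walk an object graph and re-encode each node for its new home, exactly as a tracing GC walks and relocates objects.

    # For each migration-wise type, which fields hold references to other objects:
    DESCRIPTORS = {
        "pair": ("head", "tail"),
        "leaf": (),
    }

    def recode(obj, translate, seen=None):
        """Deep-copy an object graph, rewriting each node for its new home."""
        seen = {} if seen is None else seen
        if id(obj) in seen:                   # preserve sharing (and cycles)
            return seen[id(obj)]
        new = dict(obj)                       # shallow copy of this node
        seen[id(obj)] = new
        new["payload"] = translate(obj["payload"])   # e.g. change the encoding
        for slot in DESCRIPTORS[obj["type"]]:
            new[slot] = recode(obj[slot], translate, seen)
        return new

    leaf = {"type": "leaf", "payload": 1}
    tree = {"type": "pair", "payload": 0, "head": leaf, "tail": leaf}
    moved = recode(tree, translate=lambda p: p * 10)
    assert moved["head"] is moved["tail"]     # sharing survives the migration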
After migration is made possible comes the problem of scheduling such fine-grained objects: techniques that apply to coarse-grained processes, with coarse-grained static tables for process statistics, do not apply to fine-grained objects; they would induce too large an overhead. Actually, they already cause considerable direct or indirect overhead in the underlying traditional coarse-grained centralized computing model.
Actually, the traditional low-level migrationless coarse-grained centralized computing model is a degenerate case of high-level migrating fine-grained distributed computing. So we may reasonably expect the latter computing model to bring at least as good performance, because it can achieve exactly the same performance by restricting itself to this degenerate case as a last resort.
To avoid spending more resources on scheduling than we gain from distributing, relative to the previous model, we can simply do resource-based scheduling: only the availability of a resource (e.g. some provider going idle), or an intended use of resources that is large enough, will trigger distributed scheduling. Thus no overhead will be felt when no resources are available, while if resources are available, the overhead will be more than compensated by the time gained in activating or managing resources that would have been seriously under-used in a non-migrating environment.
Of course, transmission time is not free and must itself be considered a resource; consequently, objects should not be migrated without the scheduler considering the consequent effects upon network traffic. To ease this work, when created and published, and/or after some profiling period, objects must come with information about how they behave with respect to inter-object communication, as well as the use they make of traditional resources, as is already done by traditional schedulers.
Because of the induced overhead, objects will come to the scheduler in medium-grained packaged modules that include all this information. This won't prevent objects from being distributed at fine grain, though, for there is no reason why scheduling should be done at the same grain as execution.
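
Here is a minimal sketch of such resource-based scheduling (thresholds, field names, and the cost model are assumptions for illustration): nothing happens while every provider is busy; a provider going idle triggers a decision based on the modules' published profiles.

    from dataclasses import dataclass

    @dataclass
    class ModuleProfile:
        name: str
        cpu_demand: float    # estimated work to offload, arbitrary units
        comm_volume: float   # data exchanged with peers, arbitrary units

    def pick_migration(idle_node_bandwidth, candidates):
        """Called only when a provider goes idle: choose the module whose
        offloaded work most exceeds its transmission cost."""
        def net_gain(m):
            return m.cpu_demand - m.comm_volume / idle_node_bandwidth
        best = max(candidates, key=net_gain)
        return best if net_gain(best) > 0 else None   # no gain: migrate nothing

    mods = [ModuleProfile("parser", cpu_demand=5.0, comm_volume=40.0),
            ModuleProfile("solver", cpu_demand=90.0, comm_volume=10.0)]
    print(pick_migration(idle_node_bandwidth=2.0, candidates=mods))  # -> solver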
Now, if objects are to migrate in a way that is not statically controlled, the problem arises of how safety, consistency, and security constraints will be enforced.
Again, a low-level standard for Migration would make objects completely uncontrollable: no proof system can be both reliable and usable for proving high-level properties of low-level objects, while the constraints we need to fulfill are quite high-level. The argument stands in any computing system, which already makes low-level programming standards a bad choice for any kind of computing; but the danger grows as objects move in a distributed system, having more chances to find a security hole, and being able to cause corruption for more people, should the above constraints fail to be enforced.
The obvious solution is again to choose high-level objects as the standard for migrating objects. Proof techniques already exist that can express any imaginable security criterion. As safety constraints can often be made quite simple, automated proof techniques could be used to enforce them in the common case. Consistency and security can involve much more complicated properties, though, and there is no reasonable way for the object receiver to build a proof for an arbitrary incoming object; so the natural way to do things is for the receiver to require the sender to provide such a proof. Finally, once safety is guaranteed, and because any proof relying on the behavior of physical objects has hypotheses that can never be completely accepted, it is quite acceptable for security techniques to include cryptographic procedures.
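
The receiver-side discipline just described might look like the following sketch, where the proof checker is a stand-in predicate (a real system would check proof-carrying code or a typed intermediate form) and an HMAC signature covers the residual, cryptographic part of the trust; key and property names are hypothetical:

    import hashlib, hmac

    SHARED_KEY = b"sender-receiver-secret"   # assumption: a pre-shared key

    def sign(payload: bytes) -> bytes:
        return hmac.new(SHARED_KEY, payload, hashlib.sha256).digest()

    def check_proof(obj: dict) -> bool:
        """Stand-in for verifying the safety proof shipped with the object."""
        return obj.get("proof") == "resource-bounded"   # hypothetical property

    def receive(obj: dict, signature: bytes) -> bool:
        payload = repr(sorted(obj.items())).encode()
        if not hmac.compare_digest(sign(payload), signature):
            return False          # sender not authenticated: reject outright
        return check_proof(obj)   # the proof obligation rests on the sender

    obj = {"code": "loop-free kernel", "proof": "resource-bounded"}
    sig = sign(repr(sorted(obj.items())).encode())
    print(receive(obj, sig))      # True: authenticated and proof accepted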
Of course, any user of the system could tweak his own strengthened or weakened security criteria, which would alter the distributed or secure character of his subsystem; but at least, it is feasible to have a system both distributed and secure, where no one can be the victim of failures for which he is not responsible.

To conclude, Migration comes as a natural extension of traditional computing techniques, solving the currently growing problem of the limits of the sequential computing model in a way adapted to a world of networked computers; its implementation can be achieved by a small adaptation of well-known techniques, without prohibitive overhead. It is a computing framework still to be explored, one that stands opposed to the traditional static low-level coarse-grained centralized computing model.



Modules

  • see the HLL page...
  • see the LLL page...
  • portability across different architectures
  • local (real-time?) and distributed higher-order GC
  • insist on fine grain, allowing better, more efficient distribution
  • annotations (typing is done through them)






    Page Maintainer:
    Faré -- rideau@clipper.ens.fr