Part II: Language Utility: Not reinventing the wheel

  1. Computer Languages
  2. Firstly, let's settle what we call a "computer language".
    A computer language is just any means by which humans (or other computers) can communicate with a computer; any medium for exchanging information is a language.

    Now, this makes a language even out of point-and-click-on-window-system-screen. So what ? Why would a language have to use ASCII symbols at all ? People use sounds, the deaf use their hands, various animals use a lot of different ways to communicate. What makes the language is the structure of the information communicated, not the medium of that communication. Written and spoken English, despite their differences, are both English, and recognizable as such. Of course, symbol-based languages are simpler to implement on today's computers, but that's only a historical contingency, which may eventually disappear.

    So what kind of structure shall a computer language have ? What makes a language structure better or more powerful than another ? That's what we'll have to inspect.


  3. Goal of a computer language
  4. It should be stressed that computer languages have nothing to do with finished, static "perfect" computer programs - those could have been written in any language, preferably a portable one (that is, any ANSI-supported language, i.e. surely "C", even if I'd personally prefer FORTH). If all interesting things had already been said and understood, there would be no more need for a language; but there are infinitely many interesting things, so a language will always be needed, and no finite dictionary will ever be enough.
    Computer languages have to do with programming, with modifying programs, with creating new programs, not just watching existing ones. That's just like an operating system, which is not useful as a static library, but as a frame for dynamic computing.

    Thus, the qualities of a (programming) language do not lie only in what can eventually be done as a static program with the language (the language hopefully being Turing-equivalent, with libraries to access all the hardware, i.e. able to express anything). Nor do they lie in the efficiency of a straightforward implementation (i.e. easy access for beginners), as a good "optimizing" compiler can always be achieved later, and speed-critical routines can be included in libraries (i.e. if you really need a language, then you won't stay a beginner at this language for long).
    The qualities of a language lie in the ease with which one can express new concepts and modify existing routines.

    With this in mind, a programming language is better than another if it is easier for a human to write a new program or to modify an existing program, or of course to reuse existing code (which is an extension of modifying code); a language is better if sentences of equal meaning are shorter, or simply if better accuracy is reachable.


  5. Reuse versus Rewrite
  6. We evaluated a computing system's utility by the actual time saved by using it in the long run, as compared to using other tools instead, or not using any. Now, personal expediency suggests that people keep using the same tools as they always did, however lame they may be, and add functionalities as they are needed, because learning and installing new tools is costly. But this leads to obsolete tools bloated with bogus features, which incur tremendous debugging and maintenance costs. It results in completely insecure software, so no one trusts anyone else's software, and no one wants to reuse other people's software, all the more if one has to pay for it.

    For the problem is that, with existing tools, 99.99% of programming time throughout the world is spent doing again and again the same basic things that have been done hundreds of times before or elsewhere. It is common to hear (or read) that most programmers spend their time reinventing the wheel, or desperately trying to adapt existing wheels to their gear. Of course, you can't escape asking students and newbies to repeat and learn what their elders did, so they can understand it and internalize the constraints of computing. The problem is that today's lame tools and closed development strategies make learning difficult and reuse even more difficult, secure reuse being just impossible. So people spend most of their time writing again and again new versions of earlier works, nothing really "worth" the time they spend, only so they can be sure they know what the code does, even though they seldom manage to have it do what they want.

    Now, after all, you may argue that such a situation creates jobs, so is desirable; so why bother ?
    First of all, there is plenty of useful work to do on Earth, so time and money saved by not repeating things while programming can be spent on many many other activities (if you really can't find any, call me, I'll show you).
    Now, rewriting is a serious problem for everyone. First of all, rewriting is a loss of time, which makes programming delays much longer, and is thus very costly. Even more costly is the fact that rewriting is an error-prone operation, and at any point while rewriting, one may introduce errors that are very difficult to trace and remove (if need be, recall the consequences of computer failures in spacecraft, phone networks, or planes). Reuse of existing data across software rewrites, and communication of data between different pieces of software, prove to be of exorbitant cost. The most costly aspect of rewriting may well be the fact that any work has a short lifespan, and will have to be rewritten entirely from scratch whenever a new problem arises; thus programming investment costs are high, and software maintenance is of high cost and low quality. Consider also that rewriting is thankless work that disheartens programmers, which has an immeasurably negative effect on programmer productivity and work quality. Last but not least, having to rewrite from scratch creates an arbitrary limit to software quality: no software is better than what one person can program during one lifetime.

    Therefore, it will now be assumed as proven that code rewriting is a really bad thing, and that we thus want the opposite: software reuse, software sharing. We could have arrived at the same conclusion with this simple argument: if some software is really useful (considering the general interest), then it must be used many, many times, by many different people, unless it is some kind of computation with a definitive answer that concerns everybody (which is difficult to conceive: some software that would solve a metaphysical or historical problem !). Thus, useful software, unless it is some kind of very unique code, is to be reused countless times. That's why, to be useful, code must be very easy to reuse.
    It will be shown that such reuse is what the "Object-Orientation" slogan is all about, and what it really means. But reuse itself introduces new problems that have to be solved before reuse can actually be possible, problems, as we already saw, of trust: how can one trust software from someone else ? How can one reuse software without spreading errors from the reused software, without introducing errors due to misunderstanding or misadaptation of old code, and without causing software obsolescence ? We'll see what the possible reuse techniques are, and how they cope with these problems.


  7. Copying Code
  8. The first and the simplest way to reuse code is just the "copy-paste" method: the human user just copies some piece of code, and pastes it in a new context, then modifies it to fit a new particular purpose.
    This is really like copying whole chapters of a book and changing a few names to have it fit a new context; this method has many flaws and shortcomings, and we can object to it both morally and economically.

    First of all, copying is a tedious and thus error-prone method: if you have to copy and modify the same piece of code thousands of times, it can prove long and difficult work, and nothing will prevent you from making just as many mistakes while copying or modifying.
    As for the moral or economical objection, it is sometimes considered bad manners to copy other people's code, especially when copyright issues are involved; sometimes code is protected in such a way that one cannot copy it easily (or would be sued for doing so); thus this copy-paste method won't even be legally or humanly possible every time.

    Then, even assuming the previous problems could be solved (which is not obvious at all), there would still be a big problem with code copying: uncontrolled propagation of bugs and missing features across the system. And this is quite a serious threat to anything like code maintenance; actually, copying code means that any misfeature in the code is copied along with the intended code. So the paradox of code copying is that bad copying introduces new errors, while good copying spreads existing errors; in any case code copying is an error-prone method. Error correction itself is made very difficult, because every copy of the code must be corrected according to its own particular context, while tracking down all existing copies is especially difficult as the code will have been modified (otherwise the copy would have been made pointless by any macro-defining preprocessor or by a procedure call in any language). Moreover, if another programmer (or the same programmer some time later) ever wants to modify the code, he may be unable to find all the modified copies.

    To conclude, software creation and maintenance is made very difficult, and even impossible, when using copy-paste; thus, this method is definitely bad for anything but exceptional reuse of a small number of mostly identical pieces of code, in a context where expediency is much more important than long-term utility. That is, copy-paste is good for "hacking" small programs for immediate use; but it's definitely not a method to produce code meant to last or to be widely used.


  9. Having an extended Vocabulary...
  10. The second easiest, and most common, way to reuse code is to rely on standard libraries, which are more like dictionaries and technical references than libraries (but the name stuck). You wait for the function you need to be included in the standard library, and then, when it is finally provided, use it as the manual describes.

    Unhappily, standards are slow to come, and even slower to be implemented the way they are documented. By that time, you will have needed new not-yet-standard features, and will have had to implement them yourself or to use non-standard dictionaries; when the standard eventually includes your feature, you'll finally have to choose between keeping a non-standard program, which won't be able to communicate with newer packages, or rewriting your program to conform to the standard.
    Moreover, this reuse method relies heavily on a central agency for editing revised versions of the standard library. And how could a centralized agency do all the work needed for everyone to be happy ? Trying to impose reliance on a sole central agency is communism. Even relying only on multiple concurrent agencies is feudalism. Oneself is the only thing one can ultimately rely upon, and liberalism tells us that only by having the freest information interchange between everyone can we achieve a good system.

    It's like vocabulary and culture: you always need people to write dictionaries, encyclopaedias, and reference textbooks; but these people won't ever provide new knowledge and techniques, they rather settle what everyone already knows, thus facilitating communication where people previously had to translate between several existing usages. You still need other people to create new things: you just can't wait for what you need to be included in the next revision of such a reference book; it won't ever be included if no one settles it clearly before it can be considered by a standardization committee.

    Now, these standard dictionaries have a technical problem: the more useful they strive to be, the larger they grow, but the larger they grow, the more difficult it gets to retrieve the right word from its meaning, which is what you want when you're writing. That's why we need some means to retrieve words from their subject, from their relationship with other words; thus we need a language to talk about the properties of words (perhaps the same language), about how words are created, about what words are or are not in the standard dictionary and will or will not be. And this language will have to evolve too, so a "meta"-library will not be enough.

    Furthermore, how is a dictionary to be used ? A dictionary does not deal with new words, only old ones. To express non-trivial things, one must do more than just pronounce a magic word; one must combine words. And this is a matter of grammar - the structure of the language - not vocabulary. We could have seen that immediately: standard libraries do not deal with writing new software, but with sharing old software (which is also useful, but different). So a library is great for reuse, but actually a good grammar is essential to any use, and to reuse in particular.


  11. or a Better Grammar
  12. Now what does reuse mean for the language grammar ? It means that you can define new words from existing ones, thus creating new contexts, in which you can talk more easily about your particular problems. That is, you must be able to add new words and provide new, extended, dictionaries. To allow the most powerful communication, the language should provide all meaningful means to create new words. To allow multiple people whose vocabularies evolve independently to communicate their ideas, it should allow easy abstraction and manipulation of the context, so that people with different context backgrounds can understand exactly each other's ideas.

    That is, the right thing is not statically having an extended vocabulary, but dynamically having an extensible vocabulary; for however statically extended, a vocabulary will never be large enough. Thus we need a good dynamic way to define new vocabulary. Again, it's a matter of dynamism versus statism. Current OSes suck because of their statism.


    First, we see that the same algorithm can apply to arbitrarily complex data structures; but a piece of code can only handle a finitely complex data structure; thus to write code with full genericity, we need to use code as parameters, that is, second order. In a low-level language (like "C"), this is done using function pointers.
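
    For instance, here is a minimal sketch in "C" of this second-order style; the find_max() function and its names are hypothetical, chosen just for illustration: the algorithm knows nothing about the elements it handles, and receives the code to compare them as a function pointer.

        #include <stddef.h>

        /* Generic algorithm: the element-handling code (a comparison) is
         * passed as a parameter, so the same routine works for any type. */
        typedef int (*cmp_fn)(const void *a, const void *b);

        /* Return a pointer to the greatest of n elements of size `size`,
         * according to `cmp` (which returns >0 when a is greater than b). */
        static const void *find_max(const void *base, size_t n, size_t size,
                                    cmp_fn cmp)
        {
            const char *best = base;
            for (size_t i = 1; i < n; i++) {
                const char *cur = (const char *)base + i * size;
                if (cmp(cur, best) > 0)
                    best = cur;
            }
            return best;
        }

        /* One possible instantiation: comparing ints. */
        static int cmp_int(const void *a, const void *b)
        {
            int x = *(const int *)a, y = *(const int *)b;
            return (x > y) - (x < y);
        }

    One would then call, say, find_max(array, 10, sizeof(int), cmp_int); that explicit extra parameter is precisely the burden discussed just below.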

    We soon see problems that arise from this method, and solutions for them. The first one is that whenever we use some structure, we have to explicitly pass functions together with it to tell the various generic algorithms how to handle it. Worse, a function that doesn't itself need some access method for the structure may have to call other algorithms which will turn out to need this access method; and which exact method it needs may not be known in advance (because which algorithm will eventually be called is not known, for instance, in an interactive program). That's why explicitly passing the methods as parameters is slow, ugly, and inefficient; moreover, it's code propagation (you propagate the list of methods associated with the structure -- if the list changes, all the using code changes). Thus, you mustn't pass those methods explicitly as parameters. You must pass them implicitly: when using a structure, the actual data and the methods to use it are embedded together. Such a structure including the data and the methods to use it is commonly called an object; the constant data part and the methods constitute the prototype of the object; objects are commonly grouped into classes made of objects with a common prototype and sharing common data. This is the fundamental technique of Object-Oriented programming. Well, some call that Abstract Data Types (ADTs) and say it's only part of the "OO" paradigm, while others don't see anything more in "OO". But that's only a question of dictionary convention. In this paper, I'll call it only ADT, while "OO" will also include more things. But know that words are not settled and that other authors may give the same names to different ideas and vice versa.
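
    As a minimal sketch of the idea, still in "C" and with hypothetical names: the object carries a pointer to its method table, so generic code receives a single parameter instead of the data plus an ever-growing list of access functions.

        struct point;                            /* forward declaration */

        /* The methods shared by a whole class of objects (the "prototype"). */
        struct point_class {
            void   (*move)(struct point *self, double dx, double dy);
            double (*norm)(const struct point *self);
        };

        /* The object: per-object data plus a link to its prototype. */
        struct point {
            const struct point_class *class;     /* embedded methods */
            double x, y;                         /* embedded data */
        };

        /* Generic code now needs only the object itself: its methods come
         * with it, so nothing has to be propagated through the callers. */
        static double norm_of(const struct point *p)
        {
            return p->class->norm(p);
        }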

    BTW, the same code-propagation argument explains why side-effects are an especially useful thing as opposed to strictly functional programs (see pure ML :); of course side-effects greatly complicate the semantics of programming, to the point that ill use of side-effects can make a program impossible to understand or debug -- that's what not to do, and such a possibility is the price to pay to prevent code propagation. Sharing mutable data (data subject to side effects) between different embeddings (different users), for instance, is something whose semantics still has to be clearly settled (see below about object sharing).


    The second problem with second order is that if we are to provide functions as parameters to other functions, we should have tools to produce such functions. Methods can be created dynamically as well as "mere" data, which is all the more frequent as a program needs user interaction. Thus, we need a way to have functions not only as parameters, but also as results of other functions. This is higher order, and a language which can achieve this has reflective semantics. Lisp and ML are such languages; FORTH also, although standard FORTH memory management isn't conceived for largely dynamic use of such a feature in a persistent environment. From "C" and such low-level languages that don't allow a direct portable implementation of the higher-order paradigm through the common function pointers (because low-level code generation is not available as in FORTH), the only way to achieve higher order is to build an interpreter of a higher-order language such as LISP or ML (usually much more restricted languages are actually interpreted, because programmers don't have time to elaborate their own user customization language, whereas users don't want to learn a new complicated language for each different application, and there is currently no standard user-friendly small-scale higher-order language that everyone can adopt -- there are just plenty of them, either very imperfect or too heavy to include in every single application).
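
    To illustrate the limitation (a sketch with hypothetical names): in "C" one cannot build a genuinely new function at run time; the usual workaround is to return a record pairing an already-compiled function with its environment -- a hand-made closure, which every caller must then know how to invoke.

        /* A function "returning a function" in C: all we can really return
         * is data plus a pointer to code that was fixed at compile time. */
        struct adder {
            int n;                                   /* captured environment */
            int (*call)(const struct adder *self, int x);
        };

        static int adder_call(const struct adder *self, int x)
        {
            return x + self->n;
        }

        /* make_adder(3) yields an object behaving like "the function x+3",
         * but it is not a plain function, and no new code is created. */
        static struct adder make_adder(int n)
        {
            struct adder a = { n, adder_call };
            return a;
        }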

    With respect to typing, Higher-Order means the target universe of the language is reflective -- it can talk about itself.

    With respect to Objective terminology, Higher-Order consists in having classes as objects, in turn groupable into meta-classes. And we then see that it _does_ prevent code duplication, even in cases where the code concerns just one user, as the user may want to consider concurrently two -- or more -- different instantiations of the same class (i.e. two sub-users may need to have distinct but mostly similar object classes). Higher-Order somehow allows there to be more than one computing environment: each function has its own independent environment, which can in turn contain functions.


    To end with genericity, here is some material to feed your thoughts about the need for system-builtin genericity: let's consider multiplexing. For instance, Unix (or worse, DOS) user/shell-level programs are ADTs, but with only one exported operation, the "C" main() function, per executable file. As such "OS"es are huge-grained, with ultra-heavy communication semantics between executable files (and even between processes of the same executable file), no one can afford one executable per actual exported operation. Thus you'll group operations into single executables whose main() function will multiplex those functionalities.

    Also, communication channels are heavy to open, use, and maintain, so you must explicitly pass all kinds of different data & code through single channels by manually multiplexing them (the same goes for keeping many heavy files versus one manually multiplexed huge file).

    But the system cannot provide builtin multiplexing code for every single program that will need it. It does provide code for multiplexing the hardware: memory, disks, serial, parallel and network lines, screen, sound. POSIX requirements grow with the things a compliant system ought to multiplex; new multiplexing programs keep appearing. So the system grows, yet it will never be enough for user demands as long as every possible kind of multiplexing hasn't been programmed, and meanwhile applications will spend most of their time manually multiplexing and demultiplexing objects not yet supported by the system.

    Thus, any software development on common OSes is hugeware. Huge in the hardware resources needed (memory -- RAM or HD -- CPU power, time, etc.), huge in the resources spent, and, most important of all, huge in programming time.

    The problem is that current OSes provide no genericity of services. Thus they can never do the job for you. That's why we really NEED generic system multiplexing, and more generally genericity as part of the system. If one generic multiplexer object were built, with two generic specializations for serial channels and flat arrays, and some options for real-time behaviour and recovery strategy on failure, that would be enough for all the multiplexing work currently done everywhere.
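
    As a very rough sketch of what such a generic multiplexer object might look like (all names here are hypothetical, not an existing interface), still in "C": one interface, specialized once per kind of underlying channel, instead of every application rewriting its own multiplexing code.

        #include <stddef.h>

        struct mux;                              /* opaque multiplexer state */

        struct mux_ops {
            /* attach a sub-channel and return its identifier */
            int  (*attach)(struct mux *m, void *endpoint);
            /* send a tagged message down the shared underlying channel */
            int  (*send)(struct mux *m, int channel, const void *buf, size_t len);
            /* receive the next message, reporting which sub-channel owns it */
            int  (*recv)(struct mux *m, int *channel, void *buf, size_t maxlen);
            /* recovery strategy when the underlying channel fails */
            void (*on_failure)(struct mux *m, int error);
        };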


    So much for Full Genericity: Abstract Data Types and Higher Order. Now, while this allows code reuse without code replication -- what we wanted -- it also raises new communication problems: if you reuse objects, especially objects designed far away in space or time (i.e. designed by other people, or by another, former, self), you must ensure that the reuse is consistent, that an object can rely upon a used object's behaviour. This is most dramatic if the used object (e.g. part of a library) happens to change, and a bug (that you may have been aware of -- a quirk -- and already have modified your program to accommodate) is removed or added. How do we ensure the consistency of object combinations ?

    Current common "OO" languages are not doing much consistency checks. At most, they include some more or less powerful kind of type checking (the most powerful ones being those of well-typed functional languages like CAML or SML), but you should know that even powerful, such type checking is not yet secure. For example you may well expect a more precise behavior from a comparison function on an ordered class 'a than just being 'a->'a->{LT,EQ,GT} i.e. telling that when you compare two elements the result can be "lesser than", "equal", or "greater than": you may want the comparison function to be compatible with the fact of the class to be actually ordered, that is x<y & y<z => x<z and such. Of course, a typechecking scheme, which is more than useful in any case, is a deterministic decision system, and as such cannot completely check arbitrary logical properties as expressed above (see your nearest lectures in Logic or Computation Theory). That's why to add such enhanced security, you must add non-deterministic behaviour to your consistency checker or ask for human help. That's the price for 100% secure object combining (but not 100% secure programming, as human error is still possible in misexpressing the requirements for using an object, and the non-deterministic behovior can require human-forced admission of unproved consistency checks by the computer).

    This kind of consistency security, through logical formal properties of code, is called a formal specification method. The future of secure programming lies there (try enquiring in the industry about the cost of testing or debugging software that can endanger the company or even human lives if ill-written, and about the insurance funds spent to cover eventual failures -- you'll understand). Life-critical industries already use such modular formal specification techniques.

    In any case, we see that even when such methods are not applied automatically by the computer system, the programmer has to apply them manually, by including the specification in comments or by understanding the code, so he is doing the computer's work.

    Now that you've settled the skeleton of your language's requirements, you can think about the peripheral problems that follow from them.

    .....

  13. Higher Order Grammar
  14. The problem of having generic programs instead of just specific ones is exactly the main point we saw about having a good grammar to introduce new generic objects, instead of just an ever-increasing number of terminal, first-order objects that actually do specific things (i.e. merely extending the vocabulary).


    What is really useful is a higher-order grammar, one that allows us to manipulate any kind of abstraction doing any kind of thing at any level. We call level 0 the lowest kind of computer abstraction (e.g. bits, bytes, system words, or, to idealize, natural integers). Level one is abstractions of these objects (i.e. functions manipulating them). More generally, level n+1 is made of abstractions of level n objects. We see that every level is a useful abstraction, as it allows us to manipulate objects that could not be manipulated otherwise.
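
    In "C" terms, a sketch of the first few levels (the names are just stand-ins for illustration):

        /* level 0: plain data */
        int answer = 42;

        /* level 1: an abstraction over level-0 objects */
        int twice(int x) { return 2 * x; }

        /* level 2: an abstraction over level-1 objects */
        int apply_to_both(int (*f)(int), int x, int y) { return f(x) + f(y); }

        /* Each fixed level can be written out, but no "C" construct works
         * uniformly at every level at once -- which is the point below. */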

    But why stop there ? Every time we have a set of levels, we can define a new level by having objects that arbitrarily manipulate any lower object (that's ordinals); so we have objects that manipulate arbitrary objects of finite level, etc. There is an unbounded infinity of abstraction levels. To have the full power of abstraction, we must allow the use of any such level; but why not also allow manipulating such full-powered systems ? Any logical limit you put on the system may be reached one day, and on that day the system would become completely obsolete; that's why any system meant to last must potentially contain (not in a subsystem) any single feature that may one day be needed.

    The solution is not to offer some bounded level of abstraction, but unlimited abstracting mechanisms; instead of offering only terminal operators (BASIC), or first-level operators (C), or even operators of any finite order, offer combinators of arbitrary order.

    Offer a grammar with an embedding of itself as an object. Of course, a simple logical theorem says that there is no consistent internal way of saying that the manipulated object is indeed the system itself, and the system state will always be much more complicated than what the system can understand about itself; but the system implementation may be such that the manipulated object indeed is the system. This is having a deep model of the system inside itself, and it is quite useful and powerful. This is what I call a higher-order grammar -- a grammar defining a language able to talk about something it believes to be itself. And only this way can full genericity be achieved: allowing absolutely anything that can be done about the system, from inside, or from outside (after abstracting the system itself).


    ..... A technique should be used when and only when it fits best; any other use may be expedient, but not quite useful.

    Moreover, it is very hard to anticipate one's future needs; whatever you do, there will always be new cases that you won't have anticipated.

    Lastly, it doesn't replace combinators. And finally, as for the combinators allowed, letting local server objects be saved by the client is hard to implement efficiently without the server becoming useless, or without creating a security hole;

    ..... At best, your centralized code will provide not only the primitives you need, but also the necessary combinators; but then, your centralized code is a computing environment by itself, so why need the original computing environment ? There is obviously a problem somewhere; if one of the two computing environments were good, the other wouldn't be needed !!! All these are problems with servers as much as with libraries.


  15. Security
  16. Now that we have seen how code reuse is possible, we are confronted with the main problem that arises from code reuse: insecurity. Indeed, when you reuse some code that you did not write yourself (or that you wrote a long time ago), how can you be sure that this code works exactly as expected ? After all, perhaps there's some bug in the code, or worse, perhaps the code works perfectly but does not behave as it should to be a solution to the problem for which you use it. This problem is not specific to software written with reuse techniques; but what was a minor source of bugs without reuse becomes the one enemy of programmers now that reuse is possible, and that ever more complex problems can thus be (and are being) treated. So, the major problem now is being sure that some piece of code does fit the requirements for the program to work correctly.

    The first idea that arises is then "programming by contract": every time some piece of code is called, it will first check all the assumptions made on the parameters, and when it returns, the caller will check that the result does fulfill all the requirements. This may seem simple, but implementing such a technique is quite tricky: it assumes that checking the parameters and results is easy to do, and that you trust the checking code anyway; it also means that all the information necessary for proper checking must be computed, which is a loss of space, and that all kinds of checks will take place, which is a loss of time. The method is thus very costly, and what does it bring ? Well, the program will just detect failure and abort ! Sometimes aborting is ok, when you have time (and money) to call some maintenance service, but sometimes it is not: a plane, a train, a boat, or a spacecraft whose software fails will crash, collide, sink, explode, be lost, or whatever, and won't be able to wait for repairs before it's too late. And even when lives or billions of dollars are not involved, any failure can be very costly, at least for the victim, who may be unable to work.

    That's why security is something important that any operating system should offer support for. Why integrate such support into the OS itself, and not in "higher layers" ? For the very same reasons that reuse had to be integrated into the OS: because otherwise, you would have to use not the system, but a system built on top of it, with all the related problems, and you would have to rely on two (or more, in case of multiple intermediate layers) implementations, each of which introduces insecurity (perhaps even bugs), unadapted semantics, and a big loss in performance.
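
    As a minimal sketch of hand-written "programming by contract" in "C" (the function and its contract are hypothetical, for illustration only): the assumptions are checked on entry, the promises on exit, and any violation simply aborts -- exactly the costly behaviour discussed above.

        #include <assert.h>
        #include <stddef.h>

        /* Contract: `a` must be sorted in increasing order; the result is an
         * index i such that a[i] == key, or n if key was not found. */
        static size_t find_sorted(const int *a, size_t n, int key)
        {
            /* precondition: the caller really did pass a sorted array
             * (note the extra work: a full pass just to check) */
            for (size_t i = 1; i < n; i++)
                assert(a[i - 1] <= a[i]);

            size_t result = n;
            for (size_t i = 0; i < n; i++) {
                if (a[i] == key) {
                    result = i;
                    break;
                }
            }

            /* postcondition: if we claim to have found key, it really is
             * at the returned index */
            assert(result == n || a[result] == key);
            return result;
        }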


  17. Program proof
  18. .....



  • Next: III. No Computer is an Island, Entire in Itself

  • Previous: I. Operating Systems and Utility
  • Up: Table of Contents


    To Do on this page

  • Finish writing this part.
  • Divide into subfiles ?
  • wait for feedback from the members.
  • Put references to the Glossary...


    Faré -- rideau@clipper.ens.fr