November 93 - Prototype-based OOLs
Prototype-based OOLs
Mikel Evins
When we think of object-oriented languages we
usually think of languages that support data
abstraction by providing data templates called classes. A class acts as a description of a type of program data that we use and reuse by creating instances of it. When we want to define a new type of data we can reuse previous design work by creating subclasses, defining new representations in terms of what they add to old ones.
Classes also help organize the dynamic behavior of our programs, as most or all of the routines we write are associated with classes. Whether we use the Smalltalk model of objects that respond to messages or the Common Lisp Object System (CLOS) model of functions that specialize on classes and instances, the behavior of the program is determined by the interaction of a routine called in the context of a class (or perhaps a group of classes) that defines the behavior of the routine.
When we are accustomed to thinking of object-oriented languages in such terms it may seem peculiar to talk about objects without classes, but that's the subject of this article. It turns out that not all object-oriented programming languages use classes as a primary organizing principle; indeed, several object-oriented languages support no notion of classes at all. Specifically, we will discuss object-oriented programming languages that use concepts called prototypes and delegation to organize data and dynamic behavior; we will call such languages prototype-based languages. (There is at least one other way to organize object-oriented programs that involves neither classes nor prototypes; the programming language BETA uses a concept called a pattern).
Class-based languages, that is, object-oriented programming languages whose type systems are based on classes, are much more common and more familiar to most programmers than prototype-based languages. Nevertheless, there are several interesting languages based on prototypes rather than on classes, and at least one, NewtonScript, is important to working programmers interested in shipping commercial applications. Others, such as Self, are exerting influence on the language design community through their embodiment of innovative compilation techniques and novel ways of organizing data and behavior. At least one recently-designed class-based language, Dylan, shows the influence of prototype-based languages.
For the purposes of this article we will define a class-based language as one in which data types are defined by classes , which are special constructs distinguished from ordinary data objects by their role as abstract type descriptions, and in which the behavior of a function or method is determined by the class with which it is associated. Examples of class-based languages include C++, Smalltalk, CLOS, Objective-C, and Dylan.
A prototype-based language is a language which lacks distinguished classes, in which any runtime object can serve as a template for the creation of another object, and in which the behavior of a function or method is determined by the search path among objects used to find it. Examples of prototype-based languages include Self, ObjectLisp, NewtonScript, T, and Cecil.
Advantages of prototypes
Classes are abstract descriptions of data types. Class-based languages create new program objects by instantiation; that is, they create new memory objects whose structure and behavior is determined by the description embodied in the class. Classes may be objects themselves, that is they may exist at runtime and occupy memory, or they may simply be compile-time abstractions that describe for the compiler how to lay out memory when creating an object.
In a prototype-based language the role of classes is occupied by objects called prototypes. A prototype is an object that is used as a model for the creation of another object. The behavior of a function or method is determined by looking up the named routine in a specified object or in other objects called its delegates. Although this description may sound very much like the description of classes and inheritance there are important differences.
First, as already mentioned, a class may or may not exist at runtime. Classes are descriptions of abstract data types, and the creation of a new object of that type does not require that the class actually exist as a data object itself. A prototype, on the other hand, is definitely a data object that exists in the program's address space. A new object is created by copying the prototype in whole or in part.
Second, classes, when they are objects at all, are a special kind of object whose primary purpose is to serve as a type description. A prototype, on the other hand, is just an object like any other object. A prototype of a window object is itself a window object and it has all the data and behavioral features of any of the objects that it is used to create.
Third, unlike methods in class-based languages, those in prototype-based languages need not be associated with any particular set of objects. In a class-based language a method in some sense belongs to the set of objects that are defined by the class. In a prototype-based language a method simply resides in a context, usually in a slot or field of an object. The particular behavior that results from a message-send or a function-call depends upon the context in which the function or method is looked up.
Advocates of prototype-based languages claim several advantages for using prototypes instead of classes. For one thing the use of prototypes eliminates the potentially complicated meta-object problem of defining the class of a class. If a class is an object, then of what class is it an instance?
Class-based languages use several different approaches to solve this problem. In C++ classes are simply compile-time conventions, and so classes have no semantic function at all in a running program. The meta-object problem is solved by omitting it. In Smalltalk it is solved by providing a set of meta-classes: each object is the instance of a class and each class is an instance of a meta-class. The meta-classes are subclasses of the class Class. Each meta-class defines the behavior of its instances, which are classes, and the classes in turn define the behavior of their instances. You might be wondering what class is used to instantiate the class Class; Class is usually defined circularly, being (either literally or in effect) an instance of itself. The object model of CLOS is somewhat more complicated still, involving a suite of class-oriented functions and types called a Meta-object protocol. A reasonable description of the CLOS meta-object protocol is beyond the scope of this article (but if you are interested in learning more about it see The Art of the Metaobject Protocol by Gregor Kiczales, Jim des Rivieres, and Daniel G. Bobrow, MIT Press, 1991).
In a prototype-based language the meta-object problem does not exist because classes do not exist. Objects are created by copying other objects (this is not technically true in all prototype-based languages, but the principles are essentially the same even when the technical details are not). There is no need to concern yourself about the semantics of classes because there aren't any.
Another advantage claimed by advocates of prototype-based languages is superior flexibility. One might say that among object-oriented dynamic languages prototype-based languages are the most dynamic. In a program written in a class-based language an object is a member of a class; either it was created according to a compile-time template or by executing code associated with a runtime class object, but either way it is assumed to belong to a set of objects whose characteristics are defined by a class. Prototype-based programs instead consist of objects whose relationships are all defined by runtime pointers, and which may or may not be organized into sets comparable to those defined by classes. An object's prototype can be changed at runtime, as can its memory layout.
In all fairness, these differences are not so hard and fast as they may seem. For one thing, most prototype-based programs are, in practice, organized around sets of objects that might as well be classes. For another, some class-based languages, notably CLOS, provide facilities with which instances of classes can be modified much in the same way as objects in a prototype-based system. Nevertheless, there are differences of degree between class-based and prototype-based languages, enough that we can think of a continuum of flexibility: C++, which provides essentially no runtime support for modifying an object's layout or class relationships might form the static end of the spectrum, and a language like Self, in which each object's layout and prototype relationships can be modified at any time, might form the dynamic end.
Prototypes and delegation
The basic organizing principles of a prototype-based language are prototypes and delegation.. Prototypes form the basis of the data-description model in a prototype-based language and delegation is the mechanism by which code is dynamically selected for execution.
A prototype is any object that is used as the model for another, newly-created, object. A window system implemented in a prototype-based language, for example, might provide a prototypical window object for use as the model of all windows. New window objects could be created by cloning the prototype. The new windows would be created with copies of the prototype's instance variables, except for those variables that the creating function specifically overrides. A new window would most likely store a pointer to its prototype, enabling runtime code to dynamically determine a window's prototype and enabling the system to use optimizations such as copy-on-write discipline for the slot or field values of the new window.
Delegation is the process by which a function or method, referred to in the context of one object, is found in another object. When a method is referred to by code defined for an object but is not itself found to be defined in the object the language runtime searches other objects, called delegates. Depending on the object model of the individual language the delegates may defined when the object is created or may be specified by the method-call syntax. The analogous process in a class-based language is dispatching to an inherited method, and an object's delegates perform the role of superclasses in a class-based language. Of course a well-designed compiler for a class-based language may eliminate runtime dispatching in many cases, and, in a similar way, a compiler for a prototype-based language may be able to statically identify the code to be executed by a function or method call.
Dynamism
You might have noticed that prototypes are simply data values, references to which are presumably stored in the objects cloned from them (commonly called their 'children'), and the delegates are also just references to objects. You might assume that you could change these relationships with a simple assignment, and you would be right. It is the ease with which you can make such changes that makes advocates of prototype-based languages enthusiastic about their flexibility. Of course, if you make ill-planned changes or changes that your objects are not designed to support then the behavior of your program will be unpredictable; this is one of the tradeoffs of the runtime flexibility of such languages.
Another aspect of the dynamism of prototype-based languages is the ease with which you can change the structure of data objects at runtime. Prototype-based languages treat objects as collections of labeled values; we will adopt the practice common to Self, NewtonScript, and frame languages generally and refer to these labeled values as slots. Most prototype-based languages support the ability to add to, remove from, and change and object's slots at runtime. This ability presents the programmer with a degree of flexibility well beyond what is supported in most other types of languages, even other types of object-oriented dynamic languages. The ability to dynamically add and remove slots is indispensable in frame languages, which are widely used to represent human knowledge in knowledge-based systems. It's also important in the flexibility of the development model of NewtonScript, the application-development language for Apple Computer's new Newton line of Personal Digital Assistants.
Languages that use prototypes
Prototype-based languages are not really new; frame languages, for example, have existed in the AI community since the late 1970s. It is only fairly recently, however, that language designers have experimented extensively with general-purpose programming languages based on prototypes and delegation. One of the most important of such languages is Self, developed by David Ungar and Craig Chambers at Stanford University (see "SELF: The Power of Simplicity" in OOPSLA '87 Conference Proceedings, pp. 227-241, 1987). SELF's design is strongly influenced by Smalltalk and uses a message-passing model of execution. It entirely eliminates classes, however, replacing them with prototypes and delegation. Each object has two parents: a data parent which serves as the prototype from which the object is constructed by copying its structure, and the behavior parent, which serves as a delegate in which methods not locally defined are looked up.
An earlier programming language that embodied some early versions of the concepts used in prototype-based languages was T (see The T Programming Language by Stephen Slade, Prentice-Hall 1987). Developed at Yale university by Norman Adams, Kent Pitman, and Jonathan Rees, T was a dialect of Scheme that extended that language in several ways, not the least of which was a facility for procedurally defining objects that encapsulated local state and that responded to messages. These objects could be joined to produce new objects in a process similar to cloning, but combining the features of two or more objects in a new one.
There have been numerous frame languages, programming languages designed to represent human knowledge in objects called frames. Such languages are essentially prototype-based object-oriented languages, but designed particularly to facilitate describing human knowledge, and often relatively weak in supporting functional abstraction (they also, however, usually offer a facility that enables the programmer to execute code when a slot is accessed, a feature that has made its way into several more recently introduced general-purpose languages). They are often designed to be embedded as an extension to another programming language, most often some form of LISP. The idea for frame languages was originally proposed by Marvin Minksy in a 1975 paper entitled "A Framework for Representing Human Knowledge" (Psychology of Computer Vision, ed. Patrick H. Winston, MIT Press, 1975).
Cecil is a new prototype-based language designed by Craig Chambers, one of the designers of Self. Cecil somewhat resembles Self, though with a syntax less reminiscent of Smalltalk, and with new language features that add support for multimethods and optional static type checking. It is described in "The Cecil Language, Specification and Rationale" by Craig Chambers, University of Washington Department of Computer Science and Engineering Technical Report 93-03-05.
NewtonScript is a new prototype-based programming language that is of great interest to many programmers because it is the application development language for Apple's new Newton line of PDAs. NewtonScript's design owes a great deal to both LISP and Self, but the language has many idiosyncratic features, including a unique syntax that resembles C or Pascal more than either of its immediate ancestors. All published APIs for the Newton platform are presently provided in terms of NewtonScript, and so prospective Newton developers may be the first large community of commercial programmers whose only development language is a prototype-based OODL. NewtonScript is documented in The NewtonScript Programming Language, a publication of the Apple PIE Technical Publications Group, and available from Apple as part of the standard Newton development tools.
Mixed models
As is the case with other object-oriented models, prototype-based programming can be based either on a purist's approach, in which every aspect of the programming language and environment uses the prototype-based model, or on a mixed programming model in which more conventional techniques are supplemented with prototype-based features. NewtonScript makes a reasonable example of a prototype-based language that uses a mixed model, and so we'll consider it in more detail.
NewtonScript is an interpreted object-oriented dynamic language based upon prototypes and delegation, but with some static-language features, and even some features of a class-based language (features derived from C++). NewtonScript syntax was designed intentionally to be readable to experienced C and Pascal programmers. It is statement-oriented, uses the semicolon as a statement separator, represents assignment as the token ':=', and represents slot access with a dot notation like the C structure-access notation. NewtonScript programs thus have a familiar look to them; most experienced programmers will have no trouble understanding the following NewtonScript code fragments:Ê
for x := 1 to 10 by 2 do
print(x);
x := 4;
loop if x = 0 then break
else
begin
print(x);
x := x - 1
end
There are two fundamental data types in NewtonScript: immediate objects and pointer objects. Immediates can be integers, characters, or Booleans. Pointer objects may be symbols, strings, real numbers, arrays, or frames. Most of these data types are familiar: integers and reals are, of course, whole numbers and decimal numbers, respectively. Characters, Booleans, and strings are much the same as in C, but the language provides an escape notation for representing Unicode characters.
Symbols are names that can be used as data in a program, much like symbol objects in LISP or Smalltalk. Each symbol object is unique, and there is only one symbol in the system with a given name. Symbol names are case-insensitive, but NewtonScript remembers the capitalization that you use the first time to type a symbol's name.
Arrays are sequences of zero or more references to NewtonScript objects, indexed by integers starting with zero. Array objects can be either homogeneous or heterogeneous; that is, an array can store objects all of a single type or it can store references to a mixed collection of different types. NewtonScript provides syntax to distinguish the two cases.
The following code fragment stores in x an Array of strings:
x := [String: "foo", "bar", "baz"]
This fragment stores a heterogeneous collection of objects (a string, a symbol, and an integer):
x := ["foo", 'date.month, 12]
Frames are the objects that most distinguish NewtonScript as a programming language, and are the central organizing element of a NewtonScript program. NewtonScript frames are roughly similar to the objects in frame languages used for knowledge representation, but with somewhat more limited features. A frame is a collection of named values called slots, just as in a frame language. Each slot can contain a NewtonScript object or a function, which is a piece of executable NewtonScript code (functions are also called scripts).
The following code fragment stores in x a newly created frame with several arbitrary slots:
x := {name:"Jim", phone:"234-5678",
manager: someManagerVariable}
You can access the various slot values using NewtonScript's dot notation:
x.name
x.manager.phone
Slots can also store methods or scripts, and we can call them in a way much like the way we access data values:
x.printMyName()
NewtonScript mixes the prototype and delegate relationships, defining two different inheritance paths and looking up both slot values and method calls along both paths. One inheritance path is defined by protos. A proto is simply a prototype from which a given object was created. Calling the proto() script for an object (accessed using dot notation, as above) returns its prototype.
NewtonScript is closely bound to the Newton View system, and there is a notion of containment that encompasses most of the frames that a developer commonly uses. It is this concept of containment that provides the second inheritance path along which values and methods are looked up. Each view frame ( the most common kind of frame in most NewtonScript programs) has a parent, that is, another view frame that contains it. When you access a frame slot, whether to return a data value or to call a method, NewtonScript looks along inheritance paths leading both to the frame's parents and to its protos to find the referenced slot. NewtonScript first looks in the frame for the referenced slot, then in its proto, and then in its proto's proto, and so on. Once that avenue is exhausted, NewtonScript looks in the frame's parent, then the parent's proto, then that proto's proto, and so on. The chain of lookups next continues with the parent's parent, and so on until there are no more frames to examine.
You'll notice that the value returned (or method called) in this scheme depends upon the value stored in the first slot that NewtonScript finds with the correct name in this lookup scheme. If you insert a slot in the lookup order, or store a new value on one of the looked-up slots, or temporarily change one of the parent or proto slots to point to a different frame, then you can change the value or script that NewtonScript finds. This flexibility makes it easy to patch or customize software built using NewtonScript, and illustrates one of the most appealing features of prototype-based languages.
Roll your own prototypes
As always, the flexibility of a prototype-based language costs something in space and speed. Prototype-based languages require that mutable runtime relationships exist between objects and their prototypes. Although this fact does not necessarily mean that there must be one data object for each prototype reference, it does mean that simply eliminating object templates by making them compile-time abstractions is not an option. The most general form of delegation, in which an object can dynamically decide to which other object it will delegate method calls also requires runtime support, and such support costs time and space in the language runtime. Although the work on Self has shown that prototype-based languages can be fast and the work on NewtonScript has shown that they can be small, the question of how to make them both small and fast or how to arrange that programmers can trade off speed and space constructively on a per-project basis is still an open research problem.
Nevertheless, prototype-based systems offer some attractive advantages, most notably the high degree of runtime flexibility offered by a system in which structure, behavior, and data organization can all be modified at will. The ability to define standard, prototypical data objects but also to modify their structure at runtime to suit a user's needs was a primary motivating factor in the design and implementation of the NewtonScript prototype-based object system. As programmers and users grow accustomed to the flexibility of such a design they may well come to expect similar flexibility in desktop products as well as PDA-class machines. As you adapt your implementation plans to reflect these trends you may find that your designs are simplified by using some features of prototype-based languages.
It's hard to find prototype-based languages for production use right now; most of the existing systems are experimental or special-purpose languages. NewtonScript is an exception, but is available only on the Newton platform. In the absence of a widely available, general-purpose prototype-based programming language you may want to consider implementing an embedded language or a prototype-based library.
A reasonable approach to providing prototype-based features in a language that lacks them might be to implement an abstract class that supports the desired features. We assume here that you are working in an object-oriented language; you could implement these features in a non-object-oriented language using records, structures, tables, parallel arrays, or whatever mechanism is most reasonable in the language.
In order to support prototype relationships you need to provide a mapping between names and values. In dynamically-typed languages like LISP and Smalltalk your mapping can store tables of strings or symbols that map to arbitrary data values; each such mapping corresponds to a slot in a prototype-based language. In a statically-typed language like C++ or Object Pascal you'll need to either define what type of value a slot can hold or else implement a union or reference type for values if you want slots to be able to store heterogeneous data.
You'll have to decide on initialization policies for newly-created objects. For instance, when a new object is created from a prototype your code can clone the prototype, storing any initial values into the appropriate slots, and copying the values from the prototype for any slots not specified. Alternatively, if a slot's value isn't specified you can defer its allocation, looking it up in the prototype if your code refers to it, and allocating it in the new object if your code writes to it. When the new object is allocated you'll most likely want to store a reference to the prototype so that you can look up inherited slot values and support reflective operations. You'll also need to decide whether to allow users of your class to add and remove slots dynamically. If not then you can optimize instance variable lookups because you know exactly how a given object's children are laid out, but you also give up one of the major advantages of prototype-based languages.
In order to support delegation you'll need to decide how an object defers a method call to another object. You might opt to store a reference to a behavior parent, as in Self: if your code calls a method that is not defined on a given object then it passes the call to the object stored in its behavior parent slot. Alternatively, you could support a scheme in which your method-calling mechanism allows you to specify an alternative object in which to look up the method if it's not found in the original object.
Many frame-languages extend the dynamic functionality of prototype-based languages by representing slots themselves as objects that have slots. For example, a frame slot might have a slot on it that stores a reference to a setter daemon, that is, a method that gets called any time you store a value in the slot. Besides setters, slots might have getter daemons (called when the slot value is fetched),
constraint daemons (called to determine whether the slot value satisfies some constraint, such as a range or type requirement), and so on. Another use for slots on slots is to provide dynamically configurable inheritance. A slot might have slots on it that describe how to look up values or methods when they are not present locally.
As always, it's certainly more work to implement the features of a prototype-based language than to simply use one in which they are already implemented. On the other hand, production prototype-based languages are relatively hard to come by at present. Besides that, implementing some of their features (or just thinking about it) can be educational. It can give you an appreciation of just how flexible such a language can be, and exactly what those features can cost (not to mention how clever the compiler writers are who make languages like Self run fast).
Conclusion
NewtonScript notwithstanding, it may be some time before prototype-based languages are widely used in commercial software development. Nevertheless, the ideas that they embody are appealing, especially as we try to direct software evolution toward increasingly flexible, configurable, and upgradeable systems. The high degree of dynamic mutability of prototype-based systems makes them unusually well-suited to incremental changes and upgrades. Users of such languages can add new features directly to a running system so long as they don't incompatibly change previously defined features.
Although prototype-based languages are relatively rare outside of research laboratories, the literature mentioned above provides a good introduction to the concepts, and the strong initial sales of the Newton Tool Kit suggest that a considerable number of commercial developers will soon be getting their feet wet in prototype-based programming. If NewtonScript concepts prove popular among application developers then we may be well advised to learn more about prototype-based languages in general.