December 95 - Getting Started With OpenDoc Storage
Getting Started With OpenDoc Storage
Vincent Lo
OpenDoc's structured storage model is an innovative departure from the
traditional storage scheme. As you make the move into OpenDoc development, you
need to understand the new storage model and its implications for the way data
is stored and retrieved. This article introduces the new concepts and policies
you'll need to know in order to use OpenDoc storage effectively.
In the traditional Macintosh user model, each application creates and maintains
its own documents, storing each document in a separate file. A file has one
creator signature and one file type, identifying the application it belongs to
and the kind of document it contains. In OpenDoc, by contrast, a document can
have multiple parts, created and maintained by different part editors (called
part handlers in earlier versions of OpenDoc), which are analogous to the
standalone applications of the traditional model. Because all of a document's
parts are stored together in the same container (usually corresponding to a
file), there has to be a way for separate part editors to share access to the
same container without interfering with each other.
OpenDoc meets this need by providing a structured model for persistent storage
(that is, for storing data from one session to the next). Each part is given
its own storage unit in which to store and retrieve data. The part can thus
operate as a standalone entity, independent of other parts and their storage.
OpenDoc maintains all of the storage units and notifies each part when to read
or write its data.
The same techniques that are used in dealing with persistent storage also apply
to the various forms of data interchange between part editors, such as the
Clipboard, drag and drop, and linking. Because all of these mechanisms use the
same data storage medium (the storage unit), they all work essentially the same
way from the part editor's point of view. For example, a part uses the same API
calls to copy data to the Clipboard that it would use in writing the data to a
file container. The same is true for drag and drop and for linking. Thus, once
you learn how to work with OpenDoc storage units for file storage, you can use
the same techniques to implement data interchange as well.
This article assumes that you're already familiar with basic OpenDoc concepts
and terminology. If you need a quick introduction or refresher, see the article
"The OpenDoc User Experience" in develop Issue 22. You can find additional
information on some of OpenDoc's technical basics in the articles "Building an
OpenDoc Part Handler" in Issue 19 and "Getting Started With OpenDoc Graphics"
in Issue 21. Developer releases of OpenDoc include the definitive
documentation, the OpenDoc Programmer's Guide and OpenDoc Class Reference.
Developer releases are available through a number of different sources, or you
can request the latest release at AppleLink OPENDOC or at
opendoc@applelink.apple.com on the Internet. The source code in this article is
excerpted from a sample part included with the developer release.
Because OpenDoc was developed jointly by a consortium of companies including
Apple, IBM, and Novell, its interfaces are designed for cross-platform
compatibility, using IBM's platform-independent Standard Object Model (SOM).
OpenDoc method definitions, including the ones in this article, are commonly
written in a language-neutral Interface Definition Language (IDL). The SOM
compiler converts these into equivalent language-specific declarations for
whatever source language you happen to be using. The method definitions shown
in this article, for instance, are taken from the OpenDoc interface file
StorageU.idl. To use these methods in your program, you must include the
corresponding language-specific binding file (such as StorageU.xh for a C++
program).
DRAFTS, DOCUMENTS, AND CONTAINERS
The OpenDoc classes responsible for providing storage capabilities are
ODContainer, ODDocument, ODDraft, and ODStorageUnit. Collectively, a set of
subclasses derived from these four is known as a
container suite. A
container
represents the physical storage medium in which a document is stored, such as a
disk file. Different container suites share the same API, but may use different
low-level storage mechanisms and operate on different physical storage media.
For example, the Bento container suite, which will be shipped with OpenDoc 1.0,
supports both file containers and in-memory containers. A part editor can thus
use the same code to store a part's data either to a file or in memory.
A single container may contain one or more documents, each of which in turn can
include one or more drafts. A part ordinarily works with a draft, rather than
directly with a document or its container. Each draft is a "snapshot"
representing the state of the document at a particular point in its
development. Together, the drafts embody the history of the document over time.
A part may need to interact with its draft for a variety of reasons:
- Persistent objects -- Every persistent object (such as a part, a frame, or
a link) is created by a draft.
- Data interchange -- A part asks its draft to copy transferred objects
to and from a data-interchange container, such as the Clipboard or a
drag-and-drop container.
- Linking -- A part uses its draft to create link specifications and copy
data to and from link objects.
- Permissions -- A part may need to find out whether it's allowed to
write to the draft.
- Scripting -- A part gets its scripting-specific identifier through its
draft.
STORAGE UNITS
The basic entity of a container suite is the storage unit. Every persistent
OpenDoc object has a storage unit in which to store and retrieve its data.
Figure 1 shows a typical example.
Figure 1. Structure of a storage unit
A storage unit consists of one or more properties, each of which in turn is
associated with one or more values containing the data itself. The storage unit
shown in Figure 1, for instance, has properties named kODPropContents,
kODPropPreferredKind, and kODPropDisplayFrames; the kODPropContents property
has values of types kTextEditorKind and kODMacIText.
Using multiple values allows a property to represent the same data in different
forms. For example, a property holding a drawing may have three values
representing the same data: one as a Macintosh PICT, one as a Windows metafile,
and one in TIFF format. Although OpenDoc cannot enforce the principle, part
developers are urged to use multiple values within a property only for multiple
representations of the same data, not for storing unrelated data items.
The property names and value types shown in Figure 1 represent string constants
of type ODPropertyName and ODValueType, respectively. For cross-platform
extensibility, both of these types are defined as equivalent to an ISO string
instead of a traditional Macintosh OSType: that is, they're 7-bit ASCII
null-terminated strings, as specified by the International Standards
Organization (ISO). The string values themselves are expected to follow a
standard naming convention: for instance, the constants kODPropDisplayFrames
and kODWeakStorageUnitRefs stand for the strings
"OpenDoc:Property:DisplayFrames" and "OpenDoc:Type:StorageUnitRefs",
respectively. The OpenDoc interface files StdProps.idl and StdTypes.idl define
name constants for standard properties and value types; any property and type
names that you define for yourself should follow the same naming conventions.
FOCUSING A STORAGE UNIT
The OpenDoc operations for manipulating values don't explicitly identify the
value to operate on. Instead, you have to focus the storage unit on the desired
property or value before invoking the operation. The method for setting the
focus is defined in class ODStorageUnit as follows:
ODStorageUnit Focus(in ODPropertyName propertyName,
in ODPositionCode propertyPosCode,
in ODValueType valueType,
in ODValueIndex valueIndex,
in ODPositionCode valuePosCode);
This
allows you to set the storage unit's focus in a variety of ways:
- to a property by name
- to a property by position relative to the current property
- to a value by type within a property
- to a value by position within a property
- to a value by position relative to the current value
Properties and
values are ordered within the storage unit according to the sequence in which
they were added. Values within a property are indexed from 1: that is, the
first value has index 1, the second index 2, and so on. Positions relative to
the current focus are specified with a position code. The same position code
can refer to either a property or a value, depending on the current focus. For
instance, if the storage unit is currently focused on a property, the position
code kODPosNextSib designates the next property; if the current focus is on a
value, kODPosNextSib designates the next value.
Another way to set the focus of a storage unit is with a storage unit cursor:
ODStorageUnit FocusWithCursor(in ODStorageUnitCursor cursor);
The
cursor identifies a property by name or a value by its property name and its
index or value type. Once created (with method CreateCursor or
CreateCursorWithFocus of class ODStorageUnit), the same cursor can be reused
multiple times to refer to properties or values within the storage unit.
Once you've focused a storage unit, you can create a storage unit view to refer
to the same property or value again later without having to reset the focus:
ODStorageUnitView CreateView();
The
view responds to all the same access methods as the storage unit itself, but
applies them to the property or value that had the focus at the time the view
was created, rather than at the time the method is invoked. It does this by
automatically resetting the underlying storage unit to the original focus, then
forwarding the method call to the storage unit for processing.
MANIPULATING VALUE DATA
The operations for manipulating data within a storage value are stream-based,
very much like reading or writing to a sequential file. Each value has a
current offset position that controls where the next operation will take place,
similar to the file mark in the Macintosh file system. In addition to reading
and writing data sequentially, you can also insert or delete data at the
current offset position.
Class ODStorageUnit defines the following methods for manipulating value data:
void SetOffset(in ODULong offset);
ODULong GetOffset();
void SetValue(in ODByteArray value);
ODULong GetValue(in ODULong length, out ODByteArray value);
void InsertValue(in ODByteArray value);
void DeleteValue(in ODULong length);
The
ODByteArray structure is used to pass data to or from a storage unit.
typdef struct {
unsigned long _maximum; /* size of buffer */
unsigned long _length; /* number of bytes of actual data */
octet* _buffer; /* pointer to buffer containing the */
/* data */
} _IDL_SEQUENCE_octet;
typedef _IDL_SEQUENCE_octet ODByteArray;
(An
octet is simply the SOM term for an 8-bit byte.) Listing 1 shows how to
manipulate one of the values shown in Figure 1.
Listing 1. Adding data to a value
/* Focus the storage unit, using property name and value type. */
storageUnit->Focus(ev, kODPropContents, kODPosUndefined,
kTextEditorKind, 0, kODPosUndefined);
/* Set up the byte array. */
ODByteArray ba;
ba._length = size;
ba._maximum = size;
ba._buffer = buffer;
/* Set the offset. (This step isn't really needed here, since the
Focus operation automatically sets the offset to 0. It's included
for illustrative purposes only.) */
storageUnit->SetOffset(ev, 0);
/* Add the value. */
storageUnit->SetValue(ev, &ba);
STORAGE UNIT REFERENCES
Storage unit references allow one storage unit to refer persistently to
another. A part can use this mechanism to access information stored in a
storage unit (which may or may not belong to it) across multiple sessions. A
draft thus consists essentially of a network of storage units connected to each
other with persistent references.
When a storage unit is cloned (copied to a data-interchange container), any
other storage units it references are cloned along with it. Since all storage
units in a draft are interconnected, cloning any one of them may cause the
whole draft to be cloned. Because this may be an expensive and unnecessary
operation, OpenDoc provides two levels of storage unit reference: strong and
weak. Only strongly referenced storage units are copied when the unit that
refers to them is cloned.
In Figure 2, frame A refers strongly to part A, which refers strongly to frame
B, which refers strongly to part B. Thus if frame A's storage unit is cloned,
all four storage units will be copied. On the other hand, cloning frame B's
storage unit will copy those for frame B and part B only, since frame B's
reference to frame A is weak rather than strong.
Figure 2. Strong and weak storage unit references
An object can use strong storage unit references to refer to other objects that
are essential to its functioning, such as embedded frames. Weak references are
mainly for informational or secondary purposes: a part might use them, for
instance, to refer to its display frames.
LIFE CYCLE OF A PART
Figure 3 shows the life cycle of a part and its associated storage unit.
Because the part's lifetime may span multiple editing sessions, it must be able
to
externalize its internal state (save it to persistent storage) in order to
reconstruct itself from one session to the next. The part's InitPart method,
called when the part is first created, receives a storage unit as a parameter.
The Externalize method can then use this storage unit to save the part's state.
Once externalized, the part can be released from memory and later reconstituted
from external storage by a method named InitPartFromStorage. Unlike InitPart,
InitPartFromStorage can be called multiple times during a part's lifetime,
whenever the part needs to be reconstructed from external storage.
Figure 3. Life cycle of a part
Notice that externalizing a part is not the same as cloning it. Externalizing
means writing the part's data to persistent storage, using a storage unit
associated with the draft in which the part resides; cloning is transferring
the part's data to a data-interchange container such as the Clipboard, using a
storage unit associated with the container. Although the two operations are
different, they're both based on the same ODStorageUnit API and can share much
of the same code.
Another related operation is purging, which reclaims memory space by
eliminating unnecessary runtime data structures such as caches. Because such
structures can usually be reconstructed from persistent data, many OpenDoc
programmers believe that a part's Purge method should always begin by
externalizing the part's data before deleting unused or unnecessary memory.
While this might sound plausible in principle, the externalization operation
itself requires additional memory -- the very thing that's in short supply
during purging. As a general rule, the Purge method should avoid invoking
externalization unless it's absolutely necessary.
All persistent objects carry a reference count, enabling OpenDoc to identify
unused objects and reclaim the memory they occupy. The Acquire method, which
creates a reference to a specified object, increments the object's reference
count; the Release method destroys a reference and decrements the reference
count. When the reference count goes down to 0, OpenDoc can safely delete the
object from memory.
INITIALIZATION
The initialization method InitPart is called only once, to set up a part's
initial state. It should take the following actions:
1. Call the parent class's InitPart method to perform any initialization
required at the parent level.
2. Save the incoming part wrapper object (discussed below) in an internal
field.
3. Set up an internal permissions field to indicate that writing to the
draft is allowed.
4. Set up the part's runtime data structures.
5. Set the part's internal dirty flag to true.
Listing 2 shows an
example. Notice that the SOM compiler, in translating the method declaration
from language-independent IDL into a specific source language, adds two
additional parameters at the beginning of the parameter list: a pointer to the
object executing the method (somSelf) and an environment pointer (ev) used for
error reporting. All of our example method definitions in this article begin
with these two parameters.
Listing 2. Initializing a part
SOM_Scope void
SOMLINK TextEditor__InitPart(SampleCode_TextEditor *somSelf,
Environment *ev,
ODStorageUnit *storageUnit,
ODPart *partWrapper)
{
SampleCode_TextEditorData *somThis =
SampleCode_TextEditorGetData(somSelf);
SOMMethodDebug("TextEditor", "InitPart");
SOM_TRY
// Call the parent class's InitPart method. The parent will in
// turn call its parent, and so on.
parent_InitPart(somSelf, ev, storageUnit, partWrapper);
// Store part wrapper object in an internal field.
_fSelf = partWrapper;
// Set a flag showing that this draft is not read-only.
_fReadOnlyStorage = kODFalse;
// Call common initialization code to set up our initial state.
somSelf->Initialize(ev);
// Set the dirty flag to true.
somSelf->SetDirty(ev);
SOM_CATCH_ALL
// No explicit code needed here: cleanup will be performed by
// the destructor, which is called automatically when an error
// is thrown.
SOM_ENDTRY
}
Parent initialization. It's important for a part's initialization method to
call that of its parent class. The parent's initialization method will in turn
call that of its parent and so on up the inheritance chain, ensuring that all
of the part's inherited properties are properly initialized. Inherited
properties set up by ODPart and its parents, such as ODPersistentObject,
include the following:
- kODPropCreateDate contains the part's creation date.
- kODPropModDate tells when the part's storage unit was last
externalized.
- kODPropModUser contains the name of the last user who modified the
part.
Part wrapper. Every part is wrapped by another object, called its part wrapper.
Clients of the part object deal with it indirectly, through the part wrapper,
instead of holding a direct pointer to the part object itself. The part wrapper
receives all method invocations and delegates them to the actual part. This
insulation of the part object allows the part editor to be changed at run time
without affecting its clients.
The InitPart method should save the part wrapper object in an internal field.
Then, whenever the part needs to pass an object representing itself as a
parameter, it should pass the part wrapper in place of itself.
Draft permissions. A part editor needs to know whether a part is in a read-only
draft. If so, its functionality may be restricted: for example, the part may
not allow the user to change its contents, either through keyboard input or
through menu operations such as Cut and Paste. Also, if the draft is read-only,
its Externalize method need never be called on its parts or any persistent
objects. When a part is created for the first time, its draft is guaranteed to
be writable, so it should initialize its read-only flag to false.
Dirty flag. The purpose of a dirty flag is to let the part's Externalize method
know whether it needs to write out the part's state to external storage.
Externalization (especially to disk) can be a time-consuming and expensive
operation; using a dirty flag can greatly improve performance by avoiding it
whenever possible.
When a part is first created, its storage unit is empty. Since the state has
not yet been written out, the part should initialize its dirty flag to true;
the flag should also be set to true whenever the contents of the part are
changed. After saving the state and content data to external storage, the
Externalize method should clear the flag to false, indicating that the state
need not be saved again unless the part's contents are changed.
EXTERNALIZATION
A part's Externalize method can be called at any time. Typically, it's called
by the draft when the user chooses to save the document. Since a part has no
idea when this may happen, it should always be ready to externalize itself.
The Externalize method should do the following:
1. Call the parent class's Externalize method.
2. Check that all required properties exist; if not, add them to the
storage unit.
3. Clean up the part's contents if necessary.
4. Write out the part's state information and contents.
5. Clear the part's internal dirty flag to false.
Listing 3 shows an
example.
Listing 3. Externalizing a part
SOM_Scope void
SOMLINK TextEditor__Externalize(SampleCode_TextEditor *somSelf,
Environment *ev)
{
SampleCode_TextEditorData *somThis =
SampleCode_TextEditorGetData(somSelf);
SOMMethodDebug("TextEditor", "Externalize");
SOM_CATCH return;
// Ask parent classes to externalize themselves.
parent_Externalize(somSelf, ev);
// Check dirty flag.
if (_fDirty) {
// Get storage unit.
ODStorageUnit *storageUnit = somSelf->GetStorageUnit(ev);
// Verify that the storage unit has the appropriate properties;
// if not, add them.
somSelf->CheckAndAddProperties(ev, storageUnit);
// Validate storage unit's contents and clean up if necessary.
somSelf->CleanseContentProperty(ev, storageUnit);
// Write out state information and contents.
somSelf->ExternalizeStateInfo(ev, storageUnit, 0, kODNULL);
somSelf->ExternalizeContent(ev, storageUnit, kODNULL);
// Clear dirty flag.
_fDirty = kODFalse;
}
}
The
contents of a part must be written out to a special content property named
kODPropContents. Like other properties, the content property can contain
multiple values representing the same data in different forms. A value type
used for content data is referred to as a part kind. To facilitate data
interchange, part editors are encouraged to include one or more standard part
kinds in their content property, much the way traditional Macintosh
applications use common data formats like 'TEXT' or 'PICT' when writing to the
Clipboard.
Each value in the content property should be a complete representation of the
content data. A value may contain references to other storage units, but cannot
depend on other values in the content property or on other properties in the
part's storage unit. Even if every other property and value were deleted from
the storage unit, the part editor should still be able to reconstruct the part
using just that one content value.
The ordering of values within the content property is completely determined by
the part editor. An important principle, however, is that values that represent
the underlying contents with greater fidelity should precede those of lesser
fidelity: formatted text, for instance, should precede plain (unformatted)
text. The first value should be the one that represents the content most
faithfully.
When a part editor reconstructs a part from an external storage unit, there's a
chance that the storage unit may have originally been written by some other
part editor. As a result, the content property may contain part kinds that the
current part editor doesn't support, or the values may appear in the wrong
fidelity order. In this case, the part's Externalize method should remove all
existing values from the content property so that it can write out its own
content data in proper fidelity order.*
A standard property named kODPropPreferredKind identifies the part kind that
the user chooses to represent the data. If this property already exists, the
part editor shouldn't tamper with it; if it doesn't exist, the part editor may
create it and give it a value of type kODISOStr containing the name of the
highest-fidelity part kind. When writing out the content data, the part editor
should be sure to include a value in the format specified by this property.
RECONSTRUCTION
The InitPartFromStorage method is called whenever a part object needs to be
reconstructed from external storage. This method should do the following:
1. Call the parent class's InitPartFromStorage method.
2. Save the incoming part wrapper object in an internal field.
3. Set up an internal permissions field to indicate whether writing to the
draft is allowed.
4. Set up the part's runtime data structures.
5. Read the content data from the storage unit into the runtime data
structures.
6. Clear the part's internal dirty flag to false.
Notice that these
are essentially the same steps we listed earlier for the InitPart method,
except that the contents of the part's runtime data structures are read in from
the storage unit instead of being initialized to standard values, and that the
dirty flag is cleared to false instead of true to show that the part's contents
agree with those in the external storage unit. Listing 4 shows an example of an
InitPartFromStorage method.
Listing 4. Reconstructing a part
SOM_Scope void
SOMLINK TextEditor__InitPartFromStorage
(SampleCode_TextEditor *somSelf,
Environment *ev,
ODStorageUnit *storageUnit,
ODPart *partWrapper)
{
SampleCode_TextEditorData *somThis =
SampleCode_TextEditorGetData(somSelf);
SOMMethodDebug("TextEditor", "InitPartFromStorage");
// Avoid initializing the part twice.
if (fSelf != kODNULL)
return;
SOM_TRY
// Call the parent class's InitPartFromStorage method. The
// parent will in turn call its parent, and so on.
parent_InitPartFromStorage(somSelf, ev, storageUnit,
partWrapper);
// Store part wrapper object in an internal field.
_fSelf = partWrapper;
// Set a flag showing whether this draft is read-only.
_fReadOnlyStorage = (storageUnit->GetDraft(ev)->
GetPermissions(ev) == kDPReadOnly);
// Call common initialization code to set up our initial state.
somSelf->Initialize(ev);
// Read in state data from external storage.
somSelf->InternalizeStateInfo(ev, storageUnit);
// Read in content data from external storage.
somSelf->InternalizeContent(ev, storageUnit);
SOM_CATCH_ALL
// No explicit code needed here: cleanup will be performed by
// the destructor, which is called automatically when an error
// is thrown.
SOM_ENDTRY
}
As we've already noted, the storage unit from which a part is reconstructed may
have been created by a part editor other than the one reading it in. The
OpenDoc binding subsystem uses the part kinds found in the storage unit's
content property to determine which part editor to invoke. If the original part
editor cannot be found, the binding subsystem will look for another editor
capable of reading the available part kinds. The contents of the storage unit
may thus be very different from what the current part editor expects. Here are
a few points to note:
- If the storage unit identifies a preferred part kind (that is, if it
contains the property kODPropPreferredKind), the part editor should read its
content data from the indicated value of the content property. If no preferred
kind is specified (or if the part editor cannot handle a value of the specified
kind), it should iterate through the available values looking for one it can
handle. When it finds such a value, it should read the content data from that
value into its runtime data structures.
- The InitPartFromStorage method should not add its own properties to the
part's storage unit, but should leave that task to the Externalize method
instead. This is because the user may close the document without modifying any
of its contents. If the InitPartFromStorage method modifies the storage unit,
the user will be prompted to save the document before closing it, even though
the document has not been modified.
- The part editor should not alter the part's preferred-kind property
(kODPropPreferredKind).
WHAT NEXT?
Needless to say, the only real way to get familiar with OpenDoc programming is
to jump in and develop a part editor of your own. The techniques discussed in
this article will help you manage your storage needs effectively. The rest is
up to you and your imagination.
RELATED READING
- "The OpenDoc User Experience" by Dave Curbow and Elizabeth
Dykstra-Erickson, develop Issue 22.
- "Getting Started With OpenDoc Graphics" by Kurt Piersol, develop Issue
21.
- "Building an OpenDoc Part Handler" by Kurt Piersol, develop Issue 19.
- OpenDoc Programmer's Guide and OpenDoc Class Reference, available as
part of the OpenDoc developer releases.
- The latest news on OpenDoc can be found on the World Wide Web at
http://www.info.apple.com/opendoc or http://www.cilabs.org.
VINCENT LO is Apple's technical lead for OpenDoc. When he isn't removing
"unwanted features" or participating in design meetings, he divides his time
equally among roller hockey, ice hockey, and explaining to his friends why he
plays so much hockey. He has also been known to apply his body checking
techniques in intense engineering discussions.
Thanks to our technical reviewers Dave Bice, Craig Carper, Ed Lai, and Steve
Smith.