NeoAccess 5.0 Revealed
Volume Number: 14 (1998)
Issue Number: 4
Column Tag: Tools Of The Trade
by Michael Marchetti
Taking a look at this high-performance object database management system
Introduction
As a developer of custom software, I have the opportunity to work on a variety of projects, from the design phase all the way through the maintenance cycle. In 1989, ITA designed and implemented a program which stored and retrieved a number of different kinds of data. Its data storage mechanism consisted of MacApp's dynamic arrays, stored in a simple custom file format. Over the last eight years our requirements changed in a number of ways. The sheer volume of data increased greatly, resulting in increased memory usage, longer search times, and longer program startup times. New features required us to implement new access mechanisms, adding complexity to our original design. Finally, we were asked to port the program to Windows. Our Mac-centric data storage used Pascal strings and assumed big-endian integer and floating-point representations. We were looking at a major overhaul. After evaluating a number of storage mechanisms, we chose NeoAccess.
Why NeoAccess?
At first, the idea of using a full-featured object database management system seemed like overkill. As we evaluated NeoAccess, however, its advantages became apparent.
NeoAccess does not load all of a database's data at once. Instead, objects are brought into memory on demand and cached in a bounded amount of memory. This design, together with the rest of the feature set, provides developers with a number of benefits:
- Memory requirements: NeoAccess can store large amounts of data using a bounded amount of memory as an object cache. Objects are reference-counted, allowing NeoAccess to effectively manage the cache with minimal developer involvement.
- Startup time: Documents open faster because not all the data is loaded at once.
- Flexibility: NeoAccess allows the application developer to easily change the database format and still support reading, converting, and even creating databases using older formats.
- Access mechanisms: NeoAccess supports relational queries as well as referential (object network) access. Database developers can choose the most appropriate method for each situation.
- Cross-platform compatibility: NeoAccess is available for Macintosh, Windows, and Unix, with a uniform feature set across all supported platforms. Database files are written in a canonical form that can be read on any platform.
- Object-oriented design: NeoAccess is written entirely in object-oriented C++. Databases store and retrieve true C++ objects.
- Framework support: NeoAccess application, document, stream, and persistent object classes are integrated into the frameworks available for each platform. NeoAccess also includes a "standalone mode" for use with custom frameworks or without any framework.
- Multi-threading: NeoAccess can be safely used in a multi-threaded environment and takes advantage of asynchronous I/O operations to improve throughput.
- Capacity: NeoAccess can store up to 4 billion objects per database. The Mac OS limits database files to 4GB; NeoAccess can be configured to use 64-bit file offsets on platforms that support larger files.
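The on-demand loading and bounded cache behind the first two benefits can be pictured with a small sketch. This is not NeoAccess code: ObjectCache, fetch, release, and the loader callback are hypothetical names, and the eviction policy (drop any unreferenced object when the cache is full) is an assumption chosen for brevity.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical cached object: payload plus a reference count.
struct CachedObject {
    std::string data;
    int refCount;
};

// Hypothetical bounded cache: objects are loaded on first use, and only
// unreferenced objects may be evicted when the cache is full.
class ObjectCache {
public:
    ObjectCache(std::size_t capacity, std::function<std::string(long)> loader)
        : fCapacity(capacity), fLoader(std::move(loader)), fLoads(0) {}

    // Return the object with the given ID, loading it if it is not cached.
    CachedObject* fetch(long id) {
        auto it = fObjects.find(id);
        if (it == fObjects.end()) {
            purge();  // make room before loading
            it = fObjects.emplace(id, CachedObject{fLoader(id), 0}).first;
            ++fLoads;
        }
        ++it->second.refCount;
        return &it->second;
    }

    // Drop one reference; the object stays cached until space is needed.
    void release(long id) {
        auto it = fObjects.find(id);
        if (it != fObjects.end() && it->second.refCount > 0)
            --it->second.refCount;
    }

    std::size_t size() const { return fObjects.size(); }
    int loads() const { return fLoads; }

private:
    // Evict unreferenced objects until there is room for one more.
    void purge() {
        for (auto it = fObjects.begin();
             it != fObjects.end() && fObjects.size() >= fCapacity;) {
            if (it->second.refCount == 0) it = fObjects.erase(it);
            else ++it;
        }
    }

    std::size_t fCapacity;
    std::function<std::string(long)> fLoader;
    std::map<long, CachedObject> fObjects;
    int fLoads;
};
```

With a capacity of two, fetching a third object evicts whichever cached object has no outstanding references, while an object that is still referenced stays in memory and is not reloaded.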
First Impressions
The NeoAccess 5.0 package I reviewed is for Macintosh only (the multi-platform version has identical core code but includes source and project files specific to the other platforms). The package consists of one CD and two manuals. The installation procedure involves simply copying the contents of the CD to a hard drive. The CD includes:
- Full source code to the database engine.
- The complete manual in PDF form.
- Several add-on features (called "Extras"), with source code and PDF documentation.
- Two demo programs with source code and CodeWarrior projects.
The two demo programs come pre-built in a number of configurations, with SYM files. This means you can use the debugger to learn how a NeoAccess program and the NeoAccess engine work. The demos come in different flavors for different frameworks. Version 5.0 supports standalone (no framework) and PowerPlant. This is a change from version 4.0, which included support for MacApp 3.3 and TCL.
The demo programs are NeoBench and Laughs. NeoBench is a benchmarking program to test the speed of the database engine. Laughs illustrates most of the core constructs in NeoAccess, including inheritance, part managers, blobs, strings, and indices. The documentation includes a tutorial which describes the classes used in the Laughs application and how they interact.
Technical Introduction
The main storage container in NeoAccess is the database, modeled by the class CNeoDatabase. A database is a file containing a set of objects. These objects are partitioned into classes which correspond to the subclasses of CNeoPersist defined in the program. (Application-specific classes inherit their persistence properties from CNeoPersist.) The program supplies NeoAccess with information about the inheritance relationships between classes. This enables the program to limit a search to objects of one specific class or allow it to range over a class and all subclasses.
Each class has a defined set of attributes (corresponding to its persistent data members) which are present in each object of that class in the database. Access to these attributes is through the virtual functions setValue() and getValue(), which the NeoAccess manual correctly describes as "the mother of all accessor functions." They provide access to every persistent attribute of an object, and perform type coercion as needed (if, for example, the requested attribute is stored as a Pascal string and requested as a C string).
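To make the idea of a single tag-driven accessor pair concrete, here is a minimal standalone sketch. It does not use the NeoAccess headers: PersonSketch, the AttrTag and AttrType enums, and the coercion rules (text to number and back) are all assumptions for illustration. The real CPerson accessors appear in Listings 8 and 9.

```cpp
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical attribute tags and type codes for this sketch only.
enum AttrTag { kAttrName, kAttrSalary };
enum AttrType { kAsString, kAsDouble };

class PersonSketch {
public:
    PersonSketch() : fSalary(0.0) {}

    // Store a value under a tag, coercing from the caller's type if needed.
    bool setValue(AttrTag tag, AttrType type, const void* value) {
        switch (tag) {
        case kAttrName:
            if (type != kAsString) return false;  // no number-to-name coercion
            fName = *static_cast<const std::string*>(value);
            return true;
        case kAttrSalary:
            if (type == kAsDouble) {
                fSalary = *static_cast<const double*>(value);
            } else {
                // Coerce text such as "1500.5" to a number.
                const std::string& text = *static_cast<const std::string*>(value);
                fSalary = std::strtod(text.c_str(), nullptr);
            }
            return true;
        }
        return false;  // unknown tag
    }

    // Fetch a value by tag, coercing to the type the caller asked for.
    bool getValue(AttrTag tag, AttrType type, void* value) const {
        switch (tag) {
        case kAttrName:
            if (type != kAsString) return false;
            *static_cast<std::string*>(value) = fName;
            return true;
        case kAttrSalary:
            if (type == kAsDouble) {
                *static_cast<double*>(value) = fSalary;
            } else {
                char buffer[64];
                std::snprintf(buffer, sizeof buffer, "%.2f", fSalary);
                *static_cast<std::string*>(value) = buffer;
            }
            return true;
        }
        return false;  // unknown tag
    }

private:
    std::string fName;
    double fSalary;
};
```

The caller names the attribute and the representation it wants; the object decides how to convert. That is what lets generic code such as an index read any attribute without knowing the class layout.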
Indices
Each class has a set of indices associated with it. An index is a list of every object of that class, sorted in order of a particular attribute. Indices make it possible to find objects efficiently and to iterate through a set of objects in order.
Every class has at least one index, known as the primary index. The default primary index sorts objects by object ID (a unique 32-bit integer assigned when the object is added to the database). All other indices are known as secondary indices. These are mappings from attribute values to object IDs. When searching a secondary index, NeoAccess uses the primary index to actually locate objects once their IDs are known.
When the program attempts to locate an object with a particular attribute value, the NeoAccess query optimizer looks for an index on that attribute. If an index is found, a binary search algorithm is used to locate the object. If the index does not exist, then a linear search algorithm must be used. When the program requests an iterator in order of a particular attribute, NeoAccess looks for an index on that attribute, then iterates through the index.
Secondary indices can be added and removed at runtime. This is useful if access patterns tend to change over time or are based on user preferences. Indices that are not frequently used can be removed to save space, while new indices can be created to index those attributes that are most frequently searched.
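The lookup strategy described above (binary search through a secondary index, then the primary index to fetch the object, with a linear scan as the fallback) can be sketched in a few lines of standalone C++. PeopleTable and its members are hypothetical; the containers stand in for NeoAccess's own index structures, and unlike a real engine this sketch does not keep the secondary index current on insert.

```cpp
#include <algorithm>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct Record {
    long id;
    std::string name;
};

class PeopleTable {
public:
    PeopleTable() : fHasNameIndex(false) {}

    // Primary index: object ID -> object.
    void add(const Record& r) { fPrimary[r.id] = r; }

    // Secondary index: (name, ID) pairs sorted by name. In this sketch it
    // must be rebuilt by hand after further add() calls.
    void buildNameIndex() {
        fNameIndex.clear();
        for (const auto& entry : fPrimary)
            fNameIndex.push_back(std::make_pair(entry.second.name, entry.first));
        std::sort(fNameIndex.begin(), fNameIndex.end());
        fHasNameIndex = true;
    }

    void dropNameIndex() {
        fNameIndex.clear();
        fHasNameIndex = false;
    }

    // Returns the matching record or nullptr; *usedIndex reports the path taken.
    const Record* findByName(const std::string& name, bool* usedIndex) const {
        *usedIndex = fHasNameIndex;
        if (fHasNameIndex) {
            // Binary search the secondary index for the name...
            auto it = std::lower_bound(
                fNameIndex.begin(), fNameIndex.end(), name,
                [](const std::pair<std::string, long>& e, const std::string& n) {
                    return e.first < n;
                });
            if (it == fNameIndex.end() || it->first != name) return nullptr;
            // ...then use the primary index to locate the object by ID.
            auto hit = fPrimary.find(it->second);
            return hit == fPrimary.end() ? nullptr : &hit->second;
        }
        // No index on this attribute: fall back to a linear scan.
        for (const auto& entry : fPrimary)
            if (entry.second.name == name) return &entry.second;
        return nullptr;
    }

private:
    std::map<long, Record> fPrimary;
    std::vector<std::pair<std::string, long> > fNameIndex;
    bool fHasNameIndex;
};
```

Both paths return the same answer; the index only changes how much work it takes to get there, which is why an index can be added or dropped at runtime without touching the calling code.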
Part Managers
NeoAccess also provides a construct known as a part manager. Part managers can be thought of as persistent lists, implementing a one-way, one-to-many relationship. It is also useful to think of part managers as secondary indices that index only a subset of the objects in a class (namely those objects that have been explicitly added to the list).
For example, in a file system database, a directory object might contain a part manager which lists the directory contents in order of name. In fact, it might have several part managers sorting the contents by name, file size, and modification date. Keeping all of those part managers synchronized in the presence of system crashes and other anomalies could be problematic. One solution to the problem is to build the part managers dynamically when the directory is accessed. This can be easily done using queries.
Queries
A query is a persistent object which contains multiple part managers sorted in different orders. Objects are placed in the part managers by executing a query. In our example, we could assign each item a parent directory ID, and make a query to select those items with the desired parent ID. Since queries are persistent, both the selection criterion and the list of objects can be saved in the database. It is also simple to execute a query, use the results, and then discard the resulting lists.
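The file system example can be sketched without any database at all: select the items whose parent ID matches, then produce one sorted list per ordering. FileItem, DirectoryListing, and listDirectory are hypothetical names, and the vectors of pointers merely stand in for part managers; this shows the idea of building the lists on demand rather than how CNeoQuery does it.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct FileItem {
    long id;
    long parentID;
    std::string name;
    long size;
};

// One sorted list per ordering, playing the role of the part managers.
struct DirectoryListing {
    std::vector<const FileItem*> byName;
    std::vector<const FileItem*> bySize;
};

// "Execute the query": select items by parent ID, then sort each list.
DirectoryListing listDirectory(const std::vector<FileItem>& items, long parentID) {
    DirectoryListing listing;
    for (const FileItem& item : items)
        if (item.parentID == parentID) listing.byName.push_back(&item);
    listing.bySize = listing.byName;
    std::sort(listing.byName.begin(), listing.byName.end(),
              [](const FileItem* a, const FileItem* b) { return a->name < b->name; });
    std::sort(listing.bySize.begin(), listing.bySize.end(),
              [](const FileItem* a, const FileItem* b) { return a->size < b->size; });
    return listing;
}
```

Because the lists are derived from the parent ID stored in each item, there is nothing to keep synchronized: a stale listing can simply be discarded and rebuilt.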
Selection
NeoAccess defines a specialized mechanism for expressing selection criteria. There are a number of "selection key" classes which inherit from a base class of CNeoSelect. Most of them fall into the category of type-specific keys. CNeoLongSelect is used for long integer attributes, CNeoStringSelect for string attributes, and so on. These keys can be configured to search for an exact match or use another criterion such as "less than x" or "greater than x". There are also "complex keys," which combine multiple criteria into a single key. These include Boolean AND and OR as well as value ranges. Selection keys are used in all of the calls which retrieve data:
- findObject locates an object in the database matching the key.
- getIterator returns an iterator object, which the caller can use to walk through a set of objects in the database or in a part manager.
- CNeoQuery objects construct one or more sorted lists of objects matching the key.
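The way type-specific keys and complex keys fit together can be shown with a small class hierarchy. The sketch below is standalone C++, not the CNeoSelect family: SelectKey, AgeSelect, NameSelect, AndSelect, and OrSelect are hypothetical, and a real key would also tell the engine which index to use.

```cpp
#include <string>

struct Employee {
    std::string name;
    long age;
};

enum MatchOrder { kMatchEqual, kMatchLess, kMatchGreater };

// Base class for all selection keys in this sketch.
class SelectKey {
public:
    virtual ~SelectKey() {}
    virtual bool matches(const Employee& e) const = 0;
};

// Type-specific key: compares the age attribute against a value.
class AgeSelect : public SelectKey {
public:
    AgeSelect(long value, MatchOrder order) : fValue(value), fOrder(order) {}
    bool matches(const Employee& e) const override {
        switch (fOrder) {
        case kMatchLess:    return e.age < fValue;
        case kMatchGreater: return e.age > fValue;
        default:            return e.age == fValue;
        }
    }
private:
    long fValue;
    MatchOrder fOrder;
};

// Type-specific key: exact match on the name attribute.
class NameSelect : public SelectKey {
public:
    explicit NameSelect(const std::string& value) : fValue(value) {}
    bool matches(const Employee& e) const override { return e.name == fValue; }
private:
    std::string fValue;
};

// Complex keys: combine two other keys. The operands must outlive the key.
class AndSelect : public SelectKey {
public:
    AndSelect(const SelectKey& a, const SelectKey& b) : fA(a), fB(b) {}
    bool matches(const Employee& e) const override {
        return fA.matches(e) && fB.matches(e);
    }
private:
    const SelectKey& fA;
    const SelectKey& fB;
};

class OrSelect : public SelectKey {
public:
    OrSelect(const SelectKey& a, const SelectKey& b) : fA(a), fB(b) {}
    bool matches(const Employee& e) const override {
        return fA.matches(e) || fB.matches(e);
    }
private:
    const SelectKey& fA;
    const SelectKey& fB;
};
```

A value range is then just an AND of a "greater than" key and a "less than" key, and any retrieval call that accepts the base class accepts the combination.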
Blobs
Some entities, such as images, movies, and long text strings, cannot be easily represented as structured objects. NeoAccess provides a type of object, known as a blob, that simply stores a chunk of data. Blobs can be indexed like any other attribute by using the CNeoBlobIndex class included with NeoAccess.
Dynamic Objects
Another optional component of NeoAccess is the DynaObject facility. When it is enabled through a compile-time flag, applications can add and remove attributes from persistent object classes at runtime and even create new classes on the fly. NeoAccess automatically maintains a prototype object for each persistent class and creates new instances from the prototype.
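The prototype idea is easy to picture in isolation. In the sketch below, DynamicClass and its string-valued attributes are hypothetical simplifications (the article does not describe the DynaObject interface), but the pattern is the one described: edit the prototype at runtime and copy it to create each new instance.

```cpp
#include <map>
#include <string>

// Hypothetical runtime-defined class: a prototype holding default values.
class DynamicClass {
public:
    typedef std::map<std::string, std::string> Instance;

    // Add (or redefine) an attribute with its default value.
    void addAttribute(const std::string& name, const std::string& defaultValue) {
        fPrototype[name] = defaultValue;
    }

    void removeAttribute(const std::string& name) { fPrototype.erase(name); }

    // New instances are copies of the prototype as it stands right now.
    // Instances created earlier are not migrated in this sketch.
    Instance newInstance() const { return fPrototype; }

private:
    Instance fPrototype;
};
```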
What's New
NeoLogic is constantly working to improve NeoAccess. Version 5.1 was just released when this article was written. It includes several new features and performance improvements.
Iterators can now be configured to keep track of the total number of entries and current position in the collection. This makes it much easier to implement scrolling lists efficiently.
The distributed object facility allows location-independent access to objects. This makes it possible to treat multiple databases as if they were one. For example, it is now possible to create an index in one database (the "host") to index objects stored in other databases (the "targets"). NeoAccess automatically opens and closes the target databases as needed. Part managers, queries, and swizzlers also implement object references that can refer to objects in other databases.
A general-purpose test harness allows developers to test their database code by subclassing a generic test class. Tests can be grouped into suites and scheduled to run sequentially or in random order, once or multiple times. The test harness has a command-line parser that passes parameters to the tests, allowing different options to be selected at runtime.
Performance
I ran the NeoBench program on a PowerMac 7200/120 with VM off and a 256K system disk cache. In a 1500K partition, NeoBench had about 1200K free and used half (600K) as an object cache. I ran two sets of tests: one with 2,000 objects and one with 10,000 objects. This yielded the following performance results (in operations per second):
Operation | 2,000 objects | 10,000 objects |
Insert | 1,030 | 1,140 |
Locate Randomly | 14,600 | 11,000 |
Locate Serially | 148,000 | 146,000 |
Change | 1,140 | 1,220 |
Delete | 7,170 | 8,800 |
Notes
- The insert, change, and delete phases include committing changes to the database file.
- Enabling debugging code decreases engine performance by about one-third for most operations.
- Version 5.1, just released, has performance optimizations beyond those used in this test.
As you can see, NeoAccess achieves very fast access times. Like many benchmarks, NeoBench does not necessarily reflect the performance of real applications. It does a good job of testing the speed of the primary index, iteration, and object I/O. However, the persistent object class in NeoBench does not have any secondary indices. Each additional index slows down the insert, change, and delete operations. In addition, locating objects in a secondary index is slower than using the primary index.
Documentation
NeoAccess does have a rather steep learning curve; new developers are presented with around 60,000 lines of code totaling 2MB. The documentation copes with this volume of information in three ways. First, the manual contains an introductory section explaining the concepts we have seen here. Second, most of the manual consists of the class reference. Third, the manual includes a tutorial section that explains the internal workings of the Laughs demo program, one section at a time. Since Laughs uses a large subset of NeoAccess, this section is useful as a quick reference with code examples.
Users of previous versions will notice significant improvements in the documentation. In particular, the introductory section now includes instructions and code snippets showing how to use the basic features of NeoAccess. The reference section includes more background information on a number of classes.
As more complex constructs are introduced (part managers, blobs, dynamic objects, etc.) you will have to override more NeoAccess functions in your classes. Although the documentation does a pretty good job of explaining what to override, it remains a tedious and error-prone process. NeoLogic is planning to simplify this procedure in a future release.
To use some of the engine's capabilities, developers must modify the NeoTypes.h include file. It contains a number of compile-time flags controlling a huge amount of functionality, as well as common type declarations. It would be nice if this file were split into a file containing the type declarations and a "developer-modifiable" file containing the compile flags, which we could keep under source control for different projects.
Debugging
Debugging NeoAccess programs can be complicated. There are a lot of assertions built into the database engine to check all sorts of things: parameter values, usage, and database consistency. However, even with those checks, it is still easy to write code that fails without producing an error message. One particular problem is when objects don't sort correctly. Usually this means that an index is missing, or indexed attributes were modified without updating the index. It would be helpful for these cases to at least produce warning messages.
Since NeoAccess is based on B-trees, debugging can involve a lot of navigating through complex data structures in the debugger. I would like to see the debugging guide in the manual expanded to include a basic overview of how to examine classes, indices, and part managers.
Purchasing
NeoLogic offers several different NeoAccess packages to meet different needs. All licenses are priced on a per-developer basis. There are no runtime licensing fees or royalties as long as the target application does not have a programmatic interface to persistent data (that is, an API or plugins) and is not a development tool.
The current prices are:
Developers | Platforms | Transferable | Price |
1 | Single | No | $749 |
1 | All | No | $1,499 |
1 | All | Yes | $2,999 |
25 | All | Yes | $12,500 |
Upgrades
Upgrade prices are based on the price of the original toolkit. Minor upgrades (5.1) cost 1/6 of the original price ($125 to $500). Major upgrades (6.0) cost 1/3 of the original price ($250 to $1000). NeoLogic also offers a subscription plan at an annual cost of 2/3 of the toolkit price. Subscriptions include all upgrades for a year, plus one hour of technical support.
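The quoted ranges follow directly from the cheapest and most expensive single-developer toolkits in the price table ($749 and $2,999). A quick check, using hypothetical helper names and assuming prices are rounded to the nearest dollar:

```cpp
#include <cmath>

// Minor upgrade: one-sixth of the original toolkit price, rounded.
long minorUpgradePrice(double toolkitPrice) {
    return std::lround(toolkitPrice / 6.0);
}

// Major upgrade: one-third of the original toolkit price, rounded.
long majorUpgradePrice(double toolkitPrice) {
    return std::lround(toolkitPrice / 3.0);
}
```

For the $749 toolkit this gives $125 and $250; for the $2,999 toolkit, $500 and $1,000.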
There is one significant flaw in the NeoAccess update policy. As developers report problems with the database engine, NeoLogic generates bug reports and bug fixes. Periodically, these are incorporated into an official release (currently 5.0.5) and the patches are posted to the NeoLogic web site. Developers can download and apply these patches, but there is no way to obtain a "clean" bug-fix release (except for members of the NeoAccess Partners Program). Developers who bought 5.0 have already applied around 58 patches. It would be much simpler to have all the patches for a particular release encapsulated into update programs that we could download and run, or to provide entire files or functions instead of patching instructions. This may not seem like a big deal, but it's not pleasant to go through 58 of these, hoping to apply them all correctly.
---------------------------------------
347 Crash Resolved 5.0 - 5.0.5 All All All All
TNeoSwizzler.cpp 5.0.6
Applications using TNeoIDSwizzler objects crash when assigning a nil
pointer to the swizzler.
Correcting this problem involves adding an additional check in the
TNeoIDSwizzler::operator=(pPersist *aObject) assignment operator. The (aObject
!= fObject || aObject->fID != fID) conditional at the top of the function
needs to be changed to (aObject != fObject || (aObject &&
aObject->fID != fID) || (!aObject && fID)).
---------------------------------------
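To see why the extra checks in this patch matter, the corrected conditional can be exercised in isolation. PersistStub and needsUpdate are hypothetical stand-ins; only the boolean expression is taken from the patch text. Because || short-circuits, the original test dereferences aObject whenever the first comparison is false, which includes the case where both pointers are nil.

```cpp
// Hypothetical stand-in for a persistent object; only the ID matters here.
struct PersistStub {
    long fID;
};

// Corrected test from the patch: true when the swizzler must change state.
// aObject is the pointer being assigned; fObject and fID are what the
// swizzler currently holds.
bool needsUpdate(const PersistStub* aObject, const PersistStub* fObject, long fID) {
    return aObject != fObject
        || (aObject && aObject->fID != fID)  // only dereference a non-nil pointer
        || (!aObject && fID);                // nil assigned while an ID is still held
}
```

Assigning nil to an empty swizzler is now a harmless no-op, and assigning nil to a swizzler that still holds an ID is recognized as a change.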
Support
Technical support is provided by email, free for 30 days and $120/hour thereafter. Questions about NeoAccess (asking how to accomplish something, or why a particular code sequence fails) generally receive prompt and useful answers. Some of the more difficult debugging problems (corrupted databases, for example) can't be solved without careful examination of the code. These types of questions are likely to be met with general debugging advice. The free tech support specifically excludes application-specific support; it seems reasonable to expect clients to pay if they want NeoLogic to debug their programs.
There is an email discussion group devoted to NeoAccess, where it is possible to get help from other developers who have had similar experiences. NeoLogic personnel often post responses to technical questions, and we have generally found this to be an excellent source of technical help (though this is no substitute for NeoLogic tech support when you really need it). Instructions for subscribing to the list are included on the NeoAccess CD.
NeoLogic also offers quarterly and annual support options. These all include free upgrades; NeoAccess Partners can download bug-fix releases and beta releases.
Plan | Months | Upgrades | Support | Price |
Subscription | 12 | Yes | 1 hr. | $500 |
Quarterly Support | 3 | Yes | 20 hrs. | $1,500 |
Annual Support | 12 | Yes | 80 hrs. | $5,000 |
NeoAccess Partner | 12 | Yes | 80 hrs. | $7,500 |
Code Snippets: Application Level
Here are examples of how to use some of the basic functionality of NeoAccess from the application level. The low-level code needed to support this is presented in the next section.
Some things to note about this code:
- gNeoDatabase is the NeoAccess global variable for the current database. If you are using a framework, the NeoAccess document classes ensure that gNeoDatabase always points to the front document's database.
- The template class TNeoTracker is similar to the C++ auto_ptr class. When it goes out of scope (either by returning from the function or throwing an exception), its destructor will delete the iterator. Without TNeoTracker, we would have to have a try/catch block in each of the functions that uses an iterator.
- The template class TNeoSwizzler is analogous to TNeoTracker, but is intended for use with persistent objects. Since persistent objects are reference-counted, application code must never use the delete operator on them. Swizzlers add and remove references as needed to maintain reference counts, even in the presence of exceptions. (Note that new and findObject return an object with a reference already added, so we explicitly call unrefer in those cases.)
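For readers who have not met this idiom, here is a minimal standalone sketch of a swizzler-like smart pointer. RefCounted and Swizzler are hypothetical classes written for this illustration, not the NeoAccess templates; the only behavior borrowed from the notes above is that a new object starts with one reference and that the holder adds and removes references automatically.

```cpp
// Hypothetical reference-counted base class. Objects delete themselves
// when the last reference is removed, so callers never use delete.
class RefCounted {
public:
    RefCounted() : fRefs(1) { ++gLive; }  // new returns one reference
    void refer() { ++fRefs; }
    void unrefer() { if (--fRefs == 0) delete this; }
    static int gLive;                     // live-object count, for the demo
protected:
    virtual ~RefCounted() { --gLive; }
private:
    int fRefs;
};
int RefCounted::gLive = 0;

// Hypothetical swizzler: holds one reference to whatever it points at.
template <class T>
class Swizzler {
public:
    Swizzler() : fObject(nullptr) {}
    ~Swizzler() { assign(nullptr); }      // runs on return or on exception
    Swizzler& operator=(T* object) { assign(object); return *this; }
    T* operator->() const { return fObject; }
    operator T*() const { return fObject; }
private:
    Swizzler(const Swizzler&) = delete;   // non-copyable in this sketch
    Swizzler& operator=(const Swizzler&) = delete;
    void assign(T* object) {
        if (object) object->refer();      // add first: self-assignment is safe
        if (fObject) fObject->unrefer();
        fObject = object;
    }
    T* fObject;
};
```

The tracker side of the pair needs no sketch in modern C++: std::unique_ptr plays the role that auto_ptr played when this was written.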
Listing 1. CreateDatabase
Creating a Database
void CreateDatabase(const NeoFileSpec& fileSpec)
{
    // Make the database object
    CNeoDatabaseAlone* aDatabase =
        NeoNew CNeoDatabaseAlone(kFileCreator, kFileType);
    gNeoDatabase = aDatabase;
    aDatabase->SpecifyFSSpec(&fileSpec);
    // Allow the NeoAccess memory manager to write out changes to free up memory
    aDatabase->setPurgeAction(kNeoCommitBeforePurge);
    // Create the database file on disk and open it for writing
    aDatabase->create();
    aDatabase->open(NeoReadWritePerm);
}
Listing 2. AddPerson
Adding to the database
void AddPerson(const char* name, NeoDouble salary)
{
    // Create the object
    CPerson* person = NeoNew CPerson;
    // Fill in attributes of the person. For setValue, specify the tag
    // for the attribute we want to set, the type of the value we are
    // supplying, and a pointer to the actual value.
    person->setValue(pPersonName, kNeoStringType, name);
    person->setValue(pSalary, kNeoDoubleType, &salary);
    gNeoDatabase->addObject(person);
    // NeoNew returned the object with a reference already added; release it.
    person->unrefer();
}
Listing 3. FindPersonByName
Finding an object
CPerson* FindPersonByName(const char* name)
{
    // Make a key to select the people named 'name'.
    // pPersonName is an access tag indicating the name attribute.
    // name is the value we want that attribute to have.
    CNeoStringSelect key(pPersonName, name);
    // Find the object in the database. If more than one match, only
    // one will be returned.
    CPerson* person = (CPerson*)
        gNeoDatabase->findObject(kPersonID, &key);
    // findObject returns the object with a reference already added;
    // the caller must call unrefer() when finished with it.
    return person;
}
Listing 4. Using an Iterator
The swizzler will add a reference to whatever object is assigned to it. When a new object is assigned, the reference will be removed from the previous object.
// Declared here because PrintPeople uses it before its definition.
void PrintPerson(CPerson* person);

void PrintPeople(CNeoIterator* iter)
{
    TNeoSwizzler<CPerson> person;
    for (person = (CPerson*) iter->currentObject();
         person != nil;
         person = (CPerson*) iter->nextObject()) {
        PrintPerson(person);
    }
}

void PrintPerson(CPerson* person)
{
    if (person == nil) {
        printf("Person is nil.\n");
    }
    else {
        // Request name and salary
        char name[kMaxNameLen];
        NeoDouble salary;
        person->getValue(pPersonName, kNeoStringType, name);
        person->getValue(pSalary, kNeoDoubleType, &salary);
        printf("%s:\t$%f\n", name, salary);
    }
}
Listing 5. Iterating, unordered
Iterating with a nil key will give us every object in the primary index (therefore they will be in order of object ID, not alphabetical by name).
void PrintPeopleUnordered()
{
    TNeoTracker<CNeoIterator> iter;
    iter = gNeoDatabase->getIterator(kPersonID, nil);
    PrintPeople(iter);
}
Listing 6. Iterating, ordered
We make a selection key with the tag pPersonName. This tells NeoAccess to use the index sorted by name. This allows us to iterate in alphabetical order.
void PrintPeopleOrdered()
{
    CNeoStringSelect key(pPersonName, "");
    // Make the key match everything so we see all of the names
    // (we use the key only to indicate which index to traverse).
    key.setMatchAll(true);
    TNeoTracker<CNeoIterator> iter;
    iter = gNeoDatabase->getIterator(kPersonID, &key);
    PrintPeople(iter);
}
Listing 7. Iterating over a subset
We make a key to select only those people with salaries of at least minSalary. If there is a salary index, NeoAccess will use it to quickly locate the set of objects we requested, and the results will be sorted by salary. Otherwise it will search all people (in the primary index) to find any that match our criteria, and the results will be sorted by object ID.
void PrintMoneyMakers(NeoDouble minSalary)
{
    CNeoDoubleSelect key(pSalary, minSalary);
    key.setOrder(kNeoHighOrEqual);
    TNeoTracker<CNeoIterator> iter;
    iter = gNeoDatabase->getIterator(kPersonID, &key);
    PrintPeople(iter);
}
Implementing a Persistent Class
Now that we've seen how to use NeoAccess from an application, we can look at the lower-level code that implements the CPerson class. This is a minimal class; using the advanced features of NeoAccess requires overriding more functions. The documentation describes when you must override additional functions and how to call the inherited function. There are also plenty of examples of how to override CNeoPersist functions built into NeoAccess itself.
Listing 8. Declaration of CPerson
setValue and getValue are almost identical (as are readObject and writeObject), so the implementation in Listing 9 shows only one of each pair.
// Declare a unique ID for this object class.
const NeoID kPersonID = 20;
// Declare access tags used by setValue and getValue
enum {
    pPersonName = 'Name',
    pSalary     = 'Slry'
};
class CPerson : public CNeoPersistNative {
public:
    // Perform any initialization specific to this class
    static void InitPersonClass();
    // Instance methods
    virtual NeoID getClassID() const { return kPersonID; }
    static CNeoPersist *New();
    // I/O Methods
    virtual long getFileLength(const CNeoFormat *aFormat) const;
    virtual void readObject(CNeoStream* aStream,
                            const NeoTag aTag);
    virtual void writeObject(CNeoStream* aStream,
                             const NeoTag aTag);
    // Accessor methods
    virtual Boolean getValue(const NeoTag aTag,
                             const NeoTag aType,
                             void *aValue) const;
    virtual Boolean setValue(const NeoTag aTag,
                             const NeoTag aType,
                             const void *aValue);
private:
    // Declare the actual storage for the attributes
    CNeoString fName;
    NeoDouble fSalary;
    // Pointer to the metaclass object representing this class
    static CNeoMetaClass* NeoNear FMeta;
};
Listing 9. Implementation of CPerson
CNeoPersist* CPerson::New()
{
    // Static function to create person objects. This is registered as
    // part of the metaclass and used internally by NeoAccess.
    return NeoNew CPerson;
}
long CPerson::getFileLength(const CNeoFormat *aFormat) const
{
    // getFileLength returns the amount of space occupied by this
    // object in the database.
    long len = NeoInherited::getFileLength(aFormat);
    len += sizeof(fName) + sizeof(fSalary);
    return len;
}
void CPerson::readObject(CNeoStream *aStream,
                         const NeoTag aTag)
{
    // readObject is responsible for reading the object from the given stream.
    // By examining the stream's format object, we can support older file versions.
    NeoInherited::readObject(aStream, aTag);
    aStream->readNativeString(fName, sizeof(fName));
    fSalary = aStream->readDouble();
}
Boolean CPerson::getValue(const NeoTag aTag,
                          const NeoTag aType,
                          void *aValue) const
{
    // aTag indicates what attribute is being requested; aType indicates the
    // data type of aValue. We must convert the data to the requested type.
    Boolean result = TRUE;
    switch (aTag) {
    case pPersonName:
        if (aType == kNeoNativeStringType)
            *(CNeoString*) aValue = fName;
        else
            ConvertType(&fName, kNeoNativeStringType,
                        aValue, aType);
        break;
    case pSalary:
        if (aType == kNeoDoubleType)
            *(NeoDouble*) aValue = fSalary;
        else
            ConvertType(&fSalary, kNeoDoubleType, aValue, aType);
        break;
    default:
        // Not one of our attributes; let the base class handle it.
        result = NeoInherited::getValue(aTag, aType, aValue);
    }
    // Return true if getValue was successful
    return result;
}
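Since writeObject is omitted from the listing, it is worth spelling out the contract between the two I/O functions: fields must be written in the same order they are read, in a byte order that does not depend on the host. The sketch below illustrates that contract with a hypothetical ByteStream and PersonRecord rather than CNeoStream; the big-endian layout and the length-prefixed string are assumptions, and the salary is held as integer cents to sidestep floating-point byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory stream that stores integers most significant
// byte first, regardless of the host's native byte order.
class ByteStream {
public:
    ByteStream() : fPos(0) {}

    void writeU32(std::uint32_t v) {
        for (int shift = 24; shift >= 0; shift -= 8)
            fBytes.push_back(static_cast<unsigned char>((v >> shift) & 0xFF));
    }
    std::uint32_t readU32() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i)
            v = (v << 8) | fBytes.at(fPos++);  // at() throws on a short stream
        return v;
    }
    void writeString(const std::string& s) {
        writeU32(static_cast<std::uint32_t>(s.size()));  // length prefix
        fBytes.insert(fBytes.end(), s.begin(), s.end());
    }
    std::string readString() {
        std::uint32_t n = readU32();
        if (n > fBytes.size() - fPos) throw std::out_of_range("short stream");
        std::string s(fBytes.begin() + fPos, fBytes.begin() + fPos + n);
        fPos += n;
        return s;
    }
    const std::vector<unsigned char>& bytes() const { return fBytes; }

private:
    std::vector<unsigned char> fBytes;
    std::size_t fPos;
};

// Hypothetical record: the write and read functions mirror each other.
struct PersonRecord {
    std::string name;
    std::uint32_t salaryCents;

    void writeObject(ByteStream& s) const {
        s.writeString(name);
        s.writeU32(salaryCents);
    }
    void readObject(ByteStream& s) {
        name = s.readString();
        salaryCents = s.readU32();
    }
};
```

If the two functions ever disagree about order or size, every field after the mismatch is read from the wrong offset, which is one reason this pair is usually written side by side.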
Listing 10. CPerson Metaclass Object
The metaclass object registers this class with NeoAccess.
CNeoMetaClass* NeoNear CPerson::FMeta =
    NeoNew CNeoMetaClass(
        kPersonID,      // This class ID
        kNeoPersistID,  // Base class ID
        "CPerson",      // Class name
        // Allocator function UPP
        NeoNewGetOnePersist(CPerson::New));
void CPerson::InitPersonClass()
{
    // This function should be called at program startup to perform
    // additional initialization as needed. We use this opportunity to
    // add the name and salary indices to our metaclass object.
    // Since the indices are stock NeoAccess objects, this is all
    // we need to do to add indexing on any attribute!
    FMeta->addKey(kNeoNativeStringIndexID, pPersonName);
    FMeta->addKey(kNeoDoubleIndexID, pSalary);
}
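What a metaclass object buys the engine can be shown with a small registry sketch: given only a class ID read from a file, the engine can construct the right C++ object, and by following base class IDs it can decide whether a search over one class should include another. ClassRegistry, MetaInfo, and the two sample classes are hypothetical and are not the CNeoMetaClass interface; the sketch assumes base IDs form a chain with no cycles, ending at an unregistered root ID.

```cpp
#include <map>
#include <string>

struct PersistBase {
    virtual ~PersistBase() {}
    virtual long classID() const = 0;
};

typedef PersistBase* (*FactoryFn)();

struct MetaInfo {
    long baseID;        // ID of the base class (unregistered ID for the root)
    std::string name;
    FactoryFn create;   // allocator function for this class
};

class ClassRegistry {
public:
    void add(long id, long baseID, const std::string& name, FactoryFn create) {
        MetaInfo info = { baseID, name, create };
        fClasses[id] = info;
    }

    // Construct an object from its class ID, as when reading from a file.
    PersistBase* create(long id) const {
        auto it = fClasses.find(id);
        return it == fClasses.end() ? nullptr : it->second.create();
    }

    // True if id is ancestorID or inherits from it.
    bool isKindOf(long id, long ancestorID) const {
        for (;;) {
            if (id == ancestorID) return true;
            auto it = fClasses.find(id);
            if (it == fClasses.end()) return false;  // walked past the root
            id = it->second.baseID;
        }
    }

private:
    std::map<long, MetaInfo> fClasses;
};

// Sample classes for the sketch.
const long kSketchPersonID = 20;
const long kSketchManagerID = 21;

struct SketchPerson : PersistBase {
    long classID() const override { return kSketchPersonID; }
    static PersistBase* New() { return new SketchPerson; }
};

struct SketchManager : SketchPerson {
    long classID() const override { return kSketchManagerID; }
    static PersistBase* New() { return new SketchManager; }
};
```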
Conclusion
NeoAccess is a very powerful object database management system well-suited to a wide variety of data storage applications. After overcoming the steep learning curve, developers should find the product relatively easy to use. The documentation is helpful, but mastering the more complex features of NeoAccess requires experience with the database engine.
Pricing and upgrade policies are mostly reasonable. The lack of runtime licensing fees for most applications makes this an attractive option for commercial software products. NeoAccess is used in the latest versions of Netscape Communicator, America Online 3.0, and NetObjects Fusion.
For more information, check out the NeoAccess Technical Overview. The overview, plus source code and executables for the demo programs, are available on the NeoLogic web site.
Michael Marchetti, mmarchetti@itainc.com, develops Macintosh software at ITA, Inc., a provider of custom software solutions located in Rochester, New York.