Res Compression
Volume Number: 9
Issue Number: 1
Column Tag: Resources
Related Info: Resource Manager
Resource Compression
What it is, how it works, and how to use it in your own software
By Justin Gray, Alysis Software Corporation
Note: Source code files accompanying this article are located on the MacTech CD-ROM or source code disks.
With the release of System 7, Apple quietly introduced a powerful means of applying transparent compression to applications and other resource files that has yet to be matched by third-party schemes. While System 7 features like AppleEvents and Balloon Help were highly touted and well documented, Apple's resource compression scheme remains a tantalizing mystery to most Macintosh developers. In this article, we'll explore exactly what resource compression is, how it works, and how to apply it to your own software.
What Resource Compression Is
To deliver on the promise of System 7's remarkable new features, Apple programmers had to write a significant volume of new code for both the Finder and the System file. During development, it became clear that the System and Finder would not fit on a single high-density diskette without some form of compression. The compression applications available at the time could not help. They created archives, not compressed files that could be used while still compressed. What was needed was a compression/decompression scheme that would let compressed files run while still compressed on disk, without degrading system performance. The solution was the resource compression system employed in the System file, the Finder, ResEdit, and TeachText.
Like any other encoding/decoding system, resource compression consists of two basic components: a tool for compressing data, and decompression code that decodes the compressed data. Most third-party compression software comes in application form. Resource compression, by contrast, exists as an enhancement to the Resource Manager, supported by decompression resources of type 'dcmp'. After compiling the final version of his or her software, a developer uses a resource-compressing application to shrink the individual resources in the application, then pastes in the appropriate 'dcmp' resource. When the end user launches the application, the Resource Manager recognizes the compressed resources and calls the appropriate decoders to expand them transparently.
The significant advantages to this approach are speed and compatibility. Since resources are decompressed into memory rather than to disk, their compressed contents can be accessed in a fraction of the time required by a system that decompresses to disk. Because the resources are compressed and decompressed individually, they can be accessed much more rapidly than if the entire file were expanded into memory. An application which takes up half the space on disk will require only half as much disk access to load into memory. If the decoding software can expand the compressed data more rapidly than the SCSI bus can transfer the same number of bytes, the application will actually launch faster in compressed form.
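The last claim can be checked with a rough timing model. The C sketch below rests on assumptions made only for this illustration. Disk reads and decoding run one after the other with no overlap, and seek time is ignored. All function names are hypothetical.

Let U and C be the uncompressed and compressed sizes, B the bus rate, and D the decoder's output rate. Under this model, the compressed copy loads faster only when D > B·U/(U − C).

```c
#include <assert.h>

/* Simplified launch-time model (an assumption, not a measurement):
   reading from disk and decoding happen one after the other with no
   overlap, and seek time is ignored. Sizes in bytes, rates in bytes/s. */

double load_time_uncompressed(double u_size, double bus_rate) {
    return u_size / bus_rate;
}

double load_time_compressed(double u_size, double c_size,
                            double bus_rate, double decode_rate) {
    /* Read the smaller compressed image, then expand it in memory. */
    return c_size / bus_rate + u_size / decode_rate;
}

/* Decoder output rate above which the compressed copy loads faster:
   c/B + u/D < u/B holds exactly when D > B * u / (u - c). */
double break_even_decode_rate(double u_size, double c_size, double bus_rate) {
    assert(c_size < u_size); /* only meaningful if compression saved space */
    return bus_rate * u_size / (u_size - c_size);
}
```

For a 2:1 ratio, this no-overlap model demands a decoder that produces output at twice the bus rate. If decoding fully overlaps the disk transfer, the load time becomes roughly the larger of C/B and U/D. The condition then relaxes to D > B, which is the condition stated in the paragraph above.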
Since the decompression is done automatically by the Resource Manager and the appropriate 'dcmp' resource, compressed data can be accessed transparently by software that is unaware of compressed resources. The same trap calls that access ordinary resources also access compressed resources, with no difference in the parameters passed. Applications written before resource compression was made available will launch and run after being made smaller through resource compression. Adobe Photoshop and Aldus PageMaker 4 will each fit on and execute from 800K diskettes after being compressed by a commercially available resource compression application.
Unfortunately, resource compression applies only to resource forks. Apple does not currently provide an equally transparent scheme for manipulating compressed data forks, although several third-party publishers supply software that does. Resource compression is also currently a one-way scheme: the Resource Manager decompresses resources but does not write them back to resource files in compressed form. And the compression tools that Apple uses internally are not currently available to developers. (This article describes how to create one.) Despite these limitations, resource compression promises to be the most transparent and best-performing means of making applications and system enhancements smaller for the foreseeable future of 680x0-based Macintoshes.
How Resource Compression Works
Poking about in the System or Finder with ResEdit reveals very few visible cues that the files are compressed. Resources appear as they normally would and can be opened, edited, cut, and pasted with no delays or unexplained changes in size. Because the Resource Manager handles compressed resources, they appear as ordinary resources to any program that uses the Resource Manager. The only discrepancy introduced by resource compression is the difference between the sum of the values returned by SizeResource and the total size of the resource fork. (These two numbers do not necessarily correspond closely, even in files without compressed resources.)
The transparent decompression of these resources happens through a variation of the Resource Manager's regular loading process. Normally, when GetResource, Get1Resource, or a similar trap is executed, the Resource Manager (hereafter the RM) allocates a handle to a block of memory large enough to hold the entire resource. It then calls the File Manager to read the appropriate range of data from the file's resource fork into the new handle. When the RM encounters a resource whose compressed attribute is set, it calls an additional set of low-level routines.
Once the resource is loaded, the RM double-checks the status of the resource and exits if its header does not contain the appropriate tag bytes. If the resource passes this check, the RM reads the flag bits in the resource header to determine the kind of memory allocation the decompressor requires. The RM then extracts a resource ID from the header. It uses that ID to load the 'dcmp' resource that implements the algorithm or algorithms needed to decompress the data. Herein lies the exciting modularity of the resource compression scheme: each decompression algorithm used by the RM is held in a self-contained code resource of type 'dcmp'. Adding new compression algorithms to the Macintosh is simply a matter of plugging new 'dcmp' resources into the System file, or even into the file that contains the compressed resources. The Macintosh system does the rest!
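The check-and-dispatch step can be modeled in portable C. In this sketch, a small table of function pointers stands in for the 'dcmp' resources the RM would load. The names Decoder, DcmpEntry, and select_decoder are hypothetical. The header offsets follow the resourceHeader structure given later in the article: the tag sits at byte 0 and dcmpID at byte 14, both big-endian.

```c
#include <assert.h>
#include <stddef.h>

#define COMPRESSED_TAG  0xA89F6572UL
#define CANT_DECOMPRESS (-186)          /* the "bent resource" error */

/* Hypothetical decoder signature used only in this sketch. */
typedef void (*Decoder)(const unsigned char *src, unsigned char *dst,
                        long length);

typedef struct {
    int     id;      /* stands in for a 'dcmp' resource ID */
    Decoder decode;
} DcmpEntry;

/* 680x0 data is big-endian, so assemble multi-byte fields by hand. */
static unsigned long read_be32(const unsigned char *p) {
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

static int read_be16(const unsigned char *p) {
    return (p[0] << 8) | p[1];
}

/* Returns 1 and stores the decoder in *out if one is registered,
   0 if the data lacks the tag (leave it alone, as the RM does),
   or CANT_DECOMPRESS if no decoder has the requested ID. */
int select_decoder(const unsigned char *rsrc, long rsrc_len,
                   const DcmpEntry *table, size_t count, Decoder *out) {
    size_t i;
    int id;
    assert(out != NULL);
    if (rsrc_len < 16 || read_be32(rsrc) != COMPRESSED_TAG)
        return 0;
    id = read_be16(rsrc + 14);          /* the header's dcmpID field */
    for (i = 0; i < count; i++) {
        if (table[i].id == id) {
            *out = table[i].decode;
            return 1;
        }
    }
    return CANT_DECOMPRESS;
}
```

A CANT_DECOMPRESS result corresponds to the -186 error that ResEdit reports when no matching 'dcmp' is available.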
Once the RM is armed with a decoding routine, it allocates a working buffer for the decoder. It then passes the decoder pointers to that buffer and to the source and destination of the decompression operation. Interestingly, the destination pointer begins before the source pointer, and the distance between the two is less than the size of the decompressed resource. As decoding proceeds, the decompressed data overwrites compressed data that has already been consumed.
The brilliance of this scheme is that it requires only a single buffer for both the compressed and decompressed data. In an ideal world the buffer would be precisely the size of the decompressed data. Like Ouroboros, the mythical serpent forever pursuing its own tail, the decompressed data would replace the already-consumed compressed data, growing ever closer to the unread compressed data but never overwriting it. In practice, however, compression is not uniform. Some stretches compress well while others compress poorly or even grow, so the decoder's output can briefly run ahead of where the ideal layout allows. The RM's solution is to provide an expansion buffer between the decompressed and compressed data. The size of this buffer, recorded in the resource header, equals the maximum amount that the decoded data might grow toward the end of the decompression process. Once the data is decompressed, the RM shrinks the handle by the expansion buffer size, releases the decoder's working buffer, and returns the handle to the caller. Two types of 'dcmp' provide support for small and large expansion buffers.
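The single-buffer trick is easy to get wrong, so here is a portable sketch of it. The sketch uses PackBits, the run-length format behind the Toolbox's PackBits and UnpackBits routines, only because it is short. Nothing here suggests that Apple's 'dcmp' resources use PackBits. Both function names are hypothetical:

- packbits_expansion computes the smallest expansion buffer for a given stream. This is the largest lead of output over input, minus the net growth U − C.
- packbits_decode_in_place decodes data parked at the end of a buffer of U + E bytes. It writes from the start of the buffer and refuses to overwrite unread input.

```c
#include <assert.h>
#include <string.h>

/* Smallest expansion buffer E for in-place decoding of a PackBits stream
   of c_size bytes that expands to u_size bytes; -1 if malformed. */
long packbits_expansion(const unsigned char *src, long c_size, long u_size) {
    long r = 0, w = 0, worst = 0, e;
    while (r < c_size) {
        int n = src[r++];
        if (n > 127) n -= 256;                  /* header is a signed byte */
        if (n == -128) continue;                /* no-op header */
        if (n >= 0) {                           /* n + 1 literal bytes */
            if (r + n + 1 > c_size) return -1;
            r += n + 1;
            w += n + 1;
        } else {                                /* one byte, 1 - n times */
            if (r >= c_size) return -1;
            r += 1;
            w += 1 - n;
        }
        if (w > u_size) return -1;
        if (w - r > worst) worst = w - r;       /* how far output leads input */
    }
    if (w != u_size) return -1;
    e = worst - (u_size - c_size);              /* lead beyond net growth */
    return e > 0 ? e : 0;
}

/* buf holds u_size + expansion bytes, with the c_size compressed bytes at
   its END. Output is written from buf[0] upward. Returns 0 on success, or
   -1 if the stream is malformed or output would overwrite unread input. */
int packbits_decode_in_place(unsigned char *buf, long u_size, long c_size,
                             long expansion) {
    long end = u_size + expansion;
    long r = end - c_size;                      /* source pointer */
    long w = 0;                                 /* destination pointer */
    assert(c_size <= end);
    while (r < end) {
        int n = buf[r++];
        if (n > 127) n -= 256;
        if (n == -128) continue;
        if (n >= 0) {
            long count = n + 1;
            if (r + count > end || w + count > u_size || w > r) return -1;
            memmove(buf + w, buf + r, (size_t)count); /* w <= r: safe */
            w += count;
            r += count;
        } else {
            long count = 1 - n;
            unsigned char b;
            if (r >= end || w + count > u_size) return -1;
            b = buf[r++];                       /* consume before writing */
            if (w + count > r) return -1;       /* would clobber unread input */
            memset(buf + w, b, (size_t)count);
            w += count;
        }
    }
    return w == u_size ? 0 : -1;
}
```

Consider the six-byte stream F7 41 02 78 79 7A, which encodes ten 'A's followed by the literals "xyz". For this stream, packbits_expansion returns 1. After the run, the output is 8 bytes ahead of the input. That is one more than the overall growth of 13 − 6 = 7 bytes. With no expansion buffer, the in-place decoder would overwrite the literal header and therefore reports failure.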
Although this description is somewhat simplified, it illustrates the tremendous power of Apple's approach to handling compression. By compressing and decompressing resources at the RM level, Apple has delivered a system that:

- can open compressed resources more rapidly than uncompressed ones,
- provides rapid random access to compressed data, and
- offers complete transparency and backward compatibility to code that uses the RM to access data.

Tapping into the power of the scheme involves nothing more difficult than running a resource compressor and copying a 'dcmp' resource with ResEdit.
How to Use Resource Compression in Your Own Software
Using existing resource compression tools to compress applications, extensions, and other resource files is trivial. Building your own system is more work: you need a resource compressor, compression algorithms, optimized assembly-language implementations of the codecs, and the 'dcmp' code resources that let the System decode the compressed resources. For those of you who are interested in tinkering with your own resource-based compression systems, here is an outline of the basic steps you'll have to take and some source code to get you started.
Selecting an algorithm
The first step in tapping into resource compression is to select and create a codec. The primary criteria for selecting your algorithm should be speed, efficient use of memory, and compression efficiency under limited-context conditions. A single 'dcmp' added to the System file allows decompression of resources in all files, so the size of the decompression object code is not a major consideration.
Most compression algorithms currently in commercial use combine two techniques:

- String matching, applied first, replaces repeated strings with short references to an earlier occurrence. An example is each recurrence of "Resource Manager" in this article.
- Frequency-based encoding then analyzes how often the resulting bytes or short strings occur. It substitutes shorter variable-length bit fields for the more frequently occurring ones.

Both types of algorithms are well documented in computer journals and textbooks. Popular string-based compression algorithms include Lempel-Ziv and its derivatives. Popular frequency-based compression algorithms include Huffman coding, arithmetic coding, and adaptive variations of both.
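The string-matching stage can be sketched as a greedy LZ77-style tokenizer. The Token layout, WINDOW, MIN_MATCH, and the function names are illustrative choices, not a real file format. The brute-force search is far too slow for a shipping compressor. A real codec would also entropy-code the tokens (for example with Huffman coding) rather than store them as structs.

```c
#include <assert.h>
#include <string.h>

#define WINDOW    4096  /* how far back a match may reach (illustrative) */
#define MIN_MATCH 3     /* shorter repeats are cheaper as literals */

typedef struct {
    int distance;            /* 0 for a literal, else bytes back to the match */
    int length;              /* match length, or 1 for a literal */
    unsigned char literal;   /* the byte itself when distance == 0 */
} Token;

/* Greedy tokenizer; out must have room for len tokens. Returns the count. */
size_t lz_tokenize(const unsigned char *in, size_t len, Token *out) {
    size_t pos = 0, n = 0;
    while (pos < len) {
        size_t best_len = 0, best_dist = 0, cand;
        size_t start = pos > WINDOW ? pos - WINDOW : 0;
        for (cand = start; cand < pos; cand++) {   /* brute-force search */
            size_t k = 0;
            while (pos + k < len && in[cand + k] == in[pos + k])
                k++;                               /* may run past pos */
            if (k > best_len) {
                best_len = k;
                best_dist = pos - cand;
            }
        }
        if (best_len >= MIN_MATCH) {               /* emit a reference */
            out[n].distance = (int)best_dist;
            out[n].length = (int)best_len;
            out[n].literal = 0;
            pos += best_len;
        } else {                                   /* emit a literal */
            out[n].distance = 0;
            out[n].length = 1;
            out[n].literal = in[pos];
            pos += 1;
        }
        n++;
    }
    return n;
}

/* Rebuilds the original bytes into out; returns the decoded length. */
size_t lz_expand(const Token *tok, size_t count, unsigned char *out) {
    size_t i, w = 0;
    for (i = 0; i < count; i++) {
        if (tok[i].distance == 0) {
            out[w++] = tok[i].literal;
        } else {
            int k;
            assert((size_t)tok[i].distance <= w);  /* must point backward */
            for (k = 0; k < tok[i].length; k++, w++)
                out[w] = out[w - tok[i].distance]; /* byte-wise: overlap OK */
        }
    }
    return w;
}
```

Consider the input "the Resource Manager calls the Resource Manager". The first 27 bytes become literals. The second "the Resource Manager" collapses into a single reference with distance 27 and length 20.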
While non-adaptive frequency-based coding is fairly fast in both compressing and decompressing, string-based compression algorithms tend to operate more slowly, especially in compression. To reap the rewards of high-speed decompression, you should choose an algorithm that is inherently fast or invest some time in optimizing the performance of an already-implemented algorithm.
Since resource decompression is executed by the RM, your decoder should be frugal in its use of memory. The RM allocates buffers for decompressors in the system heap. While management of the system heap has improved considerably, the Memory Manager may not be able to free sufficient space for the more memory-hungry adaptive codecs.
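Because the working buffer comes out of the system heap, a compressor should record the smallest buffer its decoder can live with. The article describes workingBufferFractionalSize only as a fixed-point fraction of the uncompressed size. This sketch assumes, without confirmation from the source, that the byte is an 8-bit fraction (value / 256). Both helper names are hypothetical.

```c
#include <assert.h>

/* ASSUMPTION: workingBufferFractionalSize is read as value / 256 of the
   uncompressed size. The article only calls it a fixed-point fraction. */
unsigned long working_buffer_bytes(unsigned long uncompressed,
                                   unsigned char fraction) {
    return (uncompressed * fraction + 255) / 256;         /* round up */
}

/* Smallest fraction giving at least `needed` bytes of working buffer,
   or -1 if the decoder needs more than 255/256 of the uncompressed size. */
int fraction_for(unsigned long uncompressed, unsigned long needed) {
    unsigned long f;
    if (uncompressed == 0)
        return needed == 0 ? 0 : -1;
    f = (needed * 256 + uncompressed - 1) / uncompressed; /* ceiling */
    if (f > 255)
        return -1;
    assert(working_buffer_bytes(uncompressed, (unsigned char)f) >= needed);
    return (int)f;
}
```

A decoder that needs 300 bytes of scratch space for a 1000-byte resource would record 77 under this interpretation. That fraction yields a 301-byte buffer.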
Compression efficiency in a limited-context environment is also of primary importance in selecting a compression algorithm. File-based compression schemes have an advantage over resource compression: they operate on an entire file at once. This lets the compression algorithm build up an extensive context, or dictionary, with which to compress data efficiently. Because resources can be accessed independently, resource-based compression operates only within the context of a single resource. Many resources are quite small and will not compress well with Lempel-Ziv-derived algorithms.
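Given the small-resource problem, a compressor should check each resource before rewriting it. The policy below is hypothetical. It skips 'dcmp' resources, on the reasoning that the RM needs a decoder before it can expand anything. It also keeps any resource whose packed form plus header is not smaller than the original. HEADER_BYTES = 18 is an assumption, based on the 0x12 length that the typeFlags value 0x120801 given later appears to encode.

```c
#include <assert.h>
#include <string.h>

/* ASSUMPTION: 18-byte header, inferred from the 0x12 inside typeFlags. */
#define HEADER_BYTES 18

/* Hypothetical policy: returns 1 if the resource should be stored
   compressed, 0 if it should be left as it is. */
int should_store_compressed(const char type[4], long original_size,
                            long packed_size) {
    assert(original_size >= 0 && packed_size >= 0);
    /* Leave decompressors alone: a compressed 'dcmp' would itself need
       a decoder before the RM could run it. */
    if (memcmp(type, "dcmp", 4) == 0)
        return 0;
    /* Small resources often lose once the header is added. */
    return packed_size + HEADER_BYTES < original_size;
}
```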
Creating a resource-compressing application
Once you've selected and implemented a compression algorithm that is fast, lean, and works with limited context, you will want to incorporate the encoding portion of your codec into a resource-compressing application. Your application will need to:

- open resource files,
- select resources for compression (some resources should not be compressed),
- compress the data contained in the selected resources,
- calculate and add header information to the resources,
- write the resources back to the resource fork,
- set the appropriate attributes for the resources, and
- set the appropriate attribute bits for the resource file.

Except for the exact header format, described in the following structure, most of the documentation for this work can be found in Inside Macintosh.
Compressed Resource Header Format
[1]
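typedef struct {
long compressedResourceTag; /* 0xA89F6572 */
long typeFlags; /* 0x00120801 */
long uncompressedSize; /* size of the resource after decompression */
unsigned char workingBufferFractionalSize; /* fraction of uncompressedSize */
unsigned char expansionBufferSize; /* 0 to 255 bytes */
short dcmpID; /* resource ID of the decoding 'dcmp' */
} resourceHeader;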
Here the compressedResourceTag should be 0xA89F6572, and typeFlags should be set to 0x120801. The uncompressedSize is the size of the resource after decompression. The workingBufferFractionalSize should be a fixed-point fraction of the size of the uncompressed resource. The expansionBufferSize should be a number from 0 to 255. It represents the greatest number of bytes that the data might grow toward the end of the decompression process. The dcmpID is the resource ID of the 'dcmp' resource that the RM loads to decode the data.
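The resourceHeader structure above depends on the compiler's field alignment and on big-endian byte order. A compressor running elsewhere is safer writing the header byte by byte. In this sketch, HeaderFields, write_header, and read_header are hypothetical names. The two trailing reserved bytes are also an assumption. The 0x12 in typeFlags appears to be an 18-byte header length, two bytes more than the structure declares.

```c
#include <assert.h>

#define COMPRESSED_TAG 0xA89F6572UL
#define TYPE_FLAGS     0x00120801UL
#define HEADER_SIZE    18  /* ASSUMPTION: the 0x12 packed into TYPE_FLAGS */

typedef struct {
    unsigned long  uncompressedSize;
    unsigned char  workingBufferFractionalSize;
    unsigned char  expansionBufferSize;
    unsigned short dcmpID;
} HeaderFields;

static void put_be32(unsigned char *p, unsigned long v) {
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

static unsigned long get_be32(const unsigned char *p) {
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16) |
           ((unsigned long)p[2] << 8) | (unsigned long)p[3];
}

/* Writes the header in 680x0 (big-endian) byte order. */
void write_header(unsigned char out[HEADER_SIZE], const HeaderFields *h) {
    put_be32(out, COMPRESSED_TAG);
    put_be32(out + 4, TYPE_FLAGS);
    put_be32(out + 8, h->uncompressedSize);
    out[12] = h->workingBufferFractionalSize;
    out[13] = h->expansionBufferSize;
    out[14] = (unsigned char)(h->dcmpID >> 8);
    out[15] = (unsigned char)h->dcmpID;
    out[16] = out[17] = 0;                  /* assumed reserved bytes */
}

/* Returns 0 and fills *h if the tag and flags match, otherwise -1. */
int read_header(const unsigned char *in, long len, HeaderFields *h) {
    assert(h != 0);
    if (len < HEADER_SIZE || get_be32(in) != COMPRESSED_TAG ||
        get_be32(in + 4) != TYPE_FLAGS)
        return -1;
    h->uncompressedSize = get_be32(in + 8);
    h->workingBufferFractionalSize = in[12];
    h->expansionBufferSize = in[13];
    h->dcmpID = (unsigned short)((in[14] << 8) | in[15]);
    return 0;
}
```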
Once you've successfully compressed resources with your new application, you'll be able to examine them in ResEdit. If your application has compressed and manipulated the resources properly, you will be able to see them in the resource map listing and get information about them. Trying to cut, copy, or edit the resources will produce a "bent resource" error, -186. This simply means that the appropriate 'dcmp' resource has not been made available to the system.
Creating a decompression resource
The system software uses several types of 'dcmp' resource. The source code provided here lets you use your decoder within the small-expansion-buffer 'dcmp' format, like the one System 7 uses for decompressing code resources. Combined with a good algorithm, it should let you achieve significantly better compression ratios on code resources and other resource types.
Creating the decompression resource means building a code resource of type 'dcmp'. It must contain the header glue code in the following listing as well as the decoding portion of your codec. Your 'dcmp' executes within an RM call, so you should put all of the required code into a single resource rather than having your 'dcmp' load a separate decoding resource.
[2]
'dcmp' Resource glue code listing
typedef struct {
char unused[8]; /* saved A6 and return address, 4 bytes each */
long dataSize;
long workingBuffer;
long destinationBuffer;
long sourceBuffer;
} dcmpParameters;
main (){
asm {
debug
link a6,#0 ; Grab a stack frame
movem.l d0-d7/a0-a6,-(a7) ; Save all registers
; Get the input buffer
move.l dcmpParameters.sourceBuffer(a6),a0
; Get the output buffer
move.l dcmpParameters.destinationBuffer(a6),a1
; Get the working buffer
move.l dcmpParameters.workingBuffer(a6),a2
move.l dcmpParameters.dataSize(a6),d0 ; How much data?
; Insert your decompression code here.
movem.l (a7)+,d0-d7/a0-a6 ; Restore the registers
unlk a6
move.l (a7)+,8(a7); Move return address
addq.l #8,a7 ; Discard the stack frame
} /* end asm */
/* Return to Resource Mgr */
}
It is important to use THINK C's Custom Header option when building a 'dcmp' resource with this glue code. The register assignments used in this example were chosen arbitrarily.
Once your 'dcmp' is compiled, you can add it either to your compressed resource file or to the active System file. Once the 'dcmp' is in an open resource map, the RM will execute it whenever one of the corresponding compressed resources is accessed. Now, double-clicking one of the resources compressed by your application will yield the message "That resource is compressed. If you make changes, it will be saved uncompressed. Do you want to edit it anyway?" Clicking the Yes button opens the resource as if it were not compressed. All other accesses to the resource will open it transparently and return information about it as if it were not compressed. Congratulations: you have successfully implemented a resource compression scheme.
For more information
Alysis Software Corporation
1231 31st Avenue
San Francisco, CA 94122
Voice: 415/566-2263 Fax: 415/566-9692
AppleLink: ALYSIS
America Online: ALYSIS
CompuServe: 76500,3011