QuickTime Media Layer
Volume Number: 13 (1997)
Issue Number: 7
Column Tag: Multimedia
The QuickTime Media Layer
by Tim Monroe, Apple Computer, Inc.
An overview of Apple's flagship multimedia software
Introduction
Imagine that you're standing in a museum, surrounded by famous works of art. On the wall in front of you hangs a painting by Jackson Pollock. You reach out and touch the painting, and instantly a screen slides down in front of the painting; on the screen appears a video showing Pollock in the process of splashing paint and stabbing wildly at the canvas. Soon it becomes clear that he's working on the very painting that hangs in front of you.
You move to the next room. As you enter, a voice announces that this room contains the last few remaining vases from the Ming dynasty. You go to the center of the room and walk around the blue and white vase that sits on a pedestal. Then you pick up the vase to examine it further: you want to see the bottom and to look inside the vase. Suddenly, off to the right and a bit behind you, you hear a crash as a vase collides with the marble floor. As you turn and look, a crowd of schoolchildren hurriedly moves away from a pile of shards. Unfazed, you utter the phrase "reassemble the vase, please." The shards swirl up into a spinning cloud that eventually settles gracefully onto its pedestal as a complete, unbroken vase.
Then you move into a third room. Large mobiles hang from the ceiling. You click a switch on the wall and a fan begins to spin, gently blowing the mobiles and setting them in motion. The fan hums and the mobiles creak and clang as they move and collide with one another. After a few minutes, the fan stops abruptly. A wall, previously blank, now displays a live video picture of a security guard announcing that the museum will be closing soon. You leave the museum.
This experience - this multimedia experience - seemed so real that for a few moments you perhaps forgot that you were sitting in front of a computer. You almost felt sorry for the schoolchildren when they accidentally smashed the vase. You could almost feel that hard marble floor under your feet as you moved around in your virtual museum. What made this experience real, what made it seem as if you were immersed in another world, was the set of Apple multimedia technologies collectively known as the QuickTime Media Layer (QTML). In this article, I'll provide an overview of the QTML. I'll describe the main components of the QTML and highlight some of the key interrelations among those components.
What Is QTML?
In a nutshell, the QuickTime Media Layer is a collection of technologies developed by Apple Computer that allow authoring and playback of multimedia content. There are three main technologies currently included in the QTML: QuickTime, QuickDraw 3D, and QuickTime VR. (As we'll see later, there are several other technologies that are loosely associated with the QTML.) What distinguishes these technologies, and makes them suitable for inclusion in the QTML, is this set of features:
- The technology is media-rich. The QTML provides, first and foremost, an avenue for the delivery of digital media. QTML technologies pertain to the eyes and ears. Other sensory modes, if digitized, would be prime candidates for inclusion in the QTML. For instance, some joysticks can provide tactile feedback, and a general interface to such devices would make a natural QTML component. We'll probably have to wait a while, however, for olfactory (QuickSmell?) and gustatory components (QuickTaste?) of the QTML.
- The technology is interactive. It's great to sit and watch things happen, but it's essential for most everyday uses of media delivery that the user be able to interact with the environment. Accordingly, the core QTML technologies provide some means for users to control the environment. QuickDraw 3D provides a picking architecture that allows the user to move or select objects in a scene. QuickTime VR is based almost entirely on allowing the user to choose a navigation path through a virtual world or to manipulate an object in a virtual world. Even QuickTime -- long the prime example of "just sit and watch" -- is moving toward increased interactivity.
- The technology is cross-platform. The QTML is based on a strategy of "author once, deliver many." Ideally, application developers should be able to build playback applications for the major personal computer operating systems, including MacOS, Windows 95, and Windows NT. (QuickTime itself currently also runs on OS/2 and several flavors of UNIX.) Just as importantly, developers should be able to build authoring systems for any of these platforms. Both of these needs can be served by providing a set of cross-platform application programming interfaces (APIs) for QTML components.
- The technology supports a standard file format. This feature is just as important as the cross-platform APIs just mentioned and provides a key part of the QTML cross-platform delivery strategy. A standard, fully documented file format is also important to permit data exchange between applications running on the same operating system. For instance, the 3D file format supported by QuickDraw 3D provides a means of sharing 3D data between applications and across platforms. QuickTime and QuickTime VR also support a publicly documented file format.
- The technology is scalable. The quality of the user's experience when interacting with multimedia content should depend primarily on the capabilities of the user's hardware, not on the limitations of the multimedia content or playback software. All the main QTML technologies can take full advantage of the available memory or other hardware on the user's computer. For example, QuickDraw 3D automatically uses any available supported 3D acceleration card to speed rendering and other operations. Similarly, a QuickTime VR panorama can be viewed at differing resolutions, depending on the amount of available RAM. QuickTime can take advantage of multiple processors on the MacOS to speed its calculations. This scalability is another facet of the "author once, deliver many" philosophy: the same data file can provide vastly different experiences, depending on the available hardware.
In short, the QuickTime Media Layer provides a platform-independent standard for the creation, distribution, and playback of digital media, including video, sound, rendered objects, immersive panoramas, and manipulable objects.
The Stars
Now let's take a brief look at the three principal parts of the QuickTime Media Layer: QuickTime, QuickDraw 3D, and QuickTime VR.
QuickTime
QuickTime is the core of the QTML. It provides a cross-platform multimedia architecture that allows integration of a wide variety of media data types, including graphics, sound, video, text, music, 3D objects, and sprites - with the ability to synchronize all these media types to a common time base. In a word (or two), QuickTime manages time-based data. A collection of time-based data is called a movie. QuickTime provides tools to display movies and to let the user interact with movies in appropriate ways (starting, stopping, pausing, and so forth). It also provides the capability to manipulate movie data in other ways (compressing, expanding, cutting, pasting, copying, and so forth).
In the years since it was introduced, QuickTime has gradually added support for a number of media data types. For instance, QuickTime version 2.0 added support for the QuickTime Music Architecture (more on that later) and version 2.5 added support for MPEG-encoded video. Current versions of QuickTime support 3D data and sprite tracks. It's fairly easy to add support for a new data type because QuickTime is built on a component architecture. (A component is a piece of code managed by the Component Manager that provides a defined set of services to one or more clients.) Each QuickTime component provides an interface to a set of features associated with the manipulation of some sort of data (which might or might not be time based).
The latest version of QuickTime for both MacOS and Windows machines is QuickTime 3.0, which provides several important advancements over previous versions, including expanded file format support, a media abstraction layer, and accelerated visual effects:
- QuickTime 3.0 supports playback, editing, and integration of QuickTime, MPEG, AVI, OMF, DVCAM, and OpenDML files, thereby providing integration with all the major video file formats. QuickTime 3.0 also supports a wide variety of sound file formats, including Wave, AIFF, AU, MPEG Layer 2, and MIDI.
- The new media abstraction layer provides QuickTime with a means of accessing hardware accelerators or other multimedia enhancements in a way that is transparent to the software using the QuickTime API. This ensures that existing applications will benefit from these enhancements without any changes.
- QuickTime 3.0 includes enhancements to the QuickTime software architecture that standardize the way in which applications work with visual effects and transitions. For example, QuickTime 3.0 includes a large set of built-in software-based effects, such as cross-fades, chroma keying, SMPTE wipes, and color adjustments.
From the programmer's point of view, it's relatively easy to add support for QuickTime movies to an application. With a few lines of code, you can open a movie file and provide a standard user interface for the user to control the movie playback. The movie file itself can contain all the data needed to synchronize the various data types displayed in the movie. QuickTime also provides a large set of functions for creating and editing movie data.
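To give you a feel for this, here's a minimal sketch in C of opening a movie and attaching the standard movie controller. Error handling is pared down, and the FSSpec is assumed to have been filled in already (say, with StandardGetFilePreview).

#include <Movies.h>

OSErr PlayMovieFile (const FSSpec *theFSSpec, WindowPtr theWindow)
{
    short            theRefNum;
    Movie            theMovie = NULL;
    MovieController  theMC = NULL;
    Rect             theBounds;
    OSErr            theErr;

    theErr = EnterMovies();                       // initialize the Movie Toolbox
    if (theErr != noErr)
        return theErr;

    theErr = OpenMovieFile(theFSSpec, &theRefNum, fsRdPerm);
    if (theErr != noErr)
        return theErr;

    theErr = NewMovieFromFile(&theMovie, theRefNum, NULL, NULL, newMovieActive, NULL);
    CloseMovieFile(theRefNum);
    if (theErr != noErr)
        return theErr;

    // move the movie to the window's top-left corner and draw it there
    GetMovieBox(theMovie, &theBounds);
    OffsetRect(&theBounds, -theBounds.left, -theBounds.top);
    SetMovieBox(theMovie, &theBounds);
    SetMovieGWorld(theMovie, (CGrafPtr)theWindow, NULL);

    // attach the standard movie controller; it handles starting, stopping,
    // pausing, and scrubbing on the user's behalf
    theMC = NewMovieController(theMovie, &theBounds, mcTopLeftMovie);
    if (theMC == NULL)
        theErr = memFullErr;                      // controller allocation failed

    return theErr;
}

Thereafter, your event loop simply hands each event to MCIsPlayerEvent, and the controller takes care of the user interaction.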
QuickDraw 3D
QuickDraw 3D is a cross-platform graphics library that you can use to create, configure, and render 3D models. You can also use QuickDraw 3D to manage user interaction with a rendered 3D scene, such as navigating within the scene (that is, changing the camera angles) and selecting objects in the scene. QuickDraw 3D supports a wide range of basic geometric objects and transformations of objects, as well as attributes for those objects. QuickDraw 3D also supplies several lighting models, shaders, and renderers.
The QuickDraw 3D graphics library supports a C-based API. Most of the API provides a standard object-oriented approach to 3D graphics, wherein you create objects that can inherit properties and behaviors from other objects. For applications that require only the display of 3D objects and limited user interaction with those objects, QuickDraw 3D also supplies a high-level API for the 3D Viewer. In a sense, using the 3D Viewer is like using the standard movie controller to display QuickTime movies: it's very easy to provide a standard interface to the underlying data.
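To show what "very easy" means in practice, here's a rough sketch of displaying a 3DMF file with the 3D Viewer. It assumes the file has already been opened (for instance, with FSpOpenDF) and omits error handling.

#include <QD3DViewer.h>

TQ3ViewerObject ShowModel (CGrafPtr thePort, Rect *theBounds, short theRefNum)
{
    TQ3ViewerObject  theViewer;

    // create a viewer with the default controller strip and behaviors
    theViewer = Q3ViewerNew(thePort, theBounds, kQ3ViewerDefault);
    if (theViewer == NULL)
        return NULL;

    Q3ViewerUseFile(theViewer, theRefNum);   // read the 3DMF data into the viewer
    Q3ViewerDraw(theViewer);                 // render the model

    // in the event loop, pass each event to Q3ViewerEvent
    return theViewer;
}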
Like QuickTime, QuickDraw 3D is extensible, though not in precisely the same manner. QuickDraw 3D does not support a component-based architecture; instead, it allows developers to extend its capabilities by defining custom objects. Moreover, QuickDraw 3D supports a hardware abstraction layer -- called the QuickDraw 3D Rendering and Acceleration Virtual Engine (RAVE) -- that allows for plug-and-play hardware acceleration.
QuickDraw 3D also defines a platform-independent file format, called the 3D Metafile Format (3DMF), for storing and interchanging 3D data. This format is intended to provide a standard format according to which applications can read and write 3D data (even applications that do not use QuickDraw 3D to render images). QuickDraw 3D supplies functions that you can use to read and write data in 3DMF files.
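As a sketch of how the reading side looks (assuming QuickDraw 3D has been initialized with Q3Initialize), the following loop pulls each object out of a 3DMF file; what the application does with each object is elided.

#include <QD3D.h>
#include <QD3DIO.h>
#include <QD3DStorage.h>

TQ3Status Read3DMFObjects (const FSSpec *theFSSpec)
{
    TQ3StorageObject  theStorage;
    TQ3FileObject     theFile;
    TQ3FileMode       theMode;
    TQ3Object         theObject;

    theStorage = Q3FSSpecStorage_New(theFSSpec);    // attach storage to the file
    if (theStorage == NULL)
        return kQ3Failure;

    theFile = Q3File_New();
    Q3File_SetStorage(theFile, theStorage);
    Q3Object_Dispose(theStorage);                   // the file now holds a reference

    if (Q3File_OpenRead(theFile, &theMode) != kQ3Success) {
        Q3Object_Dispose(theFile);
        return kQ3Failure;
    }

    while (Q3File_IsEndOfFile(theFile) == kQ3False) {
        theObject = Q3File_ReadObject(theFile);     // read the next object
        if (theObject != NULL) {
            // ... hand the object to the application's model ...
            Q3Object_Dispose(theObject);
        }
    }

    Q3File_Close(theFile);
    Q3Object_Dispose(theFile);
    return kQ3Success;
}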
QuickTime VR
The new kid on the QTML block is QuickTime VR, an imaging technology that allows users to interactively explore and examine photorealistic, three-dimensional virtual worlds and objects. QuickTime VR is really two separate technologies in one package. One part of QuickTime VR supports panoramic nodes (or "panoramas"), where the viewer can turn around, as if sitting on a rotating stool, to view different parts of the space around him or her. The other part of QuickTime VR supports object nodes (or "objects"), where the viewer can turn an object horizontally and vertically, as if picking it up and examining it. Any number of panoramas and objects can be linked together into a scene. Clicking predefined areas in a particular node ("hot spots") can move the viewer to another node in the scene or initiate other actions.
QuickTime VR is like QuickDraw 3D in that both technologies are geared toward spatial data. Both of them try, in different ways, to make it seem as if you're in a spatial location, populated by 3D objects. QuickDraw 3D is a traditional 3D graphics library, where each and every object in a scene must be described geometrically and rendered in real time as the viewer's location changes in 3D space. QuickTime VR, on the other hand, works with data that is typically captured photographically and hence can provide substantial detail with a very small data size.
QuickTime VR playback has been available on both MacOS and Windows machines for several years. Early this year, Apple introduced a C language API for controlling VR movie playback. See "Programming With QuickTime VR" in this issue for a detailed description of that API.
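As a small taste of that API, the sketch below grabs the QTVRInstance associated with a VR movie's controller and spins the view half a revolution. The movie and controller are assumed to have been opened just as for any QuickTime movie.

#include <QuickTimeVR.h>

void TurnViewerAround (Movie theMovie, MovieController theMC)
{
    Track         theTrack;
    QTVRInstance  theInstance = NULL;

    theTrack = QTVRGetQTVRTrack(theMovie, 1);            // first QTVR track
    if (theTrack != NULL)
        QTVRGetQTVRInstance(&theInstance, theTrack, theMC);
    if (theInstance == NULL)
        return;

    // pan angles are measured in radians; add pi to face the other way
    QTVRSetPanAngle(theInstance, QTVRGetPanAngle(theInstance) + 3.14159);
    QTVRUpdate(theInstance, kQTVRCurrentMode);           // redraw the new view
}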
The Supporting Cast
The QuickTime Media Layer is associated with several other important technologies. Some of these are really part of the QTML but deserve special mention, and some are not strictly part of the QTML but provide some nifty capabilities when used with it. Some of these QTML wannabes are not yet available cross-platform, however.
Sound Manager
The Sound Manager is the part of the QuickTime Media Layer that manages sounds. For instance, when you play a QuickTime movie, it's the Sound Manager that is ultimately responsible for turning the audio data included in the movie into sounds. This process might involve a good bit of work. For example, the audio data might have to be decompressed; the decompressed data might then require that its playback rate be changed; the rate-shifted data might then have to have its volume adjusted; finally, the audio data might have to be mixed with other sounds already playing.
The Sound Manager is available on computers running both the MacOS and Windows operating systems. Like QuickTime, its operations are handled by a variety of components, so it is relatively easy to add capabilities (for instance, to handle different compression and expansion algorithms) to the Sound Manager by writing your own custom components.
QuickTime Music Architecture
The QuickTime Music Architecture (QTMA) was introduced as part of QuickTime 2.0. It provides an interface to MIDI, a standard music and device-control architecture, but does not require that any actual MIDI devices be attached to the computer. The QTMA can play individual notes and sequences of notes (generated on the fly or prerecorded) on any available MIDI device, or on a software-based MIDI synthesizer if no external devices are available. You can also use the QTMA to read input from external MIDI devices.
One advantage of the QTMA is that the amount of data required to generate a tune is significantly smaller than the amount of data contained in a digitized recording of that tune. If music figures importantly in your multimedia content, you should consider the QTMA as the delivery vehicle.
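Here's a minimal sketch of playing a note through the QTMA's note allocator component. The General MIDI instrument number and polyphony values are illustrative, and the NoteRequest fields follow the QuickTime 3.0 headers.

#include <QuickTimeMusic.h>

void PlayMiddleC (void)
{
    NoteAllocator  theNA;
    NoteChannel    theChannel = NULL;
    NoteRequest    theRequest;

    theNA = OpenDefaultComponent(kNoteAllocatorComponentType, 0);
    if (theNA == NULL)
        return;

    theRequest.info.flags = 0;
    theRequest.info.reserved = 0;
    theRequest.info.polyphony = 2;                  // up to two simultaneous notes
    theRequest.info.typicalPolyphony = 0x00010000;  // Fixed-point 1.0
    NAStuffToneDescription(theNA, 1, &theRequest.tone);  // GM instrument 1: piano

    NANewNoteChannel(theNA, &theRequest, &theChannel);
    if (theChannel != NULL) {
        NAPlayNote(theNA, theChannel, 60, 80);      // middle C, velocity 80
        // ... later, NAPlayNote(theNA, theChannel, 60, 0) stops the note ...
        NADisposeNoteChannel(theNA, theChannel);
    }
    CloseComponent(theNA);
}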
SoundSprocket
SoundSprocket is the part of Apple's Game Sprockets that provides 3D filtering for sounds. (See [Vineyard 1997] for more detail about the Apple Game Sprockets package.) You can use SoundSprocket to make a sound appear to emanate from a specific point in space, and you can change the location of the sound dynamically. These capabilities are especially useful for the QTML components that support a spatial medium, namely QuickDraw 3D and QuickTime VR. For instance, you can assign specific sounds to locations in a panorama; as the user pans or tilts to change the view angle, the location of the sounds appears to change as well.
SoundSprocket is not officially part of the QTML, and currently it's available only on PowerPC-based MacOS computers. Happily, however, it's fairly easy to duplicate some of the SoundSprocket functionality using QuickTime or the Sound Manager. You can simulate 3D filtering of sounds for QuickTime movies by adjusting the balance and volume of a movie's sound track as the user changes spatial positions or orientations in a 3D scene or a VR panorama. Also, you can simulate 3D filtering of sounds played using the Sound Manager by issuing the volumeCmd sound command, which controls the volume of the left and right channels independently.
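For instance, here's a sketch of a routine that issues volumeCmd to set the two channel volumes on an open sound channel. The 0.0-to-1.0 gain parameters are an illustrative convention, scaled to the Sound Manager's 8.8 fixed-point volumes (0x0100 is full volume).

#include <Sound.h>

void SetChannelVolumes (SndChannelPtr theChannel, float leftGain, float rightGain)
{
    SndCommand      theCmd;
    unsigned short  theLeft  = (unsigned short)(leftGain * 0x0100);
    unsigned short  theRight = (unsigned short)(rightGain * 0x0100);

    theCmd.cmd    = volumeCmd;
    theCmd.param1 = 0;
    theCmd.param2 = ((long)theRight << 16) | theLeft;  // high word: right; low word: left
    SndDoImmediate(theChannel, &theCmd);
}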
PlainTalk
Apple provides two speech technologies bundled together under the general name PlainTalk: speech synthesis and speech recognition. Speech synthesis is the process of converting written tokens (text) to spoken tokens (speech). This can be useful to provide a narration of a walk-through or to vocalize a QuickTime text track. Speech recognition is the process of converting spoken words into recognized utterances. This is useful to give the user another input method. For instance, instead of having the user pan and tilt in a VR panorama using the mouse or keyboard, you can support speech commands to achieve the same effect. (See [Pallakoff and Reeves 1996] and [Monroe 1996] for several good articles describing Apple's speech recognition capabilities.)
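On the synthesis side, vocalizing text takes remarkably little code. This sketch checks with Gestalt that the Speech Manager is available and then speaks a sentence with the default voice.

#include <Gestalt.h>
#include <Speech.h>

void AnnounceClosing (void)
{
    long  theResponse;

    // make sure speech synthesis is available before trying to speak
    if ((Gestalt(gestaltSpeechAttr, &theResponse) == noErr) &&
            (theResponse & (1L << gestaltSpeechMgrPresent)))
        SpeakString("\pThe museum will be closing soon.");
}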
Once again, these technologies are not officially part of the QTML, but they can dramatically enhance the user experience when combined with the multimedia technologies provided by the QTML.
QuickTime Conferencing
QuickTime Conferencing (QTC) is a set of software components that support sharing time-based media across local- and wide-area networks. In other words, QTC provides real-time multimedia communications. You could use QTC, for example, to support videoconferencing or a "virtual whiteboard." QTC provides a number of components for managing the network interface and other operations, and also uses standard QuickTime components when possible.
Integration
The real fun with the QuickTime Media Layer is making it all work together. Each component of the QTML has a programming interface, so you can combine components simply by using the APIs together. One of my favorite early examples of QTML integration is Robert Dierkes' TextureEyes application, which can texture map a QuickTime movie (or even a live video feed!) onto a QuickDraw 3D object. You could use the same technique to play a movie on a rendered TV screen in a 3D scene. And then you could take that rendered object, with its QuickTime movie texture, and embed it in a QuickTime VR panorama. And then, with a few simple Movie Toolbox functions, you could adjust the sound balance and make the movie's sound track get louder or quieter as the user pans toward or away from the TV set.
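The sketch below gives the flavor of that last step. The pan angles are hypothetical values maintained by the application (in radians), and the linear falloff is just one plausible mapping.

#include <math.h>
#include <Movies.h>

void UpdateTVVolume (Movie theMovie, float theUserPan, float theTVPan)
{
    float  theDelta;

    // attenuate the movie's volume linearly with the angular distance
    // between the user's pan angle and the TV's position in the panorama
    theDelta = (float) fabs(theUserPan - theTVPan);
    if (theDelta > 3.14159)                 // wrap around the panorama
        theDelta = (float) (6.28318 - theDelta);

    SetMovieVolume(theMovie, (short) (kFullVolume * (1.0 - theDelta / 3.14159)));
}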
Similarly, it's very easy to integrate SoundSprocket and QuickDraw 3D to attach sounds to specific locations in a rendered 3D scene. (Indeed, the SoundSprocket API uses many QuickDraw 3D data structures to describe the location and orientation of the virtual listener and the sound sources.) It's also reasonably easy to attach sounds to specific locations in a QuickTime VR panorama and to link the orientation of the SoundSprocket listener to the current orientation of the viewer in the panorama.
Conclusions
The QuickTime Media Layer is a set of cross-platform Apple technologies that support the authoring, delivery, and playback of multimedia content. Used together, these technologies provide a means to integrate spatial and temporal data into a rich, unified, interactive user experience. The data underlying this experience can be generated dynamically, or read from files stored locally, on CD-ROM, or on a remote server accessed across a network.
It's important to keep in mind that the QTML is a unifying strategy, not a finished product. At present, true media integration must be done at the API level or using an authoring environment that supports the various media types. There is no single, unified file format for the many data types supported by the stars and supporting cast members of the QTML. Part of what this means is that there is no easy way (again, short of programming or using an authoring environment) to animate 3D objects in a QuickTime VR scene or to attach sounds to locations in a 3D scene.
Nonetheless, it's only a matter of time before software developers begin to use the existing APIs to create authoring and playback tools to provide a more seamless integration among all members of the QTML. At that point, the QTML will move outside the ranks of programmers and become the unified media authoring and playback layer for the rest of us.
Bibliography and References
Monroe, Tim. "Adding Speech Recognition to an Application Framework". develop, The Apple Technical Journal, issue 27 (September 1996), pp. 22-33. Apple Computer's Developer Press.
Pallakoff, Matt, and Arlo Reeves. "The Speech Recognition Manager Revealed". develop, The Apple Technical Journal, issue 27 (September 1996), pp. 6-21. Apple Computer's Developer Press.
Vineyard, Jeremy. "Sprockets Are Forever". MacTech Magazine, 13:2 (February 1997), pp. 12-15.
Tim Monroe, monroe@apple.com, is a software engineer on Apple's QuickTime VR team, responsible for developing sample code for the new QuickTime VR C language API. In his previous life at Apple, he worked on the Inside Macintosh team, where he wrote developer documentation for QuickDraw 3D, QuickTime VR, the sound and speech technologies, and a host of other APIs.