Entourage 2004 Spotlight Support
Volume Number: 22 (2006)
Issue Number: 5
Column Tag: Programming
MICROSOFT | MAC IN THE ENTERPRISE
by Brian Johnson and Andy Ruff
An IT Perspective: How Microsoft Entourage 2004 now takes advantage of Spotlight
Introduction
In update 11.2.3, Microsoft added support for Spotlight and Sync Services to Microsoft Entourage 2004 running on Mac OS X 10.4. These two features allow users to search Entourage data and to synchronize Entourage data with any software or hardware that takes advantage of Sync Services in the OS.
System administrators who plan to deploy this technology on Macs, particularly in configurations with many users or limited disk space, need to understand how it all works. In this article, we focus on the Spotlight support added to Entourage: how it works in this update, and what to consider for multi-identity installations of Entourage.
An Overview of Spotlight in Entourage
From the user's perspective, Spotlight search in Entourage provides full-text search of items in the Entourage database. Spotlight uses file-based metadata and a continuously updated index to answer queries passed through the Spotlight search interface in the operating system. Results return quickly because the index is kept current by notifications from the file system: once the initial indexing of a set of data is complete, new and changed files are indexed automatically.
One of the difficulties in making Spotlight work with Entourage had to do with how Entourage stores its data. Entourage was designed to be multi-user at the application level, so that family members sharing a Mac at home could each have their own identity in the application. All Entourage data is stored in a single database file per identity. When a user first sets up Entourage, the identity created is named "Main Identity".
In order to support Spotlight searching, we had to develop a mechanism for exposing Entourage's database content to Spotlight's file-oriented indexing process. We settled on a solution that "mirrors" the essential item content and metadata to a series of cache files. When a new message arrives, we store the message in our database and create a cache file representing it. When a user modifies a contact's phone number or changes the dates of an event, we update our database and the contents of that item's cache file. When Spotlight indexes Entourage, it is actually indexing the contents of the cache files rather than the Entourage database. This approach lets Spotlight's indexing process work its magic on file change notifications without requiring a large overhaul of Entourage's data access architecture.
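The mirroring idea can be modeled in a few lines. The sketch below is hypothetical: only the concept (one small plain-text file per database record, rewritten on change and removed on delete, named by record ID) comes from the text. The class name `CacheMirror`, its methods, the flat folder layout, and the file contents are illustrative assumptions, not Entourage's actual code.

```python
import os
import tempfile


class CacheMirror:
    """Hypothetical model of mirroring database records to per-record cache files.

    Assumptions: a flat folder (the real cache nests records in
    subfolders) and a made-up plain-text file body.
    """

    EXTENSION = ".vRgeMessage"  # extension seen in the mdfind result below

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, record_id):
        # The file name is the record ID, so a search result can be mapped
        # back to its database record.
        return os.path.join(self.root, f"{record_id}{self.EXTENSION}")

    def on_save(self, record_id, subject, body_text):
        # Called when a record is created or modified: (re)write its mirror.
        # Only plain text is written; HTML and attachments are skipped.
        with open(self._path(record_id), "w", encoding="utf-8") as f:
            f.write(f"subject={subject}\n{body_text}\n")

    def on_delete(self, record_id):
        # Called when a record is deleted: remove its mirror if present.
        try:
            os.remove(self._path(record_id))
        except FileNotFoundError:
            pass


with tempfile.TemporaryDirectory() as tmp:
    mirror = CacheMirror(tmp)
    mirror.on_save(1, "Welcome", "Hello from Entourage")
    mirror.on_save(1, "Welcome (edited)", "Updated body")
    print(sorted(os.listdir(tmp)))  # ['1.vRgeMessage']
    mirror.on_delete(1)
    print(os.listdir(tmp))          # []
```

Because every change to the database touches only the affected record's small file, Spotlight sees ordinary file change notifications and never has to understand the database format.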
Because an Entourage user's database can hold years of e-mail messages, creating cache files risks consuming large amounts of disk space for essentially redundant data. While considering this design, we found that optimizations such as writing only plain-text content rather than HTML and ignoring e-mail attachments let us generate a cache roughly 20% of the size of the original Entourage database. We also decided to make the feature optional, so any user can simply disable creation of the cache in their Entourage preferences.
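For disk-space planning, the 20% rule of thumb turns into simple arithmetic. This is a rough sketch that assumes the ratio stated above holds for every database; actual ratios will vary with the mix of content. The function name is illustrative.

```python
def estimate_cache_mb(database_sizes_mb, percent=20):
    """Rough Spotlight cache estimate for one or more Entourage databases.

    Assumes the rule of thumb that the cache is roughly 20% of the
    database size. Pass one size per identity (or per user account)
    to estimate the total for a machine.
    """
    total_mb = sum(database_sizes_mb)
    return round(total_mb * percent / 100, 1)


print(estimate_cache_mb([200]))            # 40.0
print(estimate_cache_mb([200, 350, 50]))   # 120.0
```

So a single 200 MB database would need on the order of 40 MB of cache, and a machine hosting several large databases adds up accordingly.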
Figure 1: Spotlight is enabled by default for the first identity opened after update 11.2.3 is installed.
The Spotlight preference pane in Entourage lets the user both toggle the feature and rebuild the contents of the cache. On a moderately sized database of 200 MB, creating the cache files takes only a few minutes and happens in the background. The Rebuild button simply deletes all existing cache files, crawls the Entourage database, and generates a new set of cache files. A user should only need to rebuild if problems arise, because Entourage continues to create, update, and delete cache files with each action performed on the Entourage database.
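The three steps the Rebuild button performs can be sketched as follows. This is a hypothetical model: `records` stands in for the Entourage database as a mapping of record ID to plain text, and the function name, flat layout, and file contents are assumptions.

```python
import os
import tempfile


def rebuild_cache(cache_root, records):
    """Hypothetical sketch of a cache rebuild.

    `records` maps record ID -> plain text and stands in for the
    Entourage database. Returns the number of cache files written.
    """
    os.makedirs(cache_root, exist_ok=True)
    # 1. Delete all existing cache files (other files are left alone).
    for name in os.listdir(cache_root):
        if name.endswith(".vRgeMessage"):
            os.remove(os.path.join(cache_root, name))
    # 2. Crawl the database, and 3. generate a fresh cache file per record.
    for record_id, text in records.items():
        path = os.path.join(cache_root, f"{record_id}.vRgeMessage")
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)
    return len(records)


with tempfile.TemporaryDirectory() as tmp:
    # A stale file for a record that no longer exists in the database.
    open(os.path.join(tmp, "99.vRgeMessage"), "w").close()
    rebuild_cache(tmp, {1: "Welcome", 2: "Meeting notes"})
    print(sorted(os.listdir(tmp)))  # ['1.vRgeMessage', '2.vRgeMessage']
```

Deleting everything first is what makes a rebuild useful when problems arise: stale files for records that no longer exist disappear along with any damaged ones.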
Once the cache files are created, Entourage's role in the indexing process is complete. Spotlight chooses when to index the cache files and how the results are displayed in the Spotlight Search Window, Smart Folders, and the Finder's Find functionality. As indexing progresses, Spotlight's index is updated and results matching the user's query begin to show up in the Search window. If you search for a set of words and Spotlight indexes an Entourage mail message with a matching phrase, the message will suddenly appear in the Spotlight Search Results Window. Figure 2 shows the results of a typical Spotlight search with Spotlight enabled in Entourage. The returned Entourage items can include mail, appointments, contacts, tasks, and notes.
Figure 2: Spotlight search results with Entourage items returned.
Double-clicking a returned item in Spotlight works as expected: the Entourage item opens, just as if you had opened it in Entourage. So what's going on under the covers? Let's use some command line tools and take a look.
Query with Command Line Tools
There are a number of command line utilities that we can use to query the Spotlight database. With these tools, we can see where Entourage stores the Spotlight metadata it creates, and what the metadata files themselves look like.
The first tool to look at is mdfind, which queries the metadata store and returns the results of our query. We'll use three of its parameters. The -live option keeps the query running and reports changes as the index is updated, so you'll see items added as they arrive in Entourage. The -onlyin option restricts the search to a particular folder. Finally, the query parameter is a string representing the information we're searching for; Apple's developer documentation provides more details on the syntax of Spotlight queries. Let's see if we can use this tool to find an Entourage item and see where its metadata is being stored:
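If you want to script these queries, a small wrapper can assemble the mdfind command line from the options just described. This is a sketch using only the -live and -onlyin options named above; the helper names are hypothetical, the cache folder is taken from the path in the result below, and running the query requires a Mac with Spotlight (Python 3.7 or later for `capture_output`).

```python
import os
import shutil
import subprocess


def build_mdfind_args(query, onlyin=None, live=False):
    """Assemble an mdfind command line from the options described above."""
    args = ["mdfind"]
    if live:
        args.append("-live")          # keep reporting results as the index changes
    if onlyin:
        args += ["-onlyin", onlyin]   # restrict the search to one folder tree
    args.append(query)                # the query string itself
    return args


def run_mdfind(query, onlyin=None):
    """Run a one-shot (non-live) query and return the matching paths."""
    if shutil.which("mdfind") is None:
        raise RuntimeError("mdfind is only available on Mac OS X")
    out = subprocess.run(build_mdfind_args(query, onlyin),
                         capture_output=True, text=True, check=True).stdout
    return [line for line in out.splitlines() if line]


cache = os.path.expanduser("~/Library/Caches/Metadata/Microsoft/Entourage/2004")
print(build_mdfind_args("welcomee@microsoft.com", onlyin=cache))
```

Restricting the search with -onlyin to the Entourage cache folder is a quick way to see only Entourage results.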
Running the command "mdfind welcomee@microsoft.com" on my machine returns a result with the path:
/Users/Brianjo/Library/Caches/Metadata/Microsoft/Entourage/2004/Main Identity/Messages/0T/0B/0M/0K/1.vRgeMessage
The .vRgeMessage file is an Entourage mail message's cache file. When you perform a Spotlight search, the results always return cache files. As mentioned previously, cache files are merely file-based mirrors of Entourage database records with the metadata and content necessary for Spotlight indexing. The name of the cache file is the record ID for the corresponding database record. When a user opens the cache file from a Spotlight result, Entourage reads the filename, looks up the record ID within the database, and shows the item directly from the database.
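Because the file name carries the record ID and the folder path carries the identity, a result path can be decoded without opening the file. The sketch below assumes the layout shown in the mdfind result (.../Entourage/2004/&lt;identity&gt;/&lt;category&gt;/.../&lt;recordID&gt;.&lt;extension&gt;); the function name is hypothetical, and the intermediate folders (0T/0B/...) are ignored since their meaning isn't described here.

```python
from pathlib import PurePosixPath


def parse_cache_path(path):
    """Split an Entourage cache file path into (identity, category, record ID).

    Assumed layout: .../Entourage/2004/<identity>/<category>/.../<recordID>.<ext>
    """
    parts = PurePosixPath(path).parts
    for i in range(len(parts) - 3):
        if parts[i] == "Entourage" and parts[i + 1] == "2004":
            break
    else:
        raise ValueError(f"not an Entourage 2004 cache path: {path}")
    identity, category = parts[i + 2], parts[i + 3]
    record_id = int(PurePosixPath(path).stem)   # file name is the record ID
    return identity, category, record_id


p = ("/Users/Brianjo/Library/Caches/Metadata/Microsoft/Entourage/2004/"
     "Main Identity/Messages/0T/0B/0M/0K/1.vRgeMessage")
print(parse_cache_path(p))  # ('Main Identity', 'Messages', 1)
```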
The mdls command line utility allows you to see the metadata Spotlight has indexed for any given file. By passing the path to the 1.vRgeMessage cache file from our mdfind result to mdls, we can see Spotlight knows the following about the e-mail message:
/Users/Brianjo/Library/Caches/Metadata/Microsoft/Entourage/2004/Main Identity/Messages/0T/0B/0M/0K/1.vRgeMessage -------------
com_microsoft_entourage_folderID = 1
com_microsoft_entourage_messageSent = 2006-03-21 00:23:21 -0800
com_microsoft_entourage_recordID = 1
com_microsoft_entourage_size = 37783
kMDItemAttributeChangeDate = 2006-03-21 21:55:25 -0800
kMDItemAuthors = ("The Microsoft Mac Team <WelcomeE@microsoft.com>")
kMDItemContentCreationDate = 2006-03-21 00:23:21 -0800
kMDItemContentModificationDate = 2006-03-21 21:55:24 -0800
kMDItemContentType = "com.microsoft.entourage.virtual.message"
kMDItemContentTypeTree = (
"com.microsoft.entourage.virtual.message",
"public.message",
"public.data",
"public.item"
)
kMDItemCoverage = "Inbox"
kMDItemDisplayName = "Welcome to Microsoft Entourage 2004 for Macintosh"
kMDItemFSContentChangeDate = 2006-03-21 21:55:24 -0800
kMDItemFSCreationDate = 2006-03-21 21:55:24 -0800
kMDItemFSCreatorCode = 0
kMDItemFSFinderFlags = 0
kMDItemFSInvisible = 0
kMDItemFSIsExtensionHidden = 0
kMDItemFSLabel = 0
kMDItemFSName = "1.vRgeMessage"
kMDItemFSNodeCount = 0
kMDItemFSOwnerGroupID = 501
kMDItemFSOwnerUserID = 501
kMDItemFSSize = 6584
kMDItemFSTypeCode = 0
kMDItemID = 4306567
kMDItemKind = "Microsoft Entourage message pointer"
kMDItemLastUsedDate = 2006-03-21 00:23:21 -0800
kMDItemRecipients = ("New Microsoft Entourage User ")
kMDItemTitle = "Welcome to Microsoft Entourage 2004 for Macintosh"
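To work with this output in a script, the `name = value` lines can be parsed into a dictionary. This is a minimal sketch that targets the format shown above (quoted strings, integers, parenthesized lists that may span several lines); dates and any other values are left as text. The function names are hypothetical.

```python
import re

# A few lines copied from the mdls output above.
MDLS_SAMPLE = '''\
com_microsoft_entourage_recordID = 1
kMDItemAuthors = ("The Microsoft Mac Team <WelcomeE@microsoft.com>")
kMDItemContentTypeTree = (
"com.microsoft.entourage.virtual.message",
"public.message",
"public.data",
"public.item"
)
kMDItemDisplayName = "Welcome to Microsoft Entourage 2004 for Macintosh"
kMDItemFSCreationDate = 2006-03-21 21:55:24 -0800
kMDItemFSSize = 6584
'''


def _scalar(text):
    """Convert one value: quoted string, integer, or raw text (e.g. dates)."""
    text = text.strip().rstrip(",").strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    if re.fullmatch(r"-?\d+", text):
        return int(text)
    return text


def _inline_list(inner):
    """Values of a one-line ( ... ) list; quoted strings may contain commas."""
    quoted = re.findall(r'"([^"]*)"', inner)
    return quoted if quoted else [_scalar(v) for v in inner.split(",") if v.strip()]


def parse_mdls(output):
    """Parse `name = value` lines from mdls output into a dict."""
    attrs, key, items = {}, None, None
    for raw in output.splitlines():
        line = raw.strip()
        if items is not None:              # inside a multi-line ( ... ) list
            if line == ")":
                attrs[key], items = items, None
            elif line:
                items.append(_scalar(line))
            continue
        if " = " not in line:
            continue                       # e.g. the path/separator header line
        key, value = line.split(" = ", 1)
        if value == "(":
            items = []                     # list continues on following lines
        elif value.startswith("(") and value.endswith(")"):
            attrs[key] = _inline_list(value[1:-1])
        else:
            attrs[key] = _scalar(value)
    return attrs


attrs = parse_mdls(MDLS_SAMPLE)
print(attrs["kMDItemContentTypeTree"][0])  # com.microsoft.entourage.virtual.message
print(attrs["kMDItemFSSize"] + 1)          # 6585
```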
Whenever possible, metadata provided by Entourage is designed so that attribute names and values match those used by an analogous Apple application (e.g. message title). We hope this will allow anyone who builds a solution on top of Spotlight to easily support Entourage alongside Apple's applications. The only deviation is that we added attributes: nearly all properties available in our AppleScript dictionary are available as metadata. The design is intended to let scripters use Spotlight as a quick way to query information and then round-trip interactions with the results through AppleScript. It also makes it possible to create useful queries such as "all unread messages today" (com_microsoft_entourage_unread == 1 && kMDItemContentCreationDate >= $time.today).
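Queries like this are easy to compose programmatically. The sketch below builds the "all unread messages today" query from the text; the extra kMDItemContentType clause, which limits results to Entourage messages using the content type reported by mdls above, is an optional addition of ours. The helper name is hypothetical, and `shlex.join` requires Python 3.8 or later.

```python
import shlex

# Content type reported by mdls for Entourage message cache files (see above).
ENTOURAGE_MESSAGE_UTI = "com.microsoft.entourage.virtual.message"


def all_of(*clauses):
    """Join Spotlight query clauses with &&, parenthesizing each one."""
    return " && ".join(f"({c})" for c in clauses)


unread_today = all_of(
    f'kMDItemContentType == "{ENTOURAGE_MESSAGE_UTI}"',   # optional extra filter
    "com_microsoft_entourage_unread == 1",
    "kMDItemContentCreationDate >= $time.today",
)
print(unread_today)
# Quoted for pasting into Terminal; $time.today must reach mdfind unexpanded.
print(shlex.join(["mdfind", unread_today]))
```

Single-quoting the whole query for the shell matters here: without it, the shell would try to expand `$time` before mdfind ever saw the query.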
Importing Metadata
Spotlight is designed so that it does not need to know the file format of each file in order to index the file's contents. Instead, a developer provides a plug-in for Spotlight that handles both reading a file and returning metadata to the Spotlight indexing engine. These plug-ins are known as metadata importers and can be found in /Library/Spotlight or, in some cases, within an application's bundle.
Figure 3: The Microsoft Entourage.mdimporter plug-in
The metadata importer plug-in that's used with Entourage is called Microsoft Entourage.mdimporter. When Spotlight comes across an Entourage-generated cache file, Spotlight passes the path of the file to the Entourage metadata importer, the importer reads the file, and then passes the metadata back to Spotlight. You can see the info for this plug-in in Figure 3. Notice that this plug-in is a universal binary and that it runs natively on an Intel-based Mac.
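Real importers are C plug-ins whose entry point (GetMetadataForFile in Apple's importer template) receives a file path and content type and fills in an attribute dictionary. The Python sketch below only models that contract. The cache file format is not described here, so the sketch assumes a HYPOTHETICAL layout of "Key: value" header lines, a blank line, and a plain-text body; the function name is likewise illustrative.

```python
import os
import tempfile


def get_metadata_for_file(path, content_type_uti):
    """Model of an importer's job: read one file, return Spotlight attributes.

    HYPOTHETICAL file format: "Key: value" header lines, a blank line,
    then a plain-text body. Returns None for types this importer ignores.
    """
    if content_type_uti != "com.microsoft.entourage.virtual.message":
        return None
    with open(path, encoding="utf-8") as f:
        header, _, body = f.read().partition("\n\n")
    fields = dict(line.split(": ", 1) for line in header.splitlines() if ": " in line)
    attributes = {"kMDItemTextContent": body.strip()}  # full text for searching
    if "Subject" in fields:
        attributes["kMDItemTitle"] = fields["Subject"]
        attributes["kMDItemDisplayName"] = fields["Subject"]
    if "From" in fields:
        attributes["kMDItemAuthors"] = [fields["From"]]
    if "RecordID" in fields:
        attributes["com_microsoft_entourage_recordID"] = int(fields["RecordID"])
    return attributes


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "1.vRgeMessage")
    with open(path, "w", encoding="utf-8") as f:
        f.write("Subject: Welcome\nFrom: The Microsoft Mac Team\nRecordID: 1\n\nHello!\n")
    print(get_metadata_for_file(path, "com.microsoft.entourage.virtual.message"))
```

The division of labor is the point: Spotlight decides when to call the importer and stores whatever attributes come back, while only the importer needs to understand the file.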
Multiple Identities?
There are a few things that system administrators should understand if they are going to use Spotlight search in multi-user scenarios. Because Entourage can work as a multi-identity application, you'll probably wonder how Spotlight knows which identity is currently active in Entourage. The answer is that it doesn't. While we only automatically enable Spotlight indexing for the first identity launched after the update is applied, a user can turn on indexing of additional identities by enabling the Spotlight preference in Entourage. If the user then double-clicks an item, that item opens only if its associated identity is currently active. Entourage actually uses the folder path to determine the identity of a result. If the identity is not currently active, the user will get the message shown in Figure 4.
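The open step can be modeled as a path check. This hypothetical sketch assumes the .../Entourage/2004/&lt;identity&gt;/... layout from the earlier mdfind result; the function names, the "Kids" identity, and the returned strings are illustrative and are not Entourage's actual messages.

```python
from pathlib import PurePosixPath


def identity_for_result(path):
    """Return the identity folder name from an Entourage cache file path.

    Assumes the layout .../Entourage/2004/<identity>/... shown earlier.
    """
    parts = PurePosixPath(path).parts
    for i in range(len(parts) - 2):
        if parts[i] == "Entourage" and parts[i + 1] == "2004":
            return parts[i + 2]
    raise ValueError(f"not an Entourage 2004 cache path: {path}")


def open_result(path, active_identity):
    """Hypothetical model: only items from the active identity can be opened."""
    identity = identity_for_result(path)
    if identity != active_identity:
        return f"Cannot open: item belongs to inactive identity {identity!r}"
    record_id = int(PurePosixPath(path).stem)  # file name is the record ID
    return f"Opening record {record_id} in {identity!r}"


root = "/Users/Brianjo/Library/Caches/Metadata/Microsoft/Entourage/2004"
print(open_result(f"{root}/Main Identity/Messages/0T/0B/0M/0K/1.vRgeMessage", "Main Identity"))
# Opening record 1 in 'Main Identity'
print(open_result(f"{root}/Kids/Messages/0T/0B/0M/0K/7.vRgeMessage", "Main Identity"))
# Cannot open: item belongs to inactive identity 'Kids'
```

Note that nothing in this check stops the result from being listed; it only governs opening it, which is why results from every indexed identity still appear side by side in a search.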
Figure 4: Trying to open an item associated with an inactive identity
As you can imagine, in scenarios where many people use the same account on a Mac and are distinguished only by their Entourage identities, Spotlight could become nearly useless for finding specific e-mail items, because results from multiple identities are intermixed. For that reason, Microsoft recommends that where multiple users want to use Spotlight search with Entourage, each user should have their own user account on the Mac.
Removing Spotlight from Entourage
There are a number of reasons a system administrator might want to completely disable Spotlight searching in Entourage. First, multiple user accounts on a machine are not always practical. In some schoolrooms, for example, a single account is used per classroom and kids check their e-mail by simply switching identities in Entourage. On a machine with many dozens of identities, using Spotlight to find anything could be quite difficult. Second, the cache used for Entourage content takes disk space. Where a user has a large Entourage database, or there are multiple accounts on the machine with large databases, disk space can become an issue. Finally, there are privacy considerations around using Spotlight searches on Entourage content, especially if multiple identities are used on the same user account. Even if the searcher can't open the e-mail that's returned, they might learn more than the owner wants them to know from the items returned in a search.
To completely disable Spotlight in Entourage, remove the Microsoft Entourage.mdimporter plug-in from the /Library/Spotlight folder and restart Entourage. When you open the Entourage preferences, you'll see that the Spotlight preference is no longer available, as shown in Figure 5.
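On many machines this step is worth scripting. The sketch below is one possible approach, not a Microsoft-provided tool: it either deletes the importer bundle or, as a design choice of ours, moves it to a backup folder so it can be restored later. The function name and backup location are hypothetical; pick a backup folder that Spotlight does not load plug-ins from (i.e., not /Library/Spotlight or ~/Library/Spotlight). Writing to /Library/Spotlight normally requires administrator rights.

```python
import os
import shutil
import tempfile

IMPORTER_NAME = "Microsoft Entourage.mdimporter"


def remove_entourage_importer(spotlight_dir="/Library/Spotlight", backup_dir=None):
    """Take the Entourage importer out of the Spotlight plug-ins folder.

    With backup_dir, the bundle is moved there; otherwise it is deleted.
    Returns the new location, or None if deleted or not installed.
    Restart Entourage afterwards.
    """
    bundle = os.path.join(spotlight_dir, IMPORTER_NAME)
    if not os.path.exists(bundle):
        return None                              # nothing to remove
    if backup_dir:
        os.makedirs(backup_dir, exist_ok=True)
        return shutil.move(bundle, os.path.join(backup_dir, IMPORTER_NAME))
    shutil.rmtree(bundle)                        # an .mdimporter is a bundle (a folder)
    return None


# Dry run against a scratch folder instead of the real /Library/Spotlight.
with tempfile.TemporaryDirectory() as tmp:
    spotlight = os.path.join(tmp, "Spotlight")
    os.makedirs(os.path.join(spotlight, IMPORTER_NAME, "Contents"))
    remove_entourage_importer(spotlight, os.path.join(tmp, "Disabled"))
    print(os.path.exists(os.path.join(spotlight, IMPORTER_NAME)))  # False
```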
Figure 5: Spotlight removed from Entourage preferences
Finally, if you completely disable Spotlight in Entourage, remember to go to the /Users/<username>/Library/Caches/Metadata/Microsoft/Entourage/2004/ folder and delete any folders there that you no longer want to be available in Spotlight searches.
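The cleanup can also be scripted per user account. In this sketch, `cache_root` is the per-user .../Library/Caches/Metadata/Microsoft/Entourage/2004 folder from the text; the function name and the "Kids" identity in the example are hypothetical. Passing no identities removes every identity folder; passing a set removes only those.

```python
import os
import shutil
import tempfile


def remove_identity_caches(cache_root, identities=None):
    """Delete Entourage Spotlight cache folders under cache_root.

    If identities is None, every folder is removed; otherwise only the
    named identity folders. Returns the names removed, in sorted order.
    """
    removed = []
    if not os.path.isdir(cache_root):
        return removed
    for name in sorted(os.listdir(cache_root)):
        path = os.path.join(cache_root, name)
        if os.path.isdir(path) and (identities is None or name in identities):
            shutil.rmtree(path)
            removed.append(name)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    for identity in ("Main Identity", "Kids"):
        os.makedirs(os.path.join(tmp, identity, "Messages"))
    print(remove_identity_caches(tmp, identities={"Kids"}))  # ['Kids']
    print(os.listdir(tmp))                                   # ['Main Identity']
```

On a machine with several accounts, run it once per /Users/&lt;username&gt; folder.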
Conclusion
In this article, we've described how Spotlight works with Entourage 2004. The Spotlight search functionality added in update 11.2.3 fundamentally changes the way you can work with e-mail, contacts, calendar items, and notes in Entourage, allowing you to instantly find the data you need using the tools built right into Tiger. This update makes Entourage a true first-class citizen in the OS and makes working with Entourage data on the Mac easier than ever.
Bibliography and References
Apple Computer. "Working with Spotlight". http://developer.apple.com/macosx/spotlight.html.
M. Amir Haque. "Spotlight Support in Entourage 2004". The Entourage Blog, http://blogs.msdn.com/entourage/archive/2006/03/17/553801.aspx
Brian Johnson is the Microsoft Entourage Product Manager. You can contact Brian by e-mail at brianjo@microsoft.com, or you can read his blog at http://bufferoverrun.net.
Andy Ruff is an Entourage Program Manager, and is the author of The Entourage Blog (http://blogs.msdn.com/Entourage).