TweetFollow Us on Twitter

Sep 00 Online

Volume Number: 16 (2000)
Issue Number: 9
Column Tag: MacTech Online

PDF and XML

by Jeff Clites <online@mactech.com>

Last month we covered Adobe's Portable Document Format (PDF), focusing on how it relates to Quartz, Apple's new imaging model. In brief, PDF originated as a simplification of PostScript, retaining PostScript's primitive graphics operators while discarding its programming-language constructs and adding file and document structure specifications. Quartz (specifically, the Core Graphics Rendering API) is again based on this same set of operators, making it natural to "record" graphics operations into a PDF file, and just as natural to "play back" a PDF into a series of native drawing instructions. At its simplest, PDF is the new PICT; more interestingly, the Quartz imaging model is at the center of all 2-D graphics on Mac OS X, providing a centralized facility for rending drawing commands from different APIs (such as QuickDraw) into different output formats, be they destined for the screen, a printer, or a file.

Of course, part of the beauty of Quartz is that it frees the programmer from having to worry about the details of this process. At the same time, Quartz is certain to increase the popularity of PDF, and in particular expand its use beyond just a format for traditional documents. Accordingly, it will be to a programmer's advantage to know as much as possible about PDF, and to be aware of its strengths and its weaknesses.

As touched on above, PDF defines a file format in addition to a graphics model. In the abstract, a PDF file describes a tree of objects, with a significant separation between document content and document layout. This should send off bells in a developer's head, because it sounds similar to XML, and it's natural to wonder how deep this connection is-to ask questions like, "can a PDF document be represented in XML." The short answer is "probably not", but it's interesting to investigate the parallels between the two formats.

Intersections with XML

PDF and XML are similar in that they define a file structure which is designed to encapsulate a wide range of data in a fairly generalized, hierarchical fashion. Although PDF is designed to be extensible, it does define an interpretation for the information it contains, and it's not clear how well current PDF-rendering applications would handle PDF documents with content which they don't recognize. XML is at the other extreme. At its core, it says nothing about the semantics of the data which it can contain, and it's often used as a format for information which isn't naturally thought of as a "document." But given its generality, it would certainly be possible to devise an XML-based format to encapsulate page-descriptions in a manner similar to PDF. On the other hand, there are several facilities of PDF which are not easily mimicked using XML-features dealing more with practical performance issues than with conceptual structure.

PDF was designed to be a final format, so that PDFs represent finished documents, rather than in-progress works (such as word-processing documents) which will be extensively changed. Still, it is possible to make limited modifications to PDFs, and interestingly this can be done by appending the "change" information to the end of a PDF, without requiring the entire document to be rewritten. This makes it convenient to prepare an initial document and at a later stage add annotations or hyperlinks. This approach also provides a measure of safety, as previous versions of a document can be recovered simply by truncating the changes off the end, and modifications cannot cause complete corruption of the base document. This also means that it is possible to modify large documents without large resource requirements.

Despite XML's flexibility, it isn't possible to create a well-formed XML document by appending information directly to another document, because of the requirement that there be a single root element. (It is possible to work around this limitation, but only by splitting the document into multiple files.) Additionally, PDF documents frequently encapsulate binary data (such as images or compressed text), and it is not convenient to embed such data into XML documents directly-XML is a text-based format, and binary data could be interpreted as markup, or mangled if the document is converted to a different character encoding. XML-based formats traditionally handle this by storing the data in a separate file which is then referenced from the base document, just as images are included in HTML files. This is less convenient than PDF's single-file approach. (It would be possible to include binary data in XML documents by converting it into a text-based representation, such as Base-64 encoding, but this tends to offset the benefits of compression.) Finally, PDF has a higher structural flexibility, in that logical containment is not always represented by physical containment. In other words, structures which logically contain other objects may do so by referencing the objects by name, whereas in XML such containment is almost always represented by physically nesting elements. This flexibility allows the same PDF to be represented in different ways, so that for example a PDF file may be optimized for page-at-a-time delivery over the internet, or alternatively it could be created in a single-pass by a printer driver.

FOP

So despite the current popularity of XML, it isn't likely that PDF is going to be superceded any time soon. So where do PDF and XML intersect? Well, as we observed before, it's natural to think of XML as unformatted data, and to think of PDF as an output format. The preferred way to get from XML to something with formatting is by way of XSL Transformations (XSLT). In the case of XML-to-PDF transformation, there's a tool to help with the process, FOP. (It's part of the Apache XML project.) To use FOP, you first use an XSLT processor to convert your XML document into a tree of formatting objects, which may itself be represented as an XML document. This is where you determine the form of your final document. Since, as mentioned above, XML documents are traditionally devoid of formatting information and are often viewed as pure data, any decisions about how this information will be presented must be encapsulated in the style sheet. Once this is done, and you have your tree of formatting objects, you feed this into FOP, which produces your final PDF. FOP is very much a work in progress, and does not yet support all of the formatting objects defined in the XSL specification, but even as-is it appears quite useful. IBM has an informative tutorial on transforming XML documents. (A free registration is required to access the tutorial.) It discusses using FOP to create PDF documents, and in addition shows you how to generate SVG (Scalable Vector Graphics), which is useful for creating things like charts and graphs from XML-encapsulated data.

OmniPDF

Finally, while you're playing with PDF, be sure to check out OmniPDF if you are running Mac OS X. It's a very cool PDF viewer. It's still under development, but it's Cocoa-native (and hence Mac-OS-X-native), and it really shows off the power of Quartz, as it uses Core Graphics Rendering to do its magic. (OmniPDF is from the Omni Group, who also created OmniWeb, which is currently the only Cocoa-native web browser available. You should check it out also-it's a refreshing alternative, and it has many fun features which set it apart from your usual browser choices.)

 

Community Search:
MacTech Search:

Software Updates via MacUpdate

Latest Forum Discussions

See All

Top Mobile Game Discounts
Every day, we pick out a curated list of the best mobile discounts on the App Store and post them here. This list won't be comprehensive, but it every game on it is recommended. Feel free to check out the coverage we did on them in the links... | Read more »
Price of Glory unleashes its 1.4 Alpha u...
As much as we all probably dislike Maths as a subject, we do have to hand it to geometry for giving us the good old Hexgrid, home of some of the best strategy games. One such example, Price of Glory, has dropped its 1.4 Alpha update, stocked full... | Read more »
The SLC 2025 kicks off this month to cro...
Ever since the Solo Leveling: Arise Championship 2025 was announced, I have been looking forward to it. The promotional clip they released a month or two back showed crowds going absolutely nuts for the previous competitions, so imagine the... | Read more »
Dive into some early Magicpunk fun as Cr...
Excellent news for fans of steampunk and magic; the Precursor Test for Magicpunk MMORPG Crystal of Atlan opens today. This rather fancy way of saying beta test will remain open until March 5th and is available for PC - boo - and Android devices -... | Read more »
Prepare to get your mind melted as Evang...
If you are a fan of sci-fi shooters and incredibly weird, mind-bending anime series, then you are in for a treat, as Goddess of Victory: Nikke is gearing up for its second collaboration with Evangelion. We were also treated to an upcoming... | Read more »
Square Enix gives with one hand and slap...
We have something of a mixed bag coming over from Square Enix HQ today. Two of their mobile games are revelling in life with new events keeping them alive, whilst another has been thrown onto the ever-growing discard pile Square is building. I... | Read more »
Let the world burn as you have some fest...
It is time to leave the world burning once again as you take a much-needed break from that whole “hero” lark and enjoy some celebrations in Genshin Impact. Version 5.4, Moonlight Amidst Dreams, will see you in Inazuma to attend the Mikawa Flower... | Read more »
Full Moon Over the Abyssal Sea lands on...
Aether Gazer has announced its latest major update, and it is one of the loveliest event names I have ever heard. Full Moon Over the Abyssal Sea is an amazing name, and it comes loaded with two side stories, a new S-grade Modifier, and some fancy... | Read more »
Open your own eatery for all the forest...
Very important question; when you read the title Zoo Restaurant, do you also immediately think of running a restaurant in which you cook Zoo animals as the course? I will just assume yes. Anyway, come June 23rd we will all be able to start up our... | Read more »
Crystal of Atlan opens registration for...
Nuverse was prominently featured in the last month for all the wrong reasons with the USA TikTok debacle, but now it is putting all that behind it and preparing for the Crystal of Atlan beta test. Taking place between February 18th and March 5th,... | Read more »

Price Scanner via MacPrices.net

AT&T is offering a 65% discount on the ne...
AT&T is offering the new iPhone 16e for up to 65% off their monthly finance fee with 36-months of service. No trade-in is required. Discount is applied via monthly bill credits over the 36 month... Read more
Use this code to get a free iPhone 13 at Visi...
For a limited time, use code SWEETDEAL to get a free 128GB iPhone 13 Visible, Verizon’s low-cost wireless cell service, Visible. Deal is valid when you purchase the Visible+ annual plan. Free... Read more
M4 Mac minis on sale for $50-$80 off MSRP at...
B&H Photo has M4 Mac minis in stock and on sale right now for $50 to $80 off Apple’s MSRP, each including free 1-2 day shipping to most US addresses: – M4 Mac mini (16GB/256GB): $549, $50 off... Read more
Buy an iPhone 16 at Boost Mobile and get one...
Boost Mobile, an MVNO using AT&T and T-Mobile’s networks, is offering one year of free Unlimited service with the purchase of any iPhone 16. Purchase the iPhone at standard MSRP, and then choose... Read more
Get an iPhone 15 for only $299 at Boost Mobil...
Boost Mobile, an MVNO using AT&T and T-Mobile’s networks, is offering the 128GB iPhone 15 for $299.99 including service with their Unlimited Premium plan (50GB of premium data, $60/month), or $20... Read more
Unreal Mobile is offering $100 off any new iP...
Unreal Mobile, an MVNO using AT&T and T-Mobile’s networks, is offering a $100 discount on any new iPhone with service. This includes new iPhone 16 models as well as iPhone 15, 14, 13, and SE... Read more
Apple drops prices on clearance iPhone 14 mod...
With today’s introduction of the new iPhone 16e, Apple has discontinued the iPhone 14, 14 Pro, and SE. In response, Apple has dropped prices on unlocked, Certified Refurbished, iPhone 14 models to a... Read more
B&H has 16-inch M4 Max MacBook Pros on sa...
B&H Photo is offering a $360-$410 discount on new 16-inch MacBook Pros with M4 Max CPUs right now. B&H offers free 1-2 day shipping to most US addresses: – 16″ M4 Max MacBook Pro (36GB/1TB/... Read more
Amazon is offering a $100 discount on the M4...
Amazon has the M4 Pro Mac mini discounted $100 off MSRP right now. Shipping is free. Their price is the lowest currently available for this popular mini: – Mac mini M4 Pro (24GB/512GB): $1299, $100... Read more
B&H continues to offer $150-$220 discount...
B&H Photo has 14-inch M4 MacBook Pros on sale for $150-$220 off MSRP. B&H offers free 1-2 day shipping to most US addresses: – 14″ M4 MacBook Pro (16GB/512GB): $1449, $150 off MSRP – 14″ M4... Read more

Jobs Board

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.