Jul 96 Factory Floor
Volume Number: 12
Issue Number: 7
Column Tag: From The Factory Floor
A Little CodeWarrior History
By Dave Mark
This month, we're going to talk with John McEnerney, one of the compiler writers
at Metrowerks.
Dave: How did you first hook up with Metrowerks?
John: I first met Greg Galanos when I was the development manager at Symantec's Language Products Group. Greg was trying to get me interested in doing some sort of deal with the fledgling Metrowerks, and I mostly ignored him because they were trying to compete aggressively with my first product, THINK Pascal. I would never have guessed that a few years later he would offer me the best opportunity of my career.
Dave: When did you leave Symantec?
John: I left Symantec in October '92, taking about 6 months off to figure out what I wanted to do next. I didn't have any real plans, but I figured I'd find some way to do PowerPC work. I didn't relish the thought of trying to write an entire C++ compiler, so I considered doing a Pascal product on my own.
Around this time, Greg had heard from Rich Siegel (of BBEdit fame) that I was no longer at Symantec, and he called me right away. The first thing he said to me was, "describe your dream job," and I told him I wanted to write a PowerPC code generator for the upcoming Power Macintoshes. I flew to Montreal to meet him and his partner, Jean Belanger. We had some Italian food, drank some wine, and they told me a little about their Pascal and Modula products. I was really hot to write a PowerPC back-end, but I was not that impressed with their technology. We talked about various contracts, but I didn't have a really solid feeling yet.
Dave: What finally convinced you to go with Metrowerks?
John: In February '93, Greg asked me to meet with him in Palo Alto to get a look at a C compiler that they had just acquired; a guy named Andreas Hommel in Hamburg had been writing it as a hobby. It ran on the Macintosh, had a simple but nice IDE reminiscent of early versions of THINK C, and it was fast. I spent about an hour looking through the source code: it was well organized, the compiler front-end and back-end were cleanly separated, the code was easy to follow, and in addition to being a full ANSI C compiler, it had a lot of the C++ language implemented already.
It was clear that Greg had found a diamond in the rough, the perfect platform for a native Power Macintosh product. A few hours later we had a contract - I had about 6-8 months to write a PowerPC back-end and linker. Andreas would finish the C++ language implementation, and a few guys in Montreal (Berardino Baratta, Marcel Achim) would work on the IDE and a new Pascal front-end. We immediately hired Greg Dow, who had written the THINK Class Library for Symantec when I was there, to write a new application framework: PowerPlant.
We must have hooked up with Jordan Mattson from Apple around this time, because a week or so later he sent me one of their RS/6000s to help me get started. Between him and Alan Lillich, who I had met at all the early PowerPC meetings that Apple had been holding for their key developers, I got pretty much everything I needed from Apple.
So, I now had a contract to do the most interesting work I could imagine; all I had to do was figure out where to start.
Dave: What was it like working with Andreas' compiler?
John: Andreas' compiler was pretty traditional in its organization. The front-end made a single pass over the source code, performing lexical analysis as it went, and generated an intermediate representation (IR) that consisted of expression trees, labels, and branches. It took about a week to totally remove the 68K code generator from the rest of the compiler, and put in stub routines where the front-end and the back-end connected so that everything would still link. If I could fill in all the stub routines in exactly the right way, we'd have a PowerPC compiler.
The first thing I did was write a routine that dumped the IR in human-readable form - I don't know how Andreas got his 68K code generator to work without that; I guess he can keep more in his head than I can. Looking at the expression trees on the screen allowed me to visualize how the code generator would proceed.
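John doesn't show his dump routine, but the idea is easy to sketch. Here is a toy Python illustration, assuming tuple-encoded expression trees with invented node names - none of this is the actual Metrowerks IR:

```python
def dump(tree, depth=0):
    """Render a tuple-encoded expression tree, one node per line, indented."""
    if not isinstance(tree, tuple):
        return "  " * depth + str(tree)        # leaf: variable name or constant
    lines = ["  " * depth + tree[0]]           # interior node: operator name
    lines += [dump(kid, depth + 1) for kid in tree[1:]]
    return "\n".join(lines)

# (a + b) * 4 rendered as an indented tree
print(dump(("MUL", ("ADD", "a", "b"), 4)))
```

Even a dumper this crude makes it possible to eyeball whether the front-end built the tree you expected before any code generation happens.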
Most CISC compilers spend a lot of time working on the IR trees themselves. Traditional global optimizations like loop-invariant code motion or common subexpression elimination are performed by rewriting the IR trees into optimized IR trees. The code generator gathers information about the shape of the trees, deciding how many registers will be needed, which addressing modes will be used, etc. After instructions are generated they are largely ignored except for small peephole optimizations. (A notable exception to this is the gcc compiler, which transforms the expressions into a simple algebraic representation called RTL and uses repeated peephole optimizations derived from a machine description to coalesce these RTL expressions into complex instructions and addressing modes.)
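As a rough illustration of one of those tree-rewriting optimizations, here is a toy common-subexpression-elimination pass over tuple-encoded trees, done by hash-consing so that identical subtrees become one shared node. The representation is invented for the sketch, not drawn from any real compiler:

```python
def cse(tree, pool=None):
    """Turn an expression tree into a DAG: identical subtrees are merged
    into a single shared object via a hash table of seen nodes."""
    if pool is None:
        pool = {}
    if not isinstance(tree, tuple):            # leaf: nothing to share
        return tree
    node = (tree[0],) + tuple(cse(kid, pool) for kid in tree[1:])
    return pool.setdefault(node, node)         # reuse an identical earlier node

# (a + b) * (a + b): both ADD subtrees collapse into one shared node
dag = cse(("MUL", ("ADD", "a", "b"), ("ADD", "a", "b")))
assert dag[1] is dag[2]                        # the common subexpression is shared
```

Once the duplicate is a single shared node, a code generator can compute it once and reuse the result, which is the point of the optimization.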
Most of the RISC compilers that Id read about in the compiler literature used a different approach: immediately transform the IR trees into a low-level representation that was similar or identical to the actual RISC instructions of the target machine, and perform all optimizations at the machine instruction level. I decided to use this technique in my PowerPC code generator.
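A hedged sketch of that lowering step: walk the tree and immediately emit simple, PowerPC-flavored three-address instructions against an unbounded pool of virtual registers. The mnemonics lwz, li, add, and mullw are real PowerPC mnemonics, but everything else here is invented for illustration:

```python
from itertools import count

def lower(tree, code, regs=None):
    """Emit low-level three-address instructions for `tree` into `code`;
    return the virtual register holding the result."""
    if regs is None:
        regs = count(1)                        # unbounded virtual register supply
    fresh = lambda: f"v{next(regs)}"
    op = tree[0]
    if op == "VAR":                            # load a local from memory
        r = fresh(); code.append(("lwz", r, tree[1])); return r
    if op == "CONST":                          # load-immediate
        r = fresh(); code.append(("li", r, tree[1])); return r
    binop = {"ADD": "add", "MUL": "mullw"}[op]
    a = lower(tree[1], code, regs)
    b = lower(tree[2], code, regs)
    r = fresh(); code.append((binop, r, a, b)); return r

code = []
lower(("ADD", ("VAR", "x"), ("CONST", 4)), code)
# code: [("lwz", "v1", "x"), ("li", "v2", 4), ("add", "v3", "v1", "v2")]
```

With the program already in (nearly) machine instructions, every later phase - scheduling, register allocation, peephole work - can operate on one uniform representation.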
Dave: What was your basic approach to code generation?
John: Strange as it may seem, the first part of the code generator I actually wrote was the instruction scheduler - the phase that reorders instructions to minimize latencies caused by load delays, and to permit floating-point and integer instructions to execute in parallel. I needed to know if my low-level representation - I called it a pcode (no relation to the UCSD Pascal pcode) - had enough information for all the phases I would eventually write, and since the scheduler needed a lot of information, it would serve to prove the design of the pcode. Of course, I had to rewrite the scheduler twice more: the first time was to fix the original one, which had some design flaws, and the second time was to make it more general to support 601, 603 and 604 CPUs.
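List scheduling is the standard technique for this phase. A drastically simplified stand-in - a single pass with only a partial dependence check, nothing like a real 601/603/604 pipeline model - might look like this, over the same (opcode, dest, sources...) tuples as above:

```python
def schedule(code):
    """If an instruction stalls on the result of the load just before it,
    try to hoist a later, independent instruction into the delay slot."""
    code = list(code)
    i = 0
    while i + 1 < len(code):
        if code[i][0] == "lwz" and code[i][1] in code[i + 1][2:]:
            # code[i+1] waits on the load; look for a filler instruction
            for j in range(i + 2, len(code)):
                independent = (code[i][1] not in code[j][2:]       # doesn't use the loaded value
                               and code[j][1] not in code[i + 1][2:]  # doesn't clobber a stalled input
                               and code[i + 1][1] not in code[j][2:]) # doesn't read the stalled result
                if independent:
                    code.insert(i + 1, code.pop(j))
                    break
        i += 1
    return code

before = [("lwz", "v1", "x"),
          ("add", "v2", "v1", "v1"),   # stalls waiting for v1
          ("li",  "v3", 7)]            # independent: can fill the slot
after = schedule(before)
assert after[1] == ("li", "v3", 7)
```

A production scheduler also tracks per-instruction latencies, multiple functional units, and full def/use dependences; this sketch only shows the reordering idea.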
Once I finished the scheduler, I had my data structures organized and all the support routines in place, so I started writing the instruction selection phase - the guts of the code generator. This phase visits the IR tree and generates pcode. It does try to recognize certain tree patterns, like opportunities for FMADD and FMSUB instructions, but since there are no complex addressing modes and very few complex instructions, it is mostly a straightforward translation to PowerPC instructions.
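The FMADD case is easy to picture: the instruction computes a*b + c in one step, so the selector looks for an ADD whose operand is a MUL and fuses the pair. A toy sketch on tuple-encoded trees (the shapes are invented for illustration, not Metrowerks' pattern matcher):

```python
def select(tree):
    """Fuse the tree pattern (a * b) + c into a single FMADD operation;
    leave anything else untouched."""
    if tree[0] == "ADD" and isinstance(tree[1], tuple) and tree[1][0] == "MUL":
        a, b = tree[1][1], tree[1][2]
        return ("FMADD", a, b, tree[2])   # one multiply-add instead of two ops
    return tree

assert select(("ADD", ("MUL", "f1", "f2"), "f3")) == ("FMADD", "f1", "f2", "f3")
```

A real selector would also handle the commuted form c + a*b and the FMSUB variants; the mechanism is the same pattern match.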
To get short-term results, I wrote a quick-and-dirty register allocator, and some code to display the generated pcode instructions, and was able to get most of the code generation debugged this way. I decided to use a proprietary object code format, derived from the one we were already using in our 68K compiler and linker, since I could get this working faster than trying to write an XCOFF linker. I spent a few weeks getting the linker working, finished the part of the code generator that wrote the object file, and I could actually compile and link small programs.
Dave: How did the debugger fit into all this?
John: Around August '93 the project was falling into place, but we still didn't have a source-level debugger. In a most serendipitous event, Dan Podwall, a friend of mine from Symantec, called and asked whether there were any opportunities at Metrowerks. Greg Galanos called him right away, hired him on the phone, and 4 weeks later he had written a debugger - in PowerPlant, no less - that could single-step and set breakpoints. This would be the first commercial PowerPlant program - in fact, the first PowerPlant program of any kind aside from Greg Dow's demos.
Dave: How did you build the compiler?
John: By September '93 we had some prototype PowerPC hardware, and I had a working code generator and linker which ran on the 68K Macintosh and generated PEF executables that ran on the prototypes. Since this compiler was already built using our own 68K compiler, it was pretty easy to rehost it on the Power Macintosh: we made the changes for the Universal Headers and routine descriptors and such, then compiled it with itself on the 68K machine, which gave us (after some debugging!) a working PowerPC-hosted PowerPC compiler. With a little bit of trickery, mandated by differences between PowerPC floating-point hardware and the 68K SANE software floating-point architecture, we were able to rehost the 68K compiler on the Power Macintosh as well. We now had the fastest compilers on the Macintosh.
Dave: What next?
John: I still had a lot of work to do on the PowerPC code generator. The biggest task was to replace the quick-and-dirty register allocator with a graph-coloring allocator. This is one of the great algorithms in the history of compilers. For years people had been trying to come up with an accurate way to represent the lifetimes of variables, so that variables or temporaries that did not overlap could share a register. A lot of ad hoc techniques were developed, but this guy from IBM Watson Research Center named Greg Chaitin discovered a formal approach that solved the problem better than anything that had been previously attempted: build an interference graph which has an edge between any two variables whose values may be live at the same time, and then try to color this graph with N colors, where N is the number of available registers.
So my code generator assumes it has an infinite number of virtual registers, and generates the most efficient code it can under that assumption; for example, it assumes that all local variables, arguments, and TOC pointers can be assigned to a register. After the code is all generated, the register allocator tries to rewrite the virtual registers using real PowerPC registers, and generates extra code to spill values that couldn't get a real register. In most cases, everything gets a register since there are so many on the PowerPC. The smarter register allocator probably makes the overall largest contribution to code quality.
The algorithm has one drawback: it has O(N^2) complexity. There are actually programs which have so many intermediate expressions that the interference graph gets too large and it takes several minutes to color it. So I had to keep around the quick-and-dirty allocator as well, which is why you'll sometimes get an annoying message that says the code generator ran out of registers if you're compiling without global optimizations.
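Chaitin's simplify-and-color loop can be sketched compactly. This toy version, keyed by the virtual-register names used in the earlier sketches, handles the coloring but simply gives up at the point where a real allocator would insert spill code (the structure and names are illustrative only):

```python
def color(interference, n_regs):
    """Chaitin-style coloring: repeatedly remove a node with degree < N and
    push it on a stack; then pop each node and give it the lowest free color."""
    graph = {v: set(ns) for v, ns in interference.items()}
    stack = []
    while graph:
        # any node with fewer than n_regs neighbors is guaranteed colorable
        v = next((v for v in graph if len(graph[v]) < n_regs), None)
        if v is None:
            raise RuntimeError("would spill here")  # real allocators spill and retry
        stack.append((v, graph.pop(v)))
        for neighbors in graph.values():
            neighbors.discard(v)

    assignment = {}
    for v, neighbors in reversed(stack):            # color in reverse removal order
        taken = {assignment[w] for w in neighbors}
        assignment[v] = min(c for c in range(n_regs) if c not in taken)
    return assignment

# v1-v2 and v2-v3 are live at the same time, but v1 and v3 are not:
regs = color({"v1": {"v2"}, "v2": {"v1", "v3"}, "v3": {"v2"}}, n_regs=2)
assert regs["v1"] == regs["v3"]   # non-overlapping values share a register
```

Building the interference graph is the O(N^2) part John mentions: every pair of simultaneously-live values potentially adds an edge, which is why huge functions fall back to the quick-and-dirty allocator.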
Dave: And so, CodeWarrior was born!
John: By December I had pretty much everything working. After a last-minute dash to get C++ language support working on the PowerPC, we were able to burn our first public release, DR/1, starting a long Metrowerks tradition of getting things in under the wire and never missing a ship date. We introduced the product at the San Francisco Macworld Expo with our huge 8-page MacWeek advertisement, and CodeWarrior was born.
There were plenty of things to be cleaned up between DR/1 and DR/3, which was our real 1.0 release. But by shipping DR/1 and DR/2 when we did, and by working closely with a lot of the major Macintosh software vendors, we were able to help a lot of companies get their software ported to the PowerMac that otherwise might not have.
For me, I had accomplished what I had wanted to back when I was at Symantec: building the PowerPC compiler that most users would use to port their code to the new Power Macintoshes. And Greg had kept his promise and given me my dream job.