Oct 99 Challenge
Volume Number: 15 (1999)
Issue Number: 10
Column Tag: Programmer's Challenge
Programmer's Challenge
by Bob Boonstra, Westford, MA
SuperDuperGhost
This month we're going to play a word game, one derived from a game called GHOST. The basic concept is simple: players spell a word, taking turns adding letters to the growing word, trying to avoid being the player who says the last letter of the word.
To start, one player says a letter. The next player thinks of (but does not reveal) a word that begins with the given letter, and then announces the first two letters of that word. The next player thinks of a (possibly different) word that starts with the first two letters, and announces the first three letters of that word. Play continues until a player spells an entire word that is more than three letters long. The player who completes the word loses the round. If a player spells a string that is not part of a real word, that player loses the round.
A game with three players might go like this:
Adam: [thinking of "toast"]: T
Betsy: [thinking of "tadpole"]: TA
Cynthia: [thinking of "tatting"]: TAT [which with three letters does not count as a word]
Adam: [now thinking of "tattoo," since "tadpole" no longer fits the current string of letters]: TATT
Betsy: [also thinking of "tattoo"]: TATTO
Cynthia: [has two options: finish the word "tattoo" and lose this round, or spell a nonexistent word, and also lose. She resigns]
In our game, there are a few variations. Instead of always adding a letter to the end of the string, we'll also play games where you can only add to the beginning, games where you can add to the beginning or to the end, and games where you can add a letter anywhere. Our games will also be restricted to two players, with each Challenge contestant competing against each other contestant.
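To make the four variations concrete, here is a minimal sketch (in C++, and not part of the Challenge interface) that enumerates the strings a player could announce from a given position. The function name CandidateMoves is hypothetical, the enum simply mirrors the GameType values in the prototype, and the sketch makes no attempt to check the candidates against a dictionary.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mirrors the GameType enum from the Challenge prototype.
enum GameType {
    addToEndOnly = 0,
    addToBeginningOnly,
    addToBeginningOrEnd,
    addAnywhere
};

// Hypothetical helper: returns every string obtainable by inserting one
// uppercase letter into `s` at a position the game type allows.
// Duplicates are possible (e.g. inserting 'T' before or after a 'T').
std::vector<std::string> CandidateMoves(const std::string& s,
                                        GameType type) {
    assert(s.size() < 255);  // result must fit newGhostString[256]
    std::vector<std::size_t> positions;  // legal insertion offsets
    switch (type) {
        case addToEndOnly:
            positions.push_back(s.size());
            break;
        case addToBeginningOnly:
            positions.push_back(0);
            break;
        case addToBeginningOrEnd:
            positions.push_back(0);
            if (!s.empty()) positions.push_back(s.size());
            break;
        case addAnywhere:
            for (std::size_t p = 0; p <= s.size(); ++p)
                positions.push_back(p);
            break;
    }
    std::vector<std::string> moves;
    for (std::size_t p : positions) {
        for (char c = 'A'; c <= 'Z'; ++c) {
            std::string t = s;
            t.insert(p, 1, c);  // insert one copy of c at offset p
            moves.push_back(t);
        }
    }
    return moves;
}
```

For the string TA this yields 26 candidates in an addToEndOnly game, 52 for addToBeginningOrEnd, and 78 for addAnywhere. A real player would keep only the candidates that still fit some dictionary word.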
The prototype for the code you should write is:
#if defined(__cplusplus)
extern "C" {
#endif
typedef enum {
addToEndOnly = 0,
addToBeginningOnly,
addToBeginningOrEnd,
addAnywhere
} GameType;
void InitSuperDuperGhost(
const char *dictWords[],/* alphabetically sorted uppercase dictionary words */
long numDictionaryWords /* number of null-terminated words in dictionary */
);
void NewGhostGame(
GameType theGameType
);
void PlayGhost(
const char *ghostString, /* the string so far, null-terminated */
char newGhostString[256], /* new ghostString, one letter added to ghostString */
int *wordInMindIndex,
/* your string will match dictWords[wordInMindIndex] */
int charPositions[256]
/* index into dictWords[wordInMindIndex] for each char in newGhostString */
);
void TermSuperDuperGhost(void);
#if defined(__cplusplus)
}
#endif
The vocabulary for our game consists of numDictionaryWords alphabetically sorted uppercase words provided to your InitSuperDuperGhost routine in the dictWords parameter. The dictionary will typically consist of 40000-50000 words but will never exceed 100000 words. All dictWords will be greater than 3 letters in length. InitSuperDuperGhost should analyze the dictionary and create intermediate tables if appropriate. At the end of the contest, TermSuperDuperGhost will be called, where you should deallocate any dynamic memory allocated by InitSuperDuperGhost.
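Because the dictionary arrives sorted, the simplest game type (addToEndOnly) can be served by a binary search for the current string as a prefix. The sketch below is one possible approach, not a recommended solution: FindWordWithPrefix is a hypothetical name, and it assumes that "alphabetically sorted" means strcmp order, which holds if the words contain only the letters A-Z.

```cpp
#include <algorithm>
#include <cassert>
#include <cstring>

// Hypothetical helper: returns the index of the first dictionary word
// that starts with `prefix`, or -1 if no word does.
// Assumes dictWords is sorted in strcmp order.
long FindWordWithPrefix(const char* const dictWords[], long numWords,
                        const char* prefix) {
    assert(dictWords != nullptr && prefix != nullptr && numWords >= 0);
    const char* const* end = dictWords + numWords;
    // First word that does not compare less than the prefix.
    const char* const* it = std::lower_bound(
        dictWords, end, prefix,
        [](const char* word, const char* key) {
            return std::strcmp(word, key) < 0;
        });
    const std::size_t n = std::strlen(prefix);
    if (it != end && std::strncmp(*it, prefix, n) == 0)
        return static_cast<long>(it - dictWords);
    return -1;
}
```

With the four words TADPOLE, TATTING, TATTOO, TOAST, a search for TAT returns index 1 (TATTING), and a search for TE returns -1. The other game types match the string against the end, the middle, or scattered positions of a word, so they would need additional tables built in InitSuperDuperGhost.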
Each round will consist of each Challenge entry competing against each other entry in a game of each GameType, once playing first and once playing second. Each player will be notified of the start of a new game with a call to NewGhostGame. At each turn, a player's PlayGhost routine will be called. The null-terminated ghostString parameter will provide the string played thus far in the game, which is guaranteed to be part of some word in dictWords. The PlayGhost routine must add a single character to the ghostString, at the beginning, the end, or anywhere, depending on theGameType parameter to this game, and return the result in newGhostString. PlayGhost must return in wordInMindIndex the index into dictWords of the word matched by newGhostString. To help the test program evaluate your move, you must also set the charPositions array, so that:
newGhostString[i] =
dictWords[wordInMindIndex][charPositions[i]]
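The relation above can be checked mechanically. The following sketch shows roughly what a test program might do with your answer; it is not the actual test code, MoveMatchesWord is a hypothetical name, and the requirement that positions be strictly increasing is my assumption rather than something the rules state.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical checker: returns true if every character of
// newGhostString equals the character of `word` at the claimed
// position, and the positions are in range and strictly increasing
// (the ordering rule is an assumption of this sketch).
bool MoveMatchesWord(const char* newGhostString, const char* word,
                     const int charPositions[]) {
    assert(newGhostString != nullptr && word != nullptr);
    const int wordLen = static_cast<int>(std::strlen(word));
    int previous = -1;
    for (int i = 0; newGhostString[i] != '\0'; ++i) {
        const int pos = charPositions[i];
        if (pos <= previous || pos >= wordLen) return false;
        if (newGhostString[i] != word[pos]) return false;
        previous = pos;
    }
    return true;
}
```

For example, TAT matches TATTOO with positions {0, 1, 2}, and in an addAnywhere game TTO matches TATTOO with positions {0, 2, 4}.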
The winner of each game wins 100 points. Each player's score is reduced by 1 point for each 10 milliseconds of execution time, including the time taken by the initialization and termination routines. The Challenge winner will be the player with the most points at the end of the tournament.
If you cannot add a letter to the ghostString without forming a word, you should return a null string in newGhostString. If you instead return a non-null string that forms a word in dictWords, you will be penalized 50 points.
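As a worked example of the scoring arithmetic, the sketch below combines the three rules just described. TournamentScore is a hypothetical name, and rounding partial 10-millisecond intervals down is an assumption, since the rules do not say how fractions are treated.

```cpp
#include <cassert>

// Hypothetical helper: +100 per game won, -50 per penalized move,
// and -1 per 10 ms of execution time (partial intervals are rounded
// down here, which is an assumption).
long TournamentScore(long gamesWon, long penalties, long elapsedMsec) {
    assert(gamesWon >= 0 && penalties >= 0 && elapsedMsec >= 0);
    return 100 * gamesWon - 50 * penalties - elapsedMsec / 10;
}
```

A contestant who wins 6 games, is penalized once, and uses 1234 milliseconds in total would score 600 - 50 - 123 = 427 points under these assumptions.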
Here's how it works: each month we present a new programming challenge. First, write some code that solves the challenge. Second, optimize your code (a lot). Then, submit your solution to MacTech Magazine. We choose a winner based on code correctness, speed, size, and elegance (in that order of importance) as well as the submission date. In the event of multiple equally desirable solutions, we'll choose one winner (with honorable mention, but no prize, given to the runner up). The prize for each month's best solution is a $100 credit for DevDepot[TM].
Unless stated otherwise in the problem statement, the following rules apply: All solutions must be in ANSI compatible C or C++, or in Pascal. We disqualify entries with any assembly in them (except for challenges specifically stating otherwise.) You may call any Macintosh Toolbox routine (e.g., it doesn't matter if you use NewPtr instead of malloc). We compile all entries into native PowerPC code with compiler options set to enable all available speed optimizations. The development environment to be used for selecting the winner will be stated in the problem. Limit your code to 60 characters per line or compress and binhex the solution; this helps with e-mail gateways and page layout.
We publish the solution and winners for each month's Programmer's Challenge three months later. All submissions must be received by the 1st day of the month printed on the front cover of this issue.
You can get a head start on the Challenge by reading the Programmer's Challenge mailing list. Each Challenge will be posted to the list on or before the 12th of the preceding month. To join, send an email to listserv@listmail.xplain.com with the subject "subscribe challenge-A".
Mark solutions "Attn: Programmer's Challenge Solution" and send them by e-mail to one of the Programmer's Challenge addresses in the "How to Communicate With Us" section on page 2 of this issue. Include the solution, all related files, and your contact info.
MacTech Magazine reserves the right to publish any solution entered in the Programmer's Challenge. Authors grant MacTech Magazine the exclusive right to publish entries without limitation upon submission of each entry. Authors retain copyrights for the code.
This month's Challenge was suggested by JG Heithcock, who wins 2 Challenge points for the suggestion. If you have an idea that you think would make a good Challenge problem, send it to <progchallenge@mactech.com> for possible consideration in a future Challenge.
This will be a native PowerPC Challenge, using the CodeWarrior Pro 5 environment. Solutions may be coded in C, C++, or Pascal. Solutions in Java will also be accepted this month. Java entries must be accompanied by a test driver that uses the interface provided in the problem statement.
Three Months Ago Winner
It seemed like a simple Challenge when I wrote it. The July C-to-HTML Challenge, that is. Write some code to convert a valid C/C++ file into HTML, so that the .html file displays in Netscape Navigator the same way the .c, .cp, or .h file is displayed by the CodeWarrior (Pro 4) IDE. A simple matter of syntax coloring, right?
Well, yes, but it wasn't quite that simple. First, there is the question of #pragma and other preprocessor directives. The list of which directives CodeWarrior recognizes requires a little research and experimentation. Second, the treatment of #pragmas and other preprocessor directives is something less than consistent in CodeWarrior Pro 4. For example, consider the following directives:
#ident
#pragma options align=mac68k
#pragma ANSI_strict
#pragma register_coloring
__declspec(export)
__declspec(import) // notice import is not highlighted
Of that list, "#ident" is recognized but not colored, "options" is colored but "align=mac68k" is not, "ANSI_strict" (and most other #pragma directives) is colored, but "register_coloring" (and another set of directives) is not. The "export" argument in a __declspec command is highlighted, but "import" is not. Go figure.
Then there was the question of line continuations. The CodeWarrior IDE doesn't color an include file as a string when it is written on one line:
#include "stdlib.h"
... but it does color it if the #include is broken into two lines with a line continuation character.
#include \
"stdlib.h"
Preprocessor directives broken with line continuations are colored based on what the fragment looks like. In the following examples, "if" is colored in both places, but the rest of the directive is not. Go figure some more.
#if\
def K
#def\
ine L
#end\
if
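For comparison, a C compiler never sees these fragments: it deletes each backslash-newline pair before it looks for directives, so the three examples above are ordinary #ifdef, #define, and #endif lines. A minimal sketch of that splicing step follows; SpliceLines is a hypothetical name, and accepting the classic Mac line end (0x0D) alongside '\n' is my choice for this illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes every backslash + line-end pair,
// joining physical lines into logical lines the way a compiler does
// before it recognizes directives. Accepts '\n' or '\r' as line end.
std::string SpliceLines(const std::string& source) {
    std::string out;
    out.reserve(source.size());
    for (std::size_t i = 0; i < source.size(); ++i) {
        const bool continuation =
            source[i] == '\\' && i + 1 < source.size() &&
            (source[i + 1] == '\n' || source[i + 1] == '\r');
        if (continuation) {
            ++i;  // skip the line end along with the backslash
        } else {
            out.push_back(source[i]);
        }
    }
    assert(out.size() <= source.size());
    return out;
}
```

An editor that colors each physical line on its own, as the IDE appears to do, has no such step, which would explain why it colors the fragments differently from the directive they spell.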
Other inputs that tripped some people up:
#include "Cto\
HTML.h" /* note the unmatched quotes (") here, one is black, the other gray!*/
long q='//st';
long r='/*..';
long s='..*/';
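The three multi-character constants are traps because they contain comment delimiters that must not start a comment. One way to avoid the trap, sketched below, is to track whether the scan is currently inside a string or character literal. FindCommentStart is a hypothetical name, the sketch handles a single line only, and it follows the C language rules rather than any quirk of the IDE.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: returns the offset of the first "//" or "/*"
// that lies outside any string or character literal, or
// std::string::npos if there is none. Expects a single line.
std::size_t FindCommentStart(const std::string& line) {
    assert(line.find('\n') == std::string::npos);
    char quote = '\0';  // active literal delimiter, or 0 if none
    for (std::size_t i = 0; i < line.size(); ++i) {
        const char c = line[i];
        if (quote != '\0') {
            if (c == '\\') ++i;            // skip the escaped character
            else if (c == quote) quote = '\0';
        } else if (c == '"' || c == '\'') {
            quote = c;                     // entering a literal
        } else if (c == '/' && i + 1 < line.size() &&
                   (line[i + 1] == '/' || line[i + 1] == '*')) {
            return i;
        }
    }
    return std::string::npos;
}
```

On the line long q='//st'; this finds no comment at all, while on long r='/*..'; // note it reports the comment that begins at offset 15, after the constant.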
The preprocessor coloring confusion probably turned off some potential contestants, but, if that were not enough, there was the issue of proportional fonts. The original problem statement required that tab characters be processed correctly, including spacing when using proportional fonts. That turns out to be rather difficult in HTML, so, after some discussion on the Challenge mailing list, I decided to test with monospaced fonts only.
Speaking of the mailing list, if you are not already on it, I would encourage you to join. Not only does it give you more time to solve the Challenge (problems are sent out on or around the 12th of the preceding month), but you can also tune in to any problem clarifications. See <www.mactech.com/progchallenge> for subscription instructions.
So, after that long introduction, congratulations ONCE AGAIN to Ernst Munter (Kanata, Ontario) for submitting the fastest and most correct solution to the C-to-HTML Challenge. This is Ernst's third Challenge win in a row - is there no one out there ready to mount a serious Challenge to his outstanding record?! Ernst and two of the other participants took advantage of the <SPAN> HTML tag and other Cascading Style Sheet properties in converting code for display in a browser. Ernst parses the text using a state machine and six tables, defined in the HTMLcodes structure, based on whether the state corresponds to leading white space, intervening white space, preprocessor lines, strings, C-style comments, or C++-style comments. Text coloring is done using style sheets. Ernst's code converts the familiar "Hello World" program from:
/*
* Hello World for the CodeWarrior
* © 1997-1998 Metrowerks Corp.
*
* Questions and comments to:
* <mailto:support@metrowerks.com>
* <http://www.metrowerks.com/>
*/
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main(void)
{
printf ("Hello World, this is CodeWarrior!\n\n");
return 0;
}
... into:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head><STYLE TYPE="text/css">
<!--
pre { color:x000000; font: 9pt monaco}#p{color:black}#c{color:xcc0000}#k{color:x0000cc}#s{color:x777777}-->
</STYLE></head><body bgcolor="#FFFFFF">
<pre>
<SPAN ID="c">/*
* Hello World for the CodeWarrior
* © 1997-1998 Metrowerks Corp.
*
* Questions and comments to:
* <mailto:support@metrowerks.com>
* <http://www.metrowerks.com/>
*/</SPAN>
<SPAN ID="k">#include</SPAN> <stdio.h>
<SPAN ID="k">#include</SPAN> <stdlib.h>
<SPAN ID="k">#include</SPAN> <time.h>
<SPAN ID="k">int</SPAN> main(<SPAN ID="k">void</SPAN>)
{
printf (<SPAN ID="s">"Hello World, this is CodeWarrior!\n\n"</SPAN>);
<SPAN ID="k">return</SPAN> 0;
}
</pre></body></html>
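The two ingredients of that output, escaping the characters HTML treats specially and wrapping colored runs in SPAN tags, can be sketched in a few lines. This is an illustration only and not Ernst's code, which works character by character through its tables. EscapeHtml and Colorize are hypothetical names, and the entity names are the standard HTML ones rather than anything taken from the winning entry.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper: replaces the three characters that HTML
// treats specially with their standard entities.
std::string EscapeHtml(const std::string& text) {
    std::string out;
    for (char c : text) {
        switch (c) {
            case '<': out += "&lt;";  break;
            case '>': out += "&gt;";  break;
            case '&': out += "&amp;"; break;
            default:  out += c;       break;
        }
    }
    return out;
}

// Hypothetical helper: wraps escaped text in a SPAN whose ID names a
// style-sheet rule, e.g. 'c' comments, 'k' keywords, 's' strings.
std::string Colorize(const std::string& text, char styleId) {
    assert(std::isalnum(static_cast<unsigned char>(styleId)));
    return std::string("<SPAN ID=\"") + styleId + "\">" +
           EscapeHtml(text) + "</SPAN>";
}
```

Calling Colorize("return", 'k') produces the same markup as the keyword spans above. In HTML an ID is meant to be unique within a page, so a CLASS selector would be the more conventional choice; the sketch keeps ID only to match the output shown.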
Fortunately, the quirks of the IDE did not affect the results, as the solutions ranked the same when ordered by execution time as they did by correctness. The table below lists, for each of the solutions submitted, the total execution time, a correctness rank, and the size and language code parameters. As usual, the number in parentheses after the entrant's name is the total number of Challenge points earned in all Challenges prior to this one.
Name | Time (msec) | Correctness | Code Size | Data Size | Lang
Ernst Munter (487) | 22.34 | 1 | 4128 | 6888 | C++
Dennis Jones (12) | 41.76 | 2 | 4228 | 7393 | C
Yudhi Widyatama | 71.40 | 3 | 4292 | 5955 | C++
Alan Hart (21) | 1863.09 | 4 | 4788 | 4099 | C
Top Contestants
Listed here are the Top Contestants for the Programmer's Challenge, including everyone who has accumulated 10 or more points during the past two years. The numbers below include points awarded over the 24 most recent contests, including points earned by this month's entrants.
Rank | Name | Points
1. | Munter, Ernst | 221
2. | Saxton, Tom | 108
3. | Boring, Randy | 73
4. | Maurer, Sebastian | 70
5. | Rieken, Willeke | 51
6. | Heithcock, JG | 39
7. | Lewis, Peter | 26
8. | Jones, Dennis | 22
9. | Brown, Pat | 20
10. | Hostetter, Mat | 20
11. | Mallett, Jeff | 20
12. | Nicolle, Ludovic | 20
13. | Murphy, ACC | 14
14. | Shearer, Rob | 14
15. | Hart, Alan | 11
16. | Hewett, Kevin | 10
17. | Selengut, Jared | 10
18. | Smith, Brad | 10
19. | Varilly, Patrick | 10
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use. The points you can win are:
1st place | 20 points
2nd place | 10 points
3rd place | 7 points
4th place | 4 points
5th place | 2 points
finding bug | 2 points
suggesting Challenge | 2 points
Here is Ernst's winning C-to-HTML solution:
c2html.cp
Copyright © 1999
Ernst Munter, Kanata, ON, Canada
/*
Submission to MacTech Programmer's challenge for July 99.
Copyright © 1999, Ernst Munter, Kanata, ON, Canada.
"C-to-HTML"
Problem Statement
---------
Convert a syntactically correct C or C++ text file into an HTML file such that Netscape 4.6 displays the text in the same way as the Codewarrior Pro-4 editor.
Settings provide dynamic control of font and colors.
Issues
---
There are three kinds of issues
- CW colorization of keywords and strings does not always match the semantics of the text, for example not all preprocessor keywords are displayed in keyword color, and comments and line continuations affect keyword and string colors in preprocessor statements. Although this behaviour of the CW editor is inconsistent with the behavior of the compiler, it is predictable, and should be emulated here.
- HTML provides no direct way to provide tab stops (beyond line indentation). Only monospaced fonts display properly when tabs are replaced with equivalent numbers of space characters.
- The Apple character set contains a number of characters not available in HTML fonts. Creation and insertion of small images might solve this problem, but I consider this beyond the scope of this Programmers Challenge.
Solution
----
The <pre> and <span> tags, together with a few CSS styles are used to control the font and colors in the HTML output.
The parsing of the input text character by character is based on a table driven state machine. Six tables, largely identical but differing in a few key areas, are pre-built for the six major states which the text can be in:
- Text and white space after a newline
- Text after the first (not #) character on a line
- Preprocessor lines
- Strings (except in include and pragma preprocessor lines)
- C style comments
- C++ style comments
A loop in TextConverter::ConvertText() then simply looks up each input character in the current table, and processes it. This involves scanning ahead to find keywords, and, as triggered by context (table) specific characters, the selection of a different table.
Tabs are expanded using a small table of tabstops.
Display colors are controlled by enclosing non-black text with pairs of <SPAN ID="x"> and </SPAN> tags, where x is the name in the style sheet for the desired color.
Assumptions
------
Lines are not more than 255 character positions long.
Tab stops beyond position 255 are ignored.
All C/C++ files use Mac line ends (0x0D).
Please turn
#define MULTI_PLATFORM
ON if DOS type files are to be processed.
The character 0x1A (end-of-text) will be so interpreted.
Compiler Settings (Please note)
---------
The shortest and fastest code is obtained by setting the following compiler optimizations:
- global optimization: level 4, faster execution speed
- Processor: Schedule instructions, Peephole optimization
- Language: Inline depth = 5 or more
With these settings, the program should be about 4136 bytes of code and about 8K of data.
*/
#define MULTI_PLATFORM 0
#include <cstdio>
#include "CtoHTML.h"
#include "keywords.h"
#include "HTMLstrings.h"
OpenPage
static char* OpenPage(
char* outputHTML,
const Settings displaySettings)
{
// Sends the preamble of the HTML page.
return outputHTML +
std::sprintf(outputHTML,
"%s %dpt %s}#p{color:black}#c{color:x%06x}#k{color:x%06x}\
#s{color:x%06x}%s",
kHead1,
displaySettings.fontSize,
displaySettings.fontName,
displaySettings.commentColor,
displaySettings.keywordColor,
displaySettings.stringColor,
kHead2 );
}
ClosePage
static char* ClosePage(char* outputHTML)
{
// Closes the HTML page.
// Not strictly necessary but good form.
return outputHTML +
std::sprintf(outputHTML,"%s",kTail);
}
static unsigned char gTabs[256];
static char spaces[260] =
" "
" "
" "
" "
" ";
MakeTabsArray
static void MakeTabsArray(int tabSize)
{
if (tabSize<1) tabSize=1;
tabSize &= 0xFF;
int c=tabSize;
for (int i=0;i<256;i++)
{
gTabs[i]=c--;
if (c<=0) c=tabSize;
}
}
enum { // state of HTML output
kHTMLplain,
kHTMLcomment,
kHTMLkeyword,
kHTMLstring
};
#if macintosh
// Not a very nice technique, but it works and is efficient
// Each 4 characters can be specified and copied as a long.
static long ospan1='<SPA';
static long ospan2='N ID';
static char ospan3='=';
static long ospan4[4] = {'"p">','"c">','"k">','"s">'};
static long cspan1='</SP';
static long cspan2='AN> ';
#else
// conventional platform independent C code
static char* kOpenSpan[] = {
"<SPAN ID=\"p\">",
"<SPAN ID=\"c\">",
"<SPAN ID=\"k\">",
"<SPAN ID=\"s\">"
};
static char* kCloseSpan = "</SPAN>";
static int kCloseSpanLength = strlen(kCloseSpan);
static int kOpenSpanLength = strlen(kOpenSpan[0]);
#endif
// Make keyword lookup functions available
static Keywords K;
struct TextConverter:HTMLcodes
// TextConverter inherits context tables from HTMLcodes
// It is simply a collection of most functions so they are
// easily inlined.
static struct TextConverter:HTMLcodes {
void CloseSpan(char* & outputHTML)
{
// Returns the HTML state to plain (black letters)
#if macintosh
*((long*)outputHTML) = cspan1;
*((long*)(outputHTML+4)) = cspan2;
outputHTML += 7;
#else
strcpy(outputHTML,kCloseSpan);
outputHTML += kCloseSpanLength;
#endif
}
void OpenSpan(char* & outputHTML,const int newHTMLstate)
{
// Sets the HTML state to the desired color state
#if macintosh
*((long*)outputHTML) = ospan1;
*((long*)(outputHTML+4)) = ospan2;
*((char*)(outputHTML+8)) = ospan3;
*((long*)(outputHTML+9)) = ospan4[newHTMLstate];
outputHTML += 13;
#else
strcpy(outputHTML,kOpenSpan[newHTMLstate]);
outputHTML += kOpenSpanLength;
#endif
}
bool IsLiteral(const char* inputText)
{
// Checks whether a '"' is part of a character literal
return
(inputText[1] == '\'') &&
(
(inputText[-1] == '\'') ||
(
(inputText[-1] == '\\') &&
(inputText[-2] == '\'')
)
);
}
void SendNewline(
const char c,
const char* & inputText,
char* & outputHTML)
{
// on entry, c='\n' and *inputText is the next character
// returns with *inputText = next unprocessed character
#if MULTI_PLATFORM
// this enables it for Mac, UNIX and DOS newlines
if (7==(c ^ inputText[1])) // skip 0xA after 0xD
++inputText;
#endif
*outputHTML++ = c;
}
void SendTab(char* & outputHTML,int & lpos)
{
int numSpaces=gTabs[0xFF & lpos];
lpos += numSpaces;
for (int i=0;i<numSpaces;i++)
*outputHTML++ = ' ';
}
void SendPlain(
char c,
const char* & inputText,
int len,
char* & outputHTML,
int & lpos)
{
// Plain text is neither a keyword, nor contains any trigger
// characters, nor characters that need to be HTML encoded.
*outputHTML++ = c;lpos++;
for (int i=1;i<len;i++)
{
c=*++inputText;
if (c=='\t')
SendTab(outputHTML,lpos);
else {*outputHTML++ = c;lpos++;}
}
}
void SendKeywords(
char c,
const char* & inputText,
int len,
char* & outputHTML,
int & lpos)
{
// Sends one or more consecutive keywords in keyword color.
OpenSpan(outputHTML,kHTMLkeyword);
*outputHTML++ = c;lpos++;
for (int i=1;i<len;i++)
{
c=*++inputText;
if (c=='\t')
SendTab(outputHTML,lpos);
else {*outputHTML++ = c;lpos++;}
}
CloseSpan(outputHTML);
}
int GetToken(const char* inputText)
// Returns the length of a word, a potential keyword
{
const char* cp=inputText-1;
while (cmap[(int)(unsigned char)(*++cp)] & word)
/* impossible to avoid compiler warning here */ ;
// (no #pragma will do it)
return cp-inputText;
}
int PragmaValue(const char* inputText)
// Returns the length of the whitespace + keyword, or 0
{
char c;
int len;
const char* cp=inputText;
while (0 != (c = *++cp)) {
if (0 == (cmap[(int)(unsigned char)(c)] & wsp))
{
len=GetToken(cp);
if (len)
{
if (K.IsPragmaValue(cp,len))
return len + cp-inputText;
}
break;
}
}
return 0;
}
int PragmaDirective(const char* inputText)
// returns the length of the whitespace + keywords, or 0
{
char c;
int len;
const char* cp=inputText;
while (0 != (c = *++cp)) {
if (0 == (cmap[(int)(unsigned char)(c)] & wsp))
{
len=GetToken(cp);
if (len)
{
if (K.IsPragmaDirective(cp,len))
return PragmaValue(cp+len)
+ cp + len - inputText;
}
break;
}
}
return 0;
}
int Preprocessor(const char* inputText,int & PPstate)
{
// Scans ahead to determine the preprocessor keywords.
// Returns the length of the string from the initial # to
// the end of the last colored keyword (or 0 if none found).
// Emulates the CW Pro-4 editor bug where legal comments
// between # and the keywords spoil colorization of the
// preprocessor keywords.
char c; // skip '#'
int len;
const char* cp=inputText;
while (0 != (c = *++cp)) {
if (0 == (cmap[(int)(unsigned char)(c)] & wsp))
{
len=GetToken(cp);
if (len)
{
PPstate =
K.IsPreprocessorKeyword(cp,len);
switch (PPstate)
{
case kPPspoiled: return 0;
case kPPsimple:
case kPPinclude: return cp + len - inputText;
case kPPpragma: return PragmaDirective(cp+len)
+ cp + len - inputText;
}
}
break;
}
}
return 0;
}
int GetPlaintext(const char* inputText)
{
// Scans ahead to determine a section of text which may be
// whitespace and punctuation, but no strings, literals,
// comments, or newlines, nor encodable chars (<&>).
// Returns the length of the section.
char c;
const char* cp=inputText-1;
while (0 != (c = *++cp)) {
if (0 ==
(cmap[(int)(unsigned char)(c)] & nokey))
break;
}
return cp-inputText;
}
int GetKeywords(const char* inputText,int & plain)
{
// Scans ahead to determine a sequence of basic keywords.
// Returns the length of the string to the end of the last
// colored keyword (or 0 if none found).
// The variable plain provides the number of characters
// after the last recognized keyword, including a non-keyword
// and all white space and punctuation up to (excluding) the
// next: string, literal, comment, or newline
char c;
int len,acc = 0;
plain = 0;
const char* cp=inputText-1;
while (0 != (c = *++cp)) {
if (0 == (cmap[(int)(unsigned char)(c)] & wsp))
{
len=GetToken(cp);
if (len && K.IsBasicKeyword(cp,len))
{
cp += len;
acc = cp - inputText;
cp--;
}
else
{
plain = len + GetPlaintext(cp+len);
break;
}
}
}
return acc;
}
char* ConvertText(
const char* inputText,
char *outputHTML)
// Loops over all input characters, and determines text
// states such as
// - start of a new line of plain text
// - preprocessor activation
// - plain text after the first printing character
// - "strings"
// - /* C style comments */
// - // C++ style comments
// - 'literals' (no separate table)
//
// Each character is looked up in a table associated with
// the current state. These tables provide the triggers
// for state changes. When a state change occurs, the
// appropriate table is assigned.
//
// Potential keywords are detected by scanning ahead when
// this makes sense, so colorization can be applied for
// these keywords without backtracking in the output.
{
char c;
--inputText;
const unsigned char* table=startIndex;
const unsigned char* returnTable=startIndex;
int lpos=0,PPstate,plain,len,quoteCounter;
for (;;) {
c=*++inputText;
evaluate:
int code = table[(int)(unsigned char)(c)];
if (code==0)
{*outputHTML++ = c;lpos++;}
else switch(code) {
case kTab:
SendTab(outputHTML,lpos);
break;
case kNewlineCommentPlus:
if (*(inputText-1) != '\\')
{
CloseSpan(outputHTML);
table = startIndex;
}
goto send_newline;
case kPPnewline:
if (quoteCounter & 1) // unmatched quote
{
OpenSpan(outputHTML,kHTMLstring);
table = stringIndex;
goto send_newline;
}
//else table = startIndex;
//goto send_newline;
case kNewlineText:
table = startIndex;
send_newline:
case kNewline:
case kNewlineStart:
SendNewline(c,inputText,outputHTML);
lpos=0;
break;
case kCommentReturn:// from a C-style comment
*outputHTML++ = c;lpos++;
if (inputText[1] == '/')
{
*outputHTML++ = *++inputText;
table = returnTable;
CloseSpan(outputHTML);
}
break;
case kStart2Text:// first non-whitespace, not #
table=textIndex;
goto evaluate;
case kStartPreprocessor:// valid #, probably PreProcessor
len = Preprocessor(inputText,PPstate);
if (len)
{
SendKeywords
(c,inputText,len,outputHTML,lpos);
if (PPstate == kPPsimple)
// just regular text from here
table = textIndex;
else
{ // PP deals with the whole line
quoteCounter = 0;
table = preprocIndex;
}
break;
}
// not a PP keyword after all
table = textIndex;
goto print_one;
case kPPstring: // a string argument in preproc
// does not get string colored
if (quoteCounter & 1)
{
if (*(inputText-1) != '\\')
quoteCounter = 0;
} else quoteCounter = 1;
goto print_one;
case kBasicKeyword:
len=GetKeywords(inputText,plain);
if (len)
{
SendKeywords
(c,inputText,len,outputHTML,lpos);
if (plain)
c = *++inputText;
else break;
}
if (plain)
{
SendPlain
(c,inputText,plain,outputHTML,lpos);
break;
}
goto print_one;
case kGotoComment: // either type might start
if (inputText[1] == '*')
{
OpenSpan(outputHTML,kHTMLcomment);
returnTable = table;
table = commentIndex;
}
else if (inputText[1] == '/')
{
OpenSpan(outputHTML,kHTMLcomment);
table = commentPlusIndex;
}
// not a comment, but we still must send char
goto print_one;
case kText2String: // normal start of a string
if ((lpos==0) || !IsLiteral(inputText))
{
OpenSpan(outputHTML,kHTMLstring);
table = stringIndex;
}
goto print_one;
case kEscapeString: // handles \" sequence in string
if (inputText[1] == '"')
{
*outputHTML++ = c;
*outputHTML++ = *++inputText;lpos+=2;
break;
}
print_one:
*outputHTML++ = c;lpos++;
break;
case kString2Text: // return from string to normal text
*outputHTML++ = c;lpos++;
if (*(inputText-1) != '\\')
{
table = textIndex;
CloseSpan(outputHTML);
}
break;
case kEndoftext: // the terminating 0 was found
return outputHTML;
default: // character needs encoding for HTML
if (table == startIndex)
table = textIndex;
outputHTML =
SendCode(code,outputHTML);
lpos++;
}// end of switch
}
return outputHTML;
}
} textConverter;
CtoHTML
//The entry from the test program:
long /* output length */ CtoHTML(
const char *inputText, /* text to convert */
char *outputHTML, /* converted text */
const Settings displaySettings /* display parameters */
) {
char* start = outputHTML;
outputHTML = OpenPage(outputHTML,displaySettings);
MakeTabsArray(displaySettings.tabSize);
outputHTML =
textConverter.ConvertText(inputText,outputHTML);
outputHTML = ClosePage(outputHTML);
*outputHTML=0;
return outputHTML-start;
}
Keyword.cp
/*
List of strings which appear colored as "keywords" in the CodeWarrior IDE 3.3.
Compiled by Ernst Munter, 22 June 1999.
This list may be incomplete.
*/
#include "keywords.h"
const char* _Keyword[] = {"\1\11",
"_declspec"};
const char* aKeyword[] = {"\4\2\5\2\3",
"nd","nd_eq","sm","uto"};
const char* bKeyword[] = {"\4\5\4\3\4",
"itand","itor","ool","reak"};
const char* cKeyword[] = {"\10\3\4\3\4\4\4\11\7",
"ase","atch","har","lass","ompl","onst","onst_cast","ontinue"};
const char* dKeyword[] = {"\5\6\5\1\5\13",
"efault","elete","o","ouble","ynamic_cast"};
const char* eKeyword[] = {"\5\3\3\7\5\5",
"lse","num","xplicit","xport","xtern"};
const char* fKeyword[] = {"\5\4\2\4\2\5",
"alse","ar","loat","or","riend"};
const char* gKeyword[] = {"\1\3",
"oto"};
const char* hKeyword[] = {"\0"};
const char* iKeyword[] = {"\3\1\5\2",
"f","nline","nt"};
const char* jKeyword[] = {"\0"};
const char* kKeyword[] = {"\0"};
const char* lKeyword[] = {"\1\3",
"ong"};
const char* mKeyword[] = {"\1\6",
"utable"};
const char* nKeyword[] = {"\4\10\2\2\5",
"amespace","ew","ot","ot_eq"};
const char* oKeyword[] = {"\3\7\1\4",
"perator","r","r_eq"};
const char* pKeyword[] = {"\4\5\6\10\5",
"ascal","rivate","rotected","ublic"};
const char* qKeyword[] = {"\0"};
const char* rKeyword[] = {"\3\7\17\5",
"egister","einterpret_cast","eturn"};
const char* sKeyword[] = {"\7\4\5\5\5\12\5\5",
"hort","igned","izeof","tatic","tatic_cast","truct","witch"};
const char* tKeyword[] = {"\10\7\3\4\3\2\6\5\7",
"emplate","his","hrow","rue","ry","ypedef","ypeid","ypename"};
const char* uKeyword[] = {"\3\4\7\4",
"nion","nsigned","sing"};
const char* vKeyword[] = {"\3\6\3\7",
"irtual","oid","olatile"};
const char* wKeyword[] = {"\2\6\4",
"char_t","hile"};
const char* xKeyword[] = {"\2\2\5",
"or","or_eq"};
const char* yKeyword[] = {"\0"};
const char* zKeyword[] = {"\0"};
const char** basicTable[] = {
_Keyword,0,
aKeyword,bKeyword,cKeyword,dKeyword,eKeyword,
fKeyword,gKeyword,hKeyword,iKeyword,jKeyword,
kKeyword,lKeyword,mKeyword,nKeyword,oKeyword,
pKeyword,qKeyword,rKeyword,sKeyword,tKeyword,
uKeyword,vKeyword,wKeyword,xKeyword,yKeyword,
zKeyword
};
const char* aPPword[] = {"\0"};
const char* bPPword[] = {"\0"};
const char* cPPword[] = {"\0"};
const char* dPPword[] = {"\1\5","efine"};
const char* ePPword[] = {"\4\3\3\4\4",
"lif","lse","ndif","rror"};
const char* fPPword[] = {"\0"};
const char* gPPword[] = {"\0"};
const char* hPPword[] = {"\0"};
const char* iPPword[] = {"\4\1\4\5\6",
"f","fdef","fndef","nclude"};
const char* jPPword[] = {"\0"};
const char* kPPword[] = {"\0"};
const char* lPPword[] = {"\1\3","ine"};
const char* mPPword[] = {"\0"};
const char* nPPword[] = {"\0"};
const char* oPPword[] = {"\0"};
const char* pPPword[] = {"\1\5","ragma"};
const char* qPPword[] = {"\0"};
const char* rPPword[] = {"\0"};
const char* sPPword[] = {"\0"};
const char* tPPword[] = {"\0"};
const char* uPPword[] = {"\1\4","ndef"};
const char* vPPword[] = {"\0"};
const char* wPPword[] = {"\1\6","arning"};
const char* xPPword[] = {"\0"};
const char* yPPword[] = {"\0"};
const char* zPPword[] = {"\0"};
const char** preprocessorTable[] = {
aPPword,bPPword,cPPword,dPPword,ePPword,
fPPword,gPPword,hPPword,iPPword,jPPword,
kPPword,lPPword,mPPword,nPPword,oPPword,
pPPword,qPPword,rPPword,sPPword,tPPword,
uPPword,vPPword,wPPword,xPPword,yPPword,
zPPword
};
char* pragmaDirective[kNumPragma] = {
// list of 110 pragmas which the CW IDE 3.3 displays in keyword color
"a6frames",
"align_array_members",
"always_inline",
"ANSI_strict",
"arg_dep_lookup",
"ARM_conform",
"auto_inline",
"bool",
"check_header_flags",
"code_seg",
"code68020",
"code68881",
"cplusplus",
"cpp_extensions",
"d0_pointers",
"data_seg",
"def_inherited",
"direct_destruction",
"direct_to_som",
"disable_registers",
"dollar_identifiers",
"dont_inline",
"dont_reuse_strings",
"ecplusplus",
"enumsalwaysint",
"exceptions",
"export",
"extended_errorcheck",
"far_code",
"far_data",
"far_strings",
"far_vtables",
"faster_pch_gen",
"force_active",
"fourbyteints",
"fp_contract",
"fp_pilot_traps",
"function",
"global_optimizer",
"IEEEdoubles",
"ignore_oldstyle",
"import",
"init_seg",
"inline_depth",
"internal",
"k63d",
"k63d_calls",
"lib_export",
"longlong",
"longlong_enums",
"macsbug",
"mark",
"mmx",
"mpwc",
"mpwc_newline",
"mpwc_relax",
"near_code",
"no_register_coloring",
"oldstyle_symbols",
"once",
"only_std_keywords",
"optimization_level",
"optimize_for_size",
"optimization_level",
"options",
"pack",
"parameter",
"pcrelstrings",
"peephole",
"pointers_in_A0",
"pointers_in_D0",
"pool_strings",
"pop",
"precompile_target",
"profile",
"push",
"readonly_strings",
"require_prototypes",
"RTTI",
"scheduling",
"segment",
"side_effects",
"smart_code",
"SOMCallOptimization",
"SOMCallStyle",
"SOMCheckEnvironment",
"SOMClassVersion",
"SOMMetaClass",
"SOMReleaseOrder",
"stack_cleanup",
"static_inlines",
"sym",
"syspath_once",
"toc_data",
"traceback",
"trigraphs",
"unsigned_char",
"unused",
"warn_emptydecl",
"warn_extracomma",
"warn_hidevirtual",
"warn_illpragma",
"warn_implicitconv",
"warn_notinlined",
"warn_possunwant",
"warn_unusedarg",
"warn_unusedvar",
"warning",
"warning_errors",
"wchar_type"
};
const char* pragmaValue[kNumPragmaValue] = {
// When used in a colorized pragma, these words are also colored
"list",
"on",
"off",
"reset"
};
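The per-letter tables above share one layout: the first string is a descriptor whose first byte is the number of words and whose remaining bytes give the length of each word without its first letter (the first letter is implied by the table's slot). The following is a minimal sketch of that layout, not part of the original entry; `ExpandTable` and the sample table `ePP` are hypothetical names used only for illustration.

```cpp
#include <string>
#include <vector>

// Sample table in the same layout as ePPword: descriptor first, then the
// words with their first letter removed.
static const char* ePP[] = {"\4\3\3\4\4", "lif", "lse", "ndif", "rror"};

// Hypothetical helper: rebuild the full words stored in one per-letter table.
std::vector<std::string> ExpandTable(char first, const char* const* table)
{
    std::vector<std::string> words;
    const char* desc = table[0];
    int numWords = desc[0];              // first descriptor byte = word count
    for (int i = 1; i <= numWords; i++) {
        int tailLen = desc[i];           // length without the first letter
        words.push_back(std::string(1, first) +
                        std::string(table[i], tailLen));
    }
    return words;
}
```

Calling `ExpandTable('e', ePP)` yields the four directives that start with `e`. Storing the lengths up front lets a lookup reject most candidates with an integer compare before any string compare runs.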
Keyword.h
#include <cstring>
#include "c2htmlCommon.h"
extern unsigned char cmap[];
extern const char** basicTable[];
extern const char* pragmaValue[];
extern char* pragmaDirective[];
extern const char** preprocessorTable[];
struct Keywords
struct Keywords {
// Collection of functions for looking up keywords in tables
bool IsBasicKeyword(const char* w,int len)
{
const char c = *w++;
--len;
if (cmap[(int)(unsigned char)(c)] & lowIdent)
{
const char** table=basicTable[c-'_'];
const char* desc=*table;
int numWords=*desc;
for (int i=1;i<=numWords;i++)
{
int kwLen=*++desc;
const char* kw=*++table;
if (kwLen != len)
continue;
if (0 == std::strncmp(w,kw,len))
return true;
}
}
return false;
}
int IsPreprocessorKeyword(const char* w,int len)
// returns:
// kPPspoiled if not a keyword
// kPPsimple if simple
// kPPinclude if include
// kPPpragma if pragma
{
const char c = *w++;
--len;
if (cmap[(int)(unsigned char)(c)] & low)
{
const char** table=preprocessorTable[c-'a'];
const char* desc=*table;
int numWords=*desc;
for (int i=1;i<=numWords;i++)
{
int kwLen=*++desc;
const char* kw=*++table;
if ((kwLen == len) && (0 == std::strncmp(w,kw,len)))
{
if ((c=='i') && (i==4))
return kPPinclude;
if (c=='p')
return kPPpragma;
return kPPsimple;
}
}
}
return kPPspoiled;
}
bool IsPragmaDirective(const char* w,int len)
{
const char c = *w++;
--len;
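// pragmaDirective is an array of char*; char** does not convert to const char** in standard C++
const char* const* pdh = pragmaDirective;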
for (int i=0;i<kNumPragma;i++,pdh++)
{
const char* pdp = *pdh;
if ( (c == *pdp) && (0 == std::strncmp(w,1+pdp,len)) )
return true;
}
return false;
}
bool IsPragmaValue(const char* w,int len)
{
const char c = *w++;
--len;
const char** pvh = pragmaValue;
for (int i=0;i<kNumPragmaValue;i++,pvh++)
{
const char* pvp = *pvh;
if ( (c == *pvp) && (0 == std::strncmp(w,1+pvp,len)) )
return true;
}
return false;
}
};
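Because `std::strncmp` compares only the first `len` characters, a lookup of this shape can also accept a word that is just the leading part of a table entry (for example `of` against `off`). If exact matches are wanted, one possible variant also checks that the table entry has the same length as the word. This is a sketch under that assumption; `IsExactEntry` and `kSample` are hypothetical names, not part of the original entry.

```cpp
#include <cstring>

static const char* kSample[] = {"list", "on", "off", "reset"};
const int kNumSample = 4;

// Returns true only if the len characters at w equal a whole table entry.
// w does not need to be NUL-terminated at w[len].
bool IsExactEntry(const char* w, int len)
{
    if (len <= 0)
        return false;
    for (int i = 0; i < kNumSample; i++) {
        const char* entry = kSample[i];
        if (std::strlen(entry) == static_cast<std::size_t>(len) &&
            0 == std::memcmp(w, entry, len))
            return true;
    }
    return false;
}
```

The extra `std::strlen` call costs a little time per entry; whether that matters depends on how often pragmas occur in the input.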
myCType.cp
#include "c2htmlCommon.h"
unsigned char cmap[256] = {
0, 0, 0, 0, 0, 0, 0, 0, 0,wsp,nln, 0,nln, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
wsp,pun,oth,pun,pun,pun,oth,oth,pun,pun,pun,pun,pun,pun,pun,oth,
dig,dig,dig,dig,dig,dig,dig,dig,dig,dig,pun,pun,oth,pun,oth,pun,
pun,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,
upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,upp,pun,pun,pun,pun,und,
pun,low,low,low,low,low,low,low,low,low,low,low,low,low,low,low,
low,low,low,low,low,low,low,low,low,low,low,pun,pun,pun,pun,pun,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
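The table above assigns each character a single class bit, so testing whether a character belongs to any of several classes is one AND with a mask. A minimal sketch of the idea follows. It uses a small subset of the class bits and fills the table at run time instead of using the literal 256-entry initializer; `CharMap` and `Is` are hypothetical names, and the loops assume an ASCII-compatible character set.

```cpp
enum { low = 1, upp = 2, und = 4, dig = 8,
       lowIdent = (low | und), word = (low | upp | und | dig) };

// Hypothetical run-time builder; the listing uses a literal table instead.
struct CharMap {
    unsigned char map[256];
    CharMap() {
        for (int i = 0; i < 256; i++) map[i] = 0;
        for (int c = 'a'; c <= 'z'; c++) map[c] = low;
        for (int c = 'A'; c <= 'Z'; c++) map[c] = upp;
        for (int c = '0'; c <= '9'; c++) map[c] = dig;
        map['_'] = und;
    }
    // True if c belongs to any class named in mask.
    bool Is(char c, int mask) const {
        return (map[static_cast<unsigned char>(c)] & mask) != 0;
    }
};
```

With this, `Is(c, lowIdent)` answers "can this character start a keyword?" in one table read, which is the same test `IsBasicKeyword` performs on `cmap`.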
HTMLstrings.cp
// Canned HTML header and trailer
const char* kHead1 = "\
<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0//EN\"\n\
\"http://www.w3.org/TR/REC-html40/loose.dtd\">\n\
<html><head>\
<STYLE TYPE=\"text/css\">\n\
<!--\npre { color:x000000; font:";
const char* kHead2 = "-->\n\
</STYLE>\
</head>\
<body bgcolor=\"#FFFFFF\">\n\
<pre>\n";
const char* kTail = "\n</pre></body></html>\n";
// The HTML numeric encodings for & < > and for the extended Apple characters.
// Apple characters without HTML equivalents become ?
const char* encoding =
"000;\
038;060;062;\
196;197;199;201;209;214;220;225;\
224;226;228;227;229;231;233;232;\
234;235;237;236;238;239;241;243;\
242;244;246;245;250;249;251;252;\
134;176;162;163;167;149;182;223;\
174;169;153;180;168;063;198;216;\
063;177;063;063;165;181;063;063;\
063;063;063;170;186;063;230;248;\
191;161;172;063;131;063;063;171;\
187;133;032;192;195;213;140;156;\
150;151;147;148;145;146;063;063;\
255;159;047;063;139;155;063;063;\
135;183;130;132;137;194;202;193;\
203;200;205;206;207;204;211;212;\
063;210;218;219;217;063;136;152;\
063;063;063;063;063;063;063;063;";
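Every entry in `encoding` is exactly four characters long (three digits and a semicolon), so the entry for code n starts at offset 4*n and can be copied without any searching. A portable sketch of that lookup follows, using a shortened hypothetical table `kMiniEncoding` and a hypothetical helper `NumericEntity` that returns a string instead of writing into an output buffer.

```cpp
#include <string>

// First four entries only: unused slot 0, then & < >.
static const char* kMiniEncoding = "000;038;060;062;";

// Builds the numeric character reference for the given table index.
// The caller must pass an index that exists in the table.
std::string NumericEntity(int code)
{
    const char* cp = kMiniEncoding + code * 4;  // fixed 4-character stride
    return std::string("&#") + std::string(cp, 4);
}
```

For example, `NumericEntity(1)` gives `&#038;`, the reference for the ampersand.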
HTMLstrings.h
#include <cstring>
#include "c2htmlCommon.h"
extern const char* kHead1;
extern const char* kHead2;
extern const char* kOpenSpan[4];
extern const char* kCloseSpan;
extern const int kCloseSpanLength;
extern const int kOpenSpanLength;
extern const char* kTail;
extern const char* encoding;
inline char* SendCode(int code,char* output)
{
const char* cp = encoding + code*4;
output[0] = '&';
output[1] = '#';
#if macintosh
*((long*)(output+2)) = *((long*)(cp));
#else
output[2] = cp[0];
output[3] = cp[1];
output[4] = cp[2];
output[5] = cp[3];
#endif
return output+6;
}
struct HTMLcodes
struct HTMLcodes {
unsigned char commonIndex[256];
unsigned char startIndex[256];
unsigned char textIndex[256];
unsigned char stringIndex[256];
unsigned char commentIndex[256];
unsigned char commentPlusIndex[256];
unsigned char preprocIndex[256];
HTMLcodes()
{
// constructor sets up the code tables at program start
MakeCommonIndex();
MakeStartIndex(startIndex);
MakeTextIndex(textIndex);
MakeStringIndex(stringIndex);
MakeCommentIndex(commentIndex);
MakeCommentPlusIndex(commentPlusIndex);
MakePreprocIndex(preprocIndex);
}
void MakeCommonIndex()
{
std::memset(commonIndex,0,sizeof(commonIndex));
commonIndex[0] =kEndoftext;
commonIndex[0x09]=kTab;
commonIndex[0x0A]=kNewline;
commonIndex[0x0D]=kNewline;
commonIndex[0x1A]=kEndoftext;// DOS files!
commonIndex['&']=kAmp;
commonIndex['<']=kLT;
commonIndex['>']=kGT;
for (int i=128;i<256;i++)
{
commonIndex[i] = (i-128) + kExtended;
}
}
void MakeStartIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x[0x0A]=kNewlineStart;
x[0x0D]=kNewlineStart;
x['#'] =kStartPreprocessor;
for (int c=' '+1;c<128;c++)
if (x[c]==0)
x[c]=kStart2Text;
}
void MakeTextIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x[0x0A] = kNewlineText;
x[0x0D] = kNewlineText;
for (int c='_';c<='z';c++)
x[c]= kBasicKeyword;
x['"'] = kText2String;
x['/'] = kGotoComment;
}
void MakeStringIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x['\\'] = kEscapeString;
x['"'] = kString2Text;
}
void MakeCommentIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x['*'] = kCommentReturn;
}
void MakeCommentPlusIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x[0x0A] = kNewlineCommentPlus;
x[0x0D] = kNewlineCommentPlus;
}
void MakePreprocIndex(unsigned char* x)
{
std::memcpy(x,commonIndex,sizeof(commonIndex));
x[0x0A] = kPPnewline;
x[0x0D] = kPPnewline;
x['"'] = kPPstring;
x['/'] = kGotoComment;
}
};
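Each scanner state gets its own 256-entry table, built by copying the common table and then overriding only the characters that mean something special in that state. The sketch below shows that pattern with two made-up states and made-up codes; `MiniTables`, `CountStrings`, `kPlain`, `kEnd`, `kQuote` and `kBackslash` are all hypothetical and much simpler than the tables in the listing.

```cpp
#include <cstring>

enum { kPlain = 0, kEnd = 1, kQuote = 2, kBackslash = 3 };

struct MiniTables {
    unsigned char common[256];
    unsigned char text[256];
    unsigned char str[256];
    MiniTables() {
        std::memset(common, kPlain, sizeof(common));
        common[0] = kEnd;                   // NUL ends the input
        std::memcpy(text, common, sizeof(common));
        text['"'] = kQuote;                 // opens a string
        std::memcpy(str, common, sizeof(common));
        str['"'] = kQuote;                  // closes the string
        str['\\'] = kBackslash;             // escape: skip next character
    }
};

// Counts complete string literals in a NUL-terminated buffer.
int CountStrings(const char* p)
{
    static const MiniTables t;
    const unsigned char* table = t.text;    // current state = current table
    int count = 0;
    for (;;) {
        unsigned char c = static_cast<unsigned char>(*p++);
        switch (table[c]) {
        case kEnd:
            return count;
        case kQuote:
            if (table == t.str) { count++; table = t.text; }
            else table = t.str;
            break;
        case kBackslash:
            if (*p) p++;                    // skip the escaped character
            break;
        default:
            break;
        }
    }
}
```

Switching state is just switching the table pointer, so the inner loop needs one table read per character and no per-state `if` chains.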
c2htmlCommon.h
#ifndef C2HTMLCOMMON
#define C2HTMLCOMMON
enum {
kPPspoiled,
kPPsimple,
kPPpragma,
kPPinclude
};
enum {
kNumBasic = 76,
kNumPreprocessor = 13,
kNumPragma = 110,
kNumPragmaValue = 4
};
enum {
low = 1,
upp = 2,
und = 4,
dig = 8,
wsp = 16,
nln = 32,
pun = 64,
oth = 128,
lowIdent = (low | und),
ident = (low | upp | und),
word = (ident | dig),
txt = (word | wsp | pun),
nokey = (upp | dig | wsp | pun)
};
enum {
kAmp = 1,
kLT = 2,
kGT = 3,
kExtended = 4,
kTab = kExtended + 128,
kNewline,
// codes in Start state
kNewlineStart,
kStart2Text,
kStartPreprocessor,
// codes in Text state
kNewlineText,
kBasicKeyword,
kText2String,
kGotoComment, // also from the preprocessor states
// codes in String state
kEscapeString,
kString2Text,
// codes in Comment state
kCommentReturn,
kNewlineCommentPlus,
// codes in Preprocessor state
kPPstring,
kPPnewline,
kEndoftext
};
#endif