Dec 99 Challenge
Volume Number: 15 (1999)
Issue Number: 12
Column Tag: Programmer's Challenge
Programmer's Challenge
by Bob Boonstra, Westford, MA
Costas Arrays
A Costas array of order N is an NxN array of 1s and 0s satisfying two constraints. First, the array must have exactly N 1s and N*(N-1) 0s, with exactly one 1 in each row and column. Second, no two lines between pairs of 1s may have exactly the same length and the same slope. So, for example, there are exactly 12 Costas arrays of order 4:
1000 0001 0010 0010 1000 0001
0010 1000 1000 0100 0001 0100
0001 0010 0100 0001 0100 1000
0100 0100 0001 1000 0010 0010
0100 0100 1000 0001 0100 0010
0010 0001 0100 0010 1000 0001
1000 0010 0001 1000 0010 0100
0001 1000 0010 0100 0001 1000
So why would one care about Costas arrays? Because of the asymmetries imposed by the two constraints, Costas arrays make ideal waveforms for certain sensor applications, reducing ambiguity in interpreting radar and sonar returns. The mathematics are interesting in other ways. For example, according to the CRC Standard Mathematical Tables, the number C(n) of Costas arrays of order n increases as a function of n until n==16, after which it decreases, at least until n==23:
n |
C(n) |
n |
C(n) |
1 |
1 |
2 |
2 |
3 |
4 |
4 |
12 |
5 |
40 |
6 |
116 |
7 |
200 |
8 |
444 |
9 |
760 |
10 |
2160 |
11 |
4368 |
12 |
7852 |
13 |
12828 |
14 |
17252 |
15 |
19612 |
16 |
21104 |
17 |
18276 |
18 |
15096 |
19 |
10240 |
20 |
6464 |
21 |
3536 |
22 |
2052 |
23 |
872 |
24 |
>=1 |
The number of Costas arrays of order 24 and greater is the subject of continuing research. Your Challenge is to aid these researchers by writing efficient code to enumerate Costas arrays.
The prototype for the code you should write is:
#if defined(__cplusplus)
extern "C" {
#endif
long /* number of arrays */ EnumerateCostas(
int n, /* enumerate Costas arrays of order n */
long *costasArrays /* preallocated storage for returning your results */
/* row r of array k is in costasArrays[k*n + r], r,k are origin 0 */
);
#if defined(__cplusplus)
}
#endif
Your EnumerateCostas routine will be asked to enumerate all Costas arrays of order n and return the results in the preallocated costasArrays. Each cell of an array is represented by one bit. The bits for row r of the k-th Costas array are to be returned in costasArrays[k*n + r], r=0..n-1, k>=0. The cell representing column c, c=0..n-1, is the bit in 1<<c. EnumerateCostas must produce all valid Costas arrays, with no duplicates, and return the number of arrays produced.
Testing will be constrained to arrays of order 32 or less. The winner will be the solution that correctly enumerates the Costas arrays in the minimum time.
This will be a native PowerPC Challenge, using the latest CodeWarrior environment. Solutions may be coded in C, C++, Pascal, or Java.
Three Months Ago Winner
Congratulations to our top two leaders in the points contest for finishing first and second in the September Playfair Challenge. Ernst Munter (Kanata, Ontario) took first place with a solution that was by far the fastest, and Tom Saxton took second place.
The Playfair Decipher Challenge asked contestants to decrypt an encoded phrase based on knowing the dictionary of words used in the message and some information about how the encoding is done. Encoding is based on a keyword from the dictionary, which is used to create a 5x5 encoding substitution matrix for the letters of the alphabet. The encoding matrix maps pairs of plaintext letters into pairs of encoded text. Complicating factors include the fact that the letters 'I' and 'J' are encoded as the same character. The encoding technique also inserts separator characters ('X' or 'Z') to separate repeated letters in a pair, in order to prevent double letters from mapping into an easily detected encoded letter pair. For more information on the problem, consult the September issue of MacTech, or check out the online version at <www.mactech.com/progchallenge/>.
Ernst's solution thoroughly analyzes the dictionary during the initialization phase. The SetUpDecocders routine creates a Decoder matrix for each potential keyword in the dictionary, minus duplicate matrices resulting from similar keywords. The initialization of the CodeBreaker structure creates a set of LinkedNodes based on the first three letters of the words in the dictionary. It also registers all letter triplets (trigraphs) that occur at the beginning or the end of a dictionary word. The decoding routine loops through all of the words in the dictionary, trying each of them as a decoding keyword. The trigraphs calculated during initialization are used by the DecodeCipher routine to prune the decoding process. If decoding is successful to this point, the CodeBreaker::Spell routine recursively matches the potentially decoded text to dictionary words. The code is complicated but fast.
Tom Saxton's solution also analyzes the dictionary during initialization, creating a tree structure. It creates the decoding matrix during the decoding process, not during initialization. Tom's word matching algorithm is recursive in concept, but iterative in implementation. Overall, Tom's solution took somewhat more than twice as long as the winning solution.
The third-place solution by Lad (last name unknown) was nearly as fast as the second-place solution. It was unique in that it was submitted as a library rather than as source code. Normally that would have been a disqualification, but since the September Challenge, in keeping with tradition, allowed solutions to be coded in assembly language, I decided to score the results.
The table below lists, for each of the five solutions submitted, the total execution time, code size, data size, and programming language. It also indicates whether a solution completed all of the test cases correctly. As usual, the number in parentheses after the entrant's name is the total number of Challenge points earned in all Challenges prior to this one.
Name |
Time (msec) |
Errors |
Code Size |
Data Size |
Language |
Ernst Munter (507) |
493 |
no |
8052 |
5580 |
C++ |
Tom Saxton (118) |
1059 |
no |
6984 |
443 |
C |
Lad |
1132 |
no |
2796 |
170 |
Unknown |
Rishi Gupta |
57503 |
no |
11844 |
1919 |
C++ |
R. B. |
90773 |
yes |
4828 |
638 |
C++ |
Top Contestants
Listed here are the Top Contestants for the Programmer's Challenge, including everyone who has accumulated 10 or more points during the past two years. The numbers below include points awarded over the 24 most recent contests, including points earned by this month's entrants.
Rank |
Name |
Points |
1. |
Munter, Ernst |
227 |
2. |
Saxton, Tom |
116 |
3. |
Maurer, Sebastian |
70 |
4. |
Boring, Randy |
66 |
5. |
Rieken, Willeke |
51 |
6. |
Heithcock, JG |
39 |
7. |
Shearer, Rob |
34 |
8. |
Brown, Pat |
20 |
9. |
Hostetter, Mat |
20 |
10. |
Mallett, Jeff |
20 |
11. |
Jones, Dennis |
12 |
12. |
Hart, Alan |
11 |
13. |
Hewett, Kevin |
10 |
14. |
Murphy, ACC |
10 |
15. |
Selengut, Jared |
10 |
16. |
Smith, Brad |
10 |
17. |
Strout, Joe |
10 |
18. |
Varilly, Patrick |
10 |
There are three ways to earn points: (1) scoring in the top 5 of any Challenge, (2) being the first person to find a bug in a published winning solution or, (3) being the first person to suggest a Challenge that I use. The points you can win are:
1st place |
20 points |
2nd place |
10 points |
3rd place |
7 points |
4th place |
4 points |
5th place |
2 points |
finding bug |
2 points |
suggesting Challenge |
2 points |
Here is Ernst's winning Playfair solution:
Playfair.cp
Copyright © 1999 Ernst Munter
/*
September 6, 1999.
Submission to MacTech Programmer's Challenge for September 99.
Copyright © 1999, Ernst Munter, Kanata, ON, Canada.
"Playfair Decipher"
Version 2
Problem Statement
---------
Given a dictionary, a cipher text, and the encoding method, break the code and return the deciphered plaintext.
Solution
----
My strategy is to decode the ciphertext with each possible keyword until a plain text results which is accepted by the spell checker.
In InitPlayfair() the dictionary is scanned twice.
First, the set of decoders, one for each keyword, is constructed. Because similar keywords can result in identical coding matrices, redundant matrices are discarded where the keywords are within a certain distance of each other in the dictionary.
Secondly, a spell check tree is built which indexes all words in the dictionary that are distinct in the first 5 characters (stem). Longer words are checked by indexing to the stem, and scanning the group of words which share the same stem directly in the dictionary.
In addition, all trigraphs (sequences of 3 chars) in the dictionary words and digraphs marking start and end of each word are collected.
In DecodePlayfair() we use each of the keywords in turn to decode the cipher text into a tentative plain text. If during the plain text construction any illegal trigraphs are encountered, the decoding breaks off, and the next keyword decoder is selected.
If decoding passes the trigraph check (about 1 in 20 to 1 in 50), the text is submitted to the spell checker which recursively tries to match the text to dictionary words until the whole plain text has passed the check. The spell checker builds a stack of dictionary indices matching the found words. The first successfully checked plain text is returned to the caller by copying the indexed dictionary words into the final plain text.
Complications
-------
There are two sources of complications for the spell check.
The letters 'J' and 'I' are encoded as if they are the same. The decoding step always yields 'I'. The spell checker allows this aliasing with additional tree branches (alias nodes). Nevertheless, there can be true ambiguity which can not be resolved. For example the dictionary words "JON" and "ION" cannot be distinguished.
The encoder inserts 'X' (or 'Z' in some cases) if the two characters of a letterpair are the same ("BLEEP" becomes "BLEXEP". But it does not insert 'X' between pairs, thus "BEEP" remains "BEEP". In addition, there are words which might be ambiguous, e.g. SEES and SEXES, both of which are in the dictionary. The spell checker also provides alias nodes for such cases.
My spell checker is built on the simplifying assumption that no double letters occur in the raw decoded text. 'X' is always expected between double letters. This also improves the efficiency of trigraph checking. To make this assumption true, the decoder inserts 'X' between letter pairs if necessary, and marks any 'X' which occurs as the first character in a letter pair as a "hard" X, because this 'X' is never a filler 'X'.
Limits and Assumptions
-----------
CipherText and the dictionary contain only words using 'A' - 'Z'.
The dictionary is sorted.
The program uses the standard (64K) program stack and allocates a word stack on the heap for up to 400 words in the resulting plain text.
The program uses a recursive function which takes up to 144 bytes on the call stack and recurses 1 level per word found. The standard stack size of 64K is only sufficient for plain text messages of about 300 words.
NOTE: If it is desired to decode longer texts than about 300 words,
Please change the constant "kMaxWords" accordingly and increase
the call stack allocation by 144K for every 1000 words.
The array parameter "char* plainText" in DecodePlayfair() is assumed to be of sufficient size to hold the resulting plain text PLUS an allowance for a slightly larger intermediate raw plain text (50% extra will cover even pathological cases).
Version 2 changes
---------
Fine tuning for speed (15% gain), but no logical changes. Techniques:
- use of unsigned char instead of char to avoid extend-signbit instruction
- reordering of tests in Legal() to favour most frequent case
- review of need of masking of characters. 1 << c, where c is an
uppercase character. This works without mask because shift is modulo 64.
*/
#include "Playfair.h"
#include <string.h>
typedef unsigned char uchar;
typedef unsigned long ulong;
typedef unsigned short ushort;
typedef const char* Cptr;
static const char** dict;
static int numDecoders;
enum {
kMaxWords = 400, // Increase for longer texts, also increase stack
kMaxNodes = 27*32*32,
kDepth = 16
};
enum {
kXFLAG = 1,
kJFLAG = 1<<27,
kZFLAG = 1<<28,
kAll = 0x07FFFFFF,
kStart = 27,
kStartOfWord= 1<<kStart,
kEndOfWord = 1,
kJbit = (1 << (31 & 'J')),
kCharSet = 0x07FFFFFE & ~kJbit
};
struct Rule
// Rule implements the encoding rule in a static look up table
static struct Rule {
short LUT[25][32];
Rule()
{
int spot1=0;
for (int r1=0;r1<5;r1++)
for (int c1=0;c1<5;c1++,spot1++)
{
int spot2=0;
for (int r2=0;r2<5;r2++)
for (int c2=0;c2<5;c2++,spot2++)
{
int vr1,vr2,vc1,vc2;
if (r1==r2)
{
vr1=r1;
if (c1) vc1=c1-1;else vc1=4;
vr2=r2;
if (c2) vc2=c2-1;else vc2=4;
}
else if (c1==c2)
{
vc1=c1;
if (r1) vr1=r1-1;else vr1=4;
vc2=c2;
if (r2) vr2=r2-1;else vr2=4;
}
else
{
vr1=r1;vc1=c2;
vr2=r2;vc2=c1;
}
LUT[spot1][spot2]=
((vr1*5+vc1) << 8)
+(vr2*5+vc2);
}
}
}
} R;
/******* Trigraphs and set of related functions **********/
static ulong trigraph[27][32];
Register
inline void Register(ulong c)
// Single character words
{
for (int i=1;i<=26;i++)
{
trigraph[i][c] |= kEndOfWord;
trigraph[c][i] |= kStartOfWord;
}
}
inline void Register(ulong c1,ulong c2)
// Two character words
{
trigraph[31 & c1][31 & c2] |= kEndOfWord | kStartOfWord;
}
inline void Register(ulong c1,ulong c2,ulong c3)
// Three character words
{
trigraph[31 & c1][31 & c2] |= kStartOfWord | (1 << c3);
trigraph[31 & c2][31 & c3] |= kEndOfWord;
}
inline void Register(ulong c1,ulong c2,ulong c3,ulong c4)
// Four character words
{
trigraph[31 & c1][31 & c2] |= kStartOfWord | (1 << c3);
trigraph[31 & c2][31 & c3] |= 1 << c4;
trigraph[31 & c3][31 & c4] |= kEndOfWord;
}
RegisterHead
inline void RegisterHead(ulong c1,ulong c2,ulong c3,ulong c4,ulong c5)
{
trigraph[31 & c1][31 & c2] |= kStartOfWord | (1 << c3);
trigraph[31 & c2][31 & c3] |= 1 << c4;
trigraph[31 & c3][31 & c4] |= 1 << c5;
}
RegisterTail
inline void RegisterTail(ulong c1,ulong c2,const uchar* w)
{
ulong c3=31 & *w;
while (c3)
{
trigraph[31 & c1][31 & c2] |= 1 << c3;
c1=c2;c2=c3;c3=*++w;
}
trigraph[31 & c1][31 & c2] |= kEndOfWord;
}
RemoveDoubleLetters
static void RemoveDoubleLetters()
// Removes all double letter cases from the trigraphs and replaces
// them with the escape sequences using 'X' or 'Z' as appropriate
{
for (int c1=1;c1<=26;c1++)
{
ulong set=trigraph[c1][c1];
if (0==set) continue;
int sub;
if (c1==(31 & 'X')) sub=31 & 'Z'; else sub=31 & 'X';
trigraph[c1][sub] |=
(set & kStartOfWord) | (1 << c1);
if (set & kEndOfWord)
trigraph[sub][c1] |= kEndOfWord;
trigraph[c1][c1] = 0;
for (int c2=1;c2<=26;c2++)
{
set=trigraph[c1][c2];
if (set & (1 << c2))
{
if (c2==(31 & 'X')) sub=31 & 'Z'; else sub=31 & 'X';
trigraph[c1][c2]=set & (~(1 << c2)) | (1 << sub);
}
}
}
}
LegalStart
inline bool LegalStart(ulong cpair)
{
return (trigraph[31 & (cpair>>8)][31 & cpair] & kStartOfWord);
}
LegalEnd
inline bool LegalEnd(ulong cpair)
{
return (trigraph[31 & (cpair>>8)][31 & cpair] & kEndOfWord);
}
Legal
inline bool Legal(ulong cp1,ulong cp2)
{
return (
(trigraph[31 & (cp1>>8)][31 & cp1] & // ABC.?
(kEndOfWord | (1 << (31 & (cp2 >> 8))))) ||
// note: mask needed here because cp2.hi might be lower case 'x'
// however, it costs nothing here since >>8 and &31 is one instr.
(trigraph[31 & cp1][31 & (cp2 >> 8)] & // ?.BCD
(kStartOfWord | (1 << (cp2)))) ||
(LegalEnd(cp1) && LegalStart(cp2)) // AB.CD
);
}
struct Decoder
static struct Decoder{
uchar matrix[25];
uchar spot[27];
#if KWD
const char* kwd; // useful in debugging
#endif
void Init(const char* keyword)
{
// Sets up the unique matrix and its inverse (=spot) for one keyword
#if KWD
kwd=keyword;
#endif
const char* kp=keyword;
uchar line[25];
uchar* lp=line;
uchar nextC=*kp;
ulong mask=kCharSet; // make sure there is no 'J'
// write unique letters in a line
for (;;)
{
uchar c=nextC;
nextC=*++kp;
if (c==0) break;
if (c=='J') c='I';
int bit=1 << c;
if (mask & bit) {
mask &= ~bit;
*lp++ = c;
}
}
int numUnique=lp-line;
// write remaining letters
for (int c='A';c<='Z';c++)
{
if ((mask & 2)) *lp++=c;
mask >>= 1;
}
// transpose line into matrix
uchar* mp=matrix;
int k=0;
for (int i=0;i<numUnique;i++) {
for (int j=i;j<25;j+=numUnique) {
int c=line[j];
*mp++ = c;
spot[31 & c]=k++;
}
}
}
bool SameMatrix(Decoder* dp)
{
// Compares matrizes to help in eliminating unneeded duplicates
ulong* a=(ulong*)matrix;
ulong* b=(ulong*)(dp->matrix);
if (a[0] != b[0]) return false;
if (a[1] != b[1]) return false;
if (a[2] != b[2]) return false;
if (a[3] != b[3]) return false;
if (a[4] != b[4]) return false;
if (a[5] != b[5]) return false;
return true;
}
ulong Decode(ulong cipherPair)
{
// Decodes one character pair
// Hard 'X' is identified by lower case 'x'
int spot0=spot[31 & (cipherPair>>8)];
int spot1=spot[31 & cipherPair];
int spotPair=R.LUT[spot0][spot1];
uchar c0=matrix[spotPair >> 8];
if (c0=='X') c0='x'; // "hard" X
uchar c1=matrix[31 & spotPair];
return (c0 << 8) | c1;
}
int DecodeCipher(const char* cipherText,char* plainText)
{
// Decodes the entire cipherText into a tentative plaintext
// Inserts 'X' or 'Z' between double letters at pair boundaries
// to accomodate spell checker method
const char* ct=cipherText;
char* pt=plainText;
ulong cipherPair=*((ushort*)ct);ct+=2;
ulong plainPair1=Decode(cipherPair);
if (0==LegalStart(plainPair1)) {
return 0;
}
*((ushort*)pt)=plainPair1;pt+=2;
for (;;ct+=4)
{
ulong cipherWord=*((ulong*)ct);
if ((cipherWord & 0xFF000000) == 0)
break;
ulong plainPair0=Decode(cipherWord >> 16);
if (0==(0x1F & ((plainPair0 >> 8) ^ plainPair1)))
{
if ('X'==(0x5F & plainPair1)) *pt++ = 'Z';
else *pt++ = 'X';
} else if (!Legal(plainPair1,plainPair0))
return 0;
*((ushort*)pt)=plainPair0;pt+=2;
if ((cipherWord & 0x0000FF00) == 0)
break;
plainPair1=Decode(cipherWord);
if (0==(0x1F & ((plainPair1 >> 8) ^ plainPair0)))
{
if ('X'==(0x5F & plainPair0)) *pt++ = 'Z';
else *pt++ = 'X';
} else if (!Legal(plainPair0,plainPair1))
return 0;
*((ushort*)pt)=plainPair1;pt+=2;
}
pt[0]=0;
return pt-plainText;
}
} *D;
// The spell checker tree is a three level hierarchy:
// Node -> List -> Stem -> dictionaryWord
// Nodes are in a 3-dimensional table accessed by the first 3 characters
// A node contains pointers to up to 26 lists ,
// A list contains up to 26 stems ,
// A stem contains a pointer to a dictionary word,
// and the number of words in the group with the same stem.
// Shorter words are referenced by using the 0-index at each level.
struct Stem
struct Stem {
const char** word; // a dictionary word
int numWords; // number of words with common stem
void Add(ulong w)
{
if (word==0) {
word=dict+w;
numWords=1;
}
else numWords++;
}
bool InsertedX(const uchar c,const uchar* pt,int len) const
{
uchar c0=pt[len-1];
if ( (c0==pt[len+1]) &&
( (c=='X') || ((c=='Z') && (c0=='X')) ) )
{
return true;
}
return false;
}
bool JtoI(const uchar c1,const uchar c2) const
{
return ((c1=='J') && (c2=='I'));
}
const char* MatchTail(const uchar* pt,int & len,int numX,
int curWord) const
// Matches string starting at the 5th letter - numX
// numX is the number of inserted X (or Z) in the first 5 letters
{
if (curWord < numWords)
{
const char* wp=word[curWord];
const char* temp=wp+5-numX;
len=5;
uchar c1;
while (0 != (c1 = *temp)) {
uchar c2=pt[len];
if (31 & (c1 ^ c2))
{
if (InsertedX(c2,pt,len))
{
temp-; // skipped X, repeat comparison
} else if (!JtoI(c1,c2))
break;
}
len++;
temp++;
}
if (0==*temp)
return wp;
}
return 0;
}
};
struct LinkedNode
// Lists and Nodes are dynamically allocated.
// They are descended from LinkedNode and maintained in a linked list.
// This is to allow them to be deleted when TermPlayfair() is called.
struct LinkedNode {
LinkedNode* link;
LinkedNode(LinkedNode* lk):link(lk){}
};
struct List
struct List:LinkedNode {
Stem group[27];
List* alias;
ulong xjFlag;
List(LinkedNode* lk,ulong flag):
LinkedNode(lk),alias(0),xjFlag(flag)
{memset(group,0,sizeof(group));}
void Add(ulong c,ulong w){group[c].Add(w);}
const char* GetListLeader() const
{
ulong n=group[0].numWords;
if (n)
return group[0].word[0];
return (const char*)n; //== 0; cast saves 2 instructions!
}
};
struct Node
struct Node:LinkedNode {
List* list[27];
Node* alias;
ulong xjFlag;
Node(LinkedNode* lk,ulong flag):
LinkedNode(lk),alias(0),xjFlag(flag)
{memset(list,0,sizeof(list));}
void Add(ulong c,List* lp){list[c]=lp;}
const char* GetNodeLeader() const
{
List* lp=list[0];
if (lp)
{
if (lp->group[0].numWords){
return lp->group[0].word[0];
}
}
return (const char*)lp; // == 0; cast saves 6 instructions!!
}
};
struct CodeBreaker
static struct CodeBreaker {
LinkedNode* nodeList;
Node* base[kMaxNodes];
Cptr wordStack[kMaxWords];
Cptr* stackPtr;
int cache;
CodeBreaker(const char *words[],long numDictionaryWords)
:nodeList(0),cache(0),stackPtr(wordStack)
{
// The constructor creates the index tree from the dictionary words
dict=words;
SetupDecoders(numDictionaryWords);
memset(base,0,sizeof(base));
for (ulong w=0;w < numDictionaryWords;w++)
{
const char* word=dict[w];
ulong first=*((const ulong*)word);
ulong c0=31 & (first >> 24),
c1=31 & (first >> 16),
c2=31 & (first >> 8),
c3=31 & first,
c4=31 & word[4];
// Change all 'J' to 'I' as we analyse the letters
ulong nodeJflag=0;
if (c0==(31 & 'J'))
{
c0=(31 & 'I');
nodeJflag=kJFLAG;
}
if (c1==0)
{
List* lp=NewList(0,0,w); // only a 1-letter word
Node* np=NewNode(0);
base[32*32*c0]=np;
np->Add(0,lp);
Register(c0);
continue;
}
// At least 2 letters:
ulong nodeXflag=0;
if (c1==(31 & 'J'))
{
c1=(31 & 'I');
nodeJflag |= kJFLAG<<1;
}
if (c0 == c1)
{
c4=c3;c3=c2;c2=c1;
if (c0 == (31 & 'X')) {
c1 = 31 & 'Z';
nodeXflag |= kZFLAG;
} else c1 = 31 & 'X';
nodeXflag |= kXFLAG;
}
if (c2==0)
{
List* lp=NewList(0,0,w); // only a 2-letter word
Node* np=NewNode(0);
base[32*(32*c0+c1)+c2]=np;
np->Add(0,lp);
Register(c0,c1);
continue;
}
// At least 3 letters:
if (c2==(31 & 'J'))
{
c2=(31 & 'I');
nodeJflag |= kJFLAG<<2;
}
if (c1 == c2)
{
c4=c3;c3=c2;
if (c1 == (31 & 'X')) {
c2 = 31 & 'Z';
nodeXflag |= kZFLAG;
} else c2 = 31 & 'X';
nodeXflag |= kXFLAG;
}
Node* np=base[32*(32*c0+c1)+c2];
if (np==0)
base[32*(32*c0+c1)+c2]=
np=NewNode(nodeXflag | nodeJflag);
else np=GetAlias(np,nodeXflag | nodeJflag);
if (c3==0)
{
List* lp=NewList(0);
np->Add(0,lp); // only a 3-letter word
lp->Add(0,w);
Register(c0,c1,c2);
} else
// more than 3 letters
{
ulong listXflag=0,listJflag=0;
ulong numX=1 & nodeXflag;
if (c3==(31 & 'J'))
{
c3=(31 & 'I');
listJflag = kJFLAG;
}
if (c2 == c3)
{
c4=c3;
if (c2 == (31 & 'X')) {
c3 = 31 & 'Z';
listXflag |= kZFLAG;
} else c3 = 31 & 'X';
listXflag |= kAll; // all stems in list affected
numX++;
}
List* lp=np->list[c3];
if (lp==0)
{
lp=NewList(listXflag);
np->Add(c3,lp);
}
if (c4==(31 & 'J'))
c4=(31 & 'I'); // no need for flag; J follows I
if (c3 == c4)
{
if (c3 == (31 & 'X')) {
c4 = 31 & 'Z';
listXflag |= kZFLAG;
} else c4 = 31 & 'X';
listXflag |= 1 << c4; // only [c4] stem affected
numX++;
}
lp=GetAlias(lp,listXflag | listJflag);
lp->Add(c4,w);
// Register all trigraphs from this word and stem
if (c4==0) Register(c0,c1,c2,c3);
else
{
RegisterHead(c0,c1,c2,c3,c4);
RegisterTail(c3,c4,(uchar*)word+5-numX);
}
}
} // end of loop through the dictionary
// Fixup trigraphs
RemoveDoubleLetters();
}
~CodeBreaker()
{
// Destructor releases all dynamically allocated nodes and lists
LinkedNode* np=nodeList;
while(np)
{
LinkedNode* next=np->link;
delete np;
np=next;
}
delete [] D;
}
void SetupDecoders(long numDictionaryWords)
{
// Scans the dictionary and allocates 1 decoder per potential keyword
D=new Decoder[numDictionaryWords];
Decoder* dp=D;
const char** wp=dict;
for (int i=0;i<numDictionaryWords;i++,wp++)
{
dp->Init(*wp);
Decoder* dx=dp-1;
int k=kDepth;
// Scans backwards to discover perhaps identical matrizes, and assigns
// this matrix only if it is unique.
// (Example GUESS and GUESSES yield the same matrix, no need to keep both)
for (;dx>=D;dx-,k-){
if (k<=0) break;
if (dp->SameMatrix(dx))
goto cont;
}
dp++;
cont:;
}
numDecoders=dp-D;
}
// Allocation of lists and nodes, and linkage in chain for memory mgmnt
Node* NewNode(ulong xFlag){
Node* np=new Node(nodeList,xFlag);
nodeList=np;
return np;
}
List* NewList(ulong flag){
List* lp=new List(nodeList,flag);
nodeList=lp;
return lp;
}
List* NewList(int index,ulong flag,int w){
List* lp=NewList(flag);
lp->Add(index,w);
return lp;
}
// Alias nodes and lists are chained off the first node or list
// with the same 3- or 4-letter key.
Node* GetAlias(Node* np,const ulong flag)
{
if (np->xjFlag == flag) return np;
if (np->alias==0)
{
np->alias=NewNode(flag);
return np->alias;
} else return GetAlias(np->alias,flag);
}
List* GetAlias(List* lp,const ulong flag)
{
if (lp->xjFlag == flag) return lp;
if (lp->alias==0)
{
lp->alias=NewList(flag);
return lp->alias;
} else return GetAlias(lp->alias,flag);
}
bool Spell(const uchar* pt,int tailLen,const uchar lastChar,int numWords);
// Recursive, not inlined, defined below.
void Push(const Cptr wp) {*stackPtr++ = wp;}
void CopyStack(char* text)
// Copies the word stack of solution words back into plaintext
// This overwrites the decoded plaintext with dictionary words
// and automatically takes care of I/J equivalents and X/Z fillers.
{
while (stackPtr > wordStack) {
Cptr wp=*-stackPtr;
uchar c=*wp;
while (c) {
*text++=c;
c=*++wp;
}
}
*text=0;
}
void Cache(Decoder* lastUsed)
{
// Builds a cache of recently used decoders by moving lastUsed forward
Decoder* nextCache=D+cache;
if (nextCache < lastUsed)
{
Decoder temp=*nextCache;
*nextCache=*lastUsed;
*lastUsed=temp;
cache++;
}
}
} *C;
// CodeBreaker::Spell
// Returns true if the string in pt[] of length tailLen can be parsed
// into dictionary words, allowing for J/I substitutions and X/Z fillers.
// On each recursion it attempts to match the head of pt[] with a
// dictionary word, and if successful, stacks a pointer to the word
// and calls itself with the shortened string until the end of pt[] is
// reached.
// Note: tailLen and wordlengths refer to decoded plain text and
// words before filler 'X' and 'Z' characters are removed.
bool CodeBreaker::
Spell(const uchar* pt,int tailLen,
const uchar lastChar,int numWords)
{
if (-numWords <=0) // runtime check against stack overflow
{
numWords++;
return false;
}
const char* wp;
// Get Node matching the first 3 characters
ulong pt0123=*((ulong*)pt);
Node* np=base[32*(32*
(31 & (pt0123 >> 24)) +
(31 & (pt0123 >> 16))) +
(31 & (pt0123 >> 8))];
if (np)
{
ulong hardX=pt0123 & 0x20202000;
if (hardX &&
(1 & np->xjFlag) &&
(0==(kZFLAG & np->xjFlag))
) np=np->alias;
while (np) {
// Get list matching the first 4 characters
List* lp=np->list[31 & pt0123];
hardX=0x20 & (31 & pt0123 | pt[4]);
if (hardX && lp
&& (1 & lp->xjFlag)
&& (0==(kZFLAG & lp->xjFlag))
) lp=lp->alias;
while (lp) {
// Get stem matching the first 5 characters
Stem* gp=&lp->group[31 & pt[4]];
if (gp->numWords) {
// Explore all words matching the first 5 characters starting with
// the last word in the group (hoping to catch longer words first)
int curWord=gp->numWords;
int numX = // make this a function
(1 & np->xjFlag) +
(1 & (lp->xjFlag >> pt[4]));
do {
int len;
wp=gp->MatchTail(pt,len,numX,-curWord);
if (wp)
{
if ((len==tailLen)
|| (Spell(pt+len,tailLen-len,pt[len-1],numWords)))
{
Push(wp);
numWords++;
return true;
}
}
} while (curWord > 0);
}
// Try the 4-letter word
wp=lp->GetListLeader();
if (wp)
{
if ((4==tailLen)
|| (Spell(pt+4,tailLen-4,pt[3],numWords)))
{
Push(wp);
numWords++;
return true;
}
}
lp=lp->alias;
}
// No luck so far, try the 3-letter word
wp=np->GetNodeLeader();
if (wp)
{
if ((3==tailLen)
|| (Spell(pt+3,tailLen-3,pt[2],numWords))){
Push(wp);
numWords++;
return true;
}
}
np=np->alias;
} // end of tests where first 3 letters match
}
// Try the 2-letter word
np=base[32*(32*(31 & pt[0])+(31 & pt[1]))+0];
if (np)
{
wp=np->GetNodeLeader();
if (wp)
{
if ((2==tailLen)
|| (Spell(pt+2,tailLen-2,pt[1],numWords))){
Push(wp);
numWords++;
return true;
}
}
}
// Try the 1-letter word
np=base[32*(32*(31 & pt[0])+0)+0];
if (np)
{
wp=np->GetNodeLeader();
if (wp)
{
if ((1==tailLen)
|| (Spell(pt+1,tailLen-1,pt[0],numWords))){
Push(wp);
numWords++;
return true;
}
}
}
// Finally try if this is a filler 'X' or 'Z' between words
if ((pt[0]=='X') ||
((0==(31 & (lastChar ^ 'X'))) && (pt[0]=='Z')))
{
if (tailLen==1){
numWords++;
return true;
}
if (0==(31 & (lastChar ^ pt[1]))
&& Spell(pt+1,tailLen-1,0,numWords))
{
// nothing to stack, just carry on unwinding
numWords++;
return true;
}
}
// Spellcheck failed: no word matches at this offset in plaintext
numWords++;
return false;
}
InitPlayfair
void InitPlayfair(
const char *words[], /* dictionary words */
long numDictionaryWords /* number of null-terminated words in dictionary */
)
{
C = new CodeBreaker(words,numDictionaryWords); // all that's needed
}
DecodePlayfair
void DecodePlayfair(
const char *cipherText, /* null-terminated text to decode */
char *plainText /* null-terminated decoded text */
)
{
// Decodes cipherText into plainText.
// If no solution is found, plainText will be a gibberish string
// of the same length as cipherText, or possibly a bit longer.
if ((0==*cipherText) || (strlen(cipherText) & 1))
// Cannot handle empty or odd-length cipher text strings
return;
Decoder* dp=D;
Decoder* dEnd=D+numDecoders;
for (;dp<dEnd;dp++)
{
int len=dp->DecodeCipher(cipherText,plainText);
if (len && C->Spell((uchar*)plainText,len,0,kMaxWords))
{
C->CopyStack(plainText);
C->Cache(dp);
break;
}
}
}
TermPlayfair
void TermPlayfair(void)
{
delete C;
}