
XCMD Import Text
Volume Number: 4
Issue Number: 10
Column Tag: HyperChat®

XCMD Corner

By Donald Koscheka, Apple Computer, Inc.

Importing Text into Hypercard

A new controversy seems to be emerging in the Hypercard community. Some Hypercard pundits are discouraging the use of XCMDs and XFCNs in stack design. Their most convincing argument is that those of us who jump into writing XCMDs aren’t giving ourselves an opportunity to see whether HyperTalk can perform the task, perhaps just as well as an XCMD.

I frequently consider writing an XCMD solution to a programming problem without first considering whether HyperTalk can do the same job for me. Recently, I needed to import Microsoft WORD files into Hypercard. What a wonderful opportunity to write an XCMD!

When I sat down to write the script to invoke the XCMD, I realized that I could write the entire WORD import routine in HyperTalk. Ed Wischmeyer of Apple Computer Inc. pointed out that although fields in HyperTalk prefer to see straight ASCII text, there is no such restriction on the contents of containers. Hypercard also allows you to open and read any file type you want; you aren’t restricted to reading text files. Of course, you need to figure out how to translate what’s in that container into a format that can be presented in a field.

The hard part of importing text from a Word file is not reading the data into Hypercard but rather figuring out how Word stores its text. By committing the import code to a simple HyperTalk script, I could concentrate my efforts on decoding Word’s file format.

To simplify my search through the file format, I made the assumption that I could ignore any formatting information such as rulers, font and style changes. I was after the text portion of the file only. This turns out to be a valid assumption since I wanted to import the file into a Hypercard field as text.

Finding the text was a snap with John Mitchell’s “FEDIT+”. I created a Word file using WORD and then examined it in FEDIT+. I noticed that the text always started at location 256 in the file. Since the size of the file was larger than the size of the text plus this 256-byte header, I needed to determine where the end of text occurred (assuming that the formatting and ruler information follows the text in the file). Since I knew how long the text was, I again used FEDIT+ to search the 256-byte header portion of the file. This time I was looking for any portion of the header that contained a count of the number of bytes in the text. Since I knew that my file contained exactly 100 characters (bytes), all I had to do was find this number somewhere in the header. I found something close to what I was looking for at offset 16 in the file. The value at this location was the number of characters in the text portion of the file plus 256, the length of the header.

The creators of Microsoft Word may be reading this and wondering why I’m assuming that the text size is a 16-bit entity rather than a 32-bit number. I’m not. Since Hypercard text fields are currently limited to 32K bytes, and since I knew none of my Word files were longer than this, I’m only interested in the low-order word of the text length.

Reading the text portion of a Microsoft Word file into a Hypercard container requires the following steps: (1) Position the mark at byte 16 of the file. (2) Read the byte at this position and multiply it by 256, making it the high-order half of the length. (3) Read the next byte and add it to the high-order half of the length. (4) Move 238 more bytes into the file (16+2+238 = 256); this is the start of the text portion of the file. (5) Read the number of bytes calculated minus 256. The IT container now holds the imported text.

The HyperTalk script in Listing 1 performs the above steps for importing up to 16K bytes of text from a Word file. I use the filename XFCN to display WORD files only in the GetFile dialog and to get the full pathname of the file from the user. This script reads in the text without any looping, so an XCMD may not speed things up enough to be warranted.

{1}
on mouseup
  put filename("WDBN") into filename
  if filename is not empty then
     open file filename -- filename is the full pathname of a WORD file
     read from file filename for 16 -- move file mark to the text length word
     read from file filename for 1  -- read the upper half of the length
     put chartonum( it ) * 256 into filesize -- shift up by 8 bits
     read from file filename for 1  -- get the lower half of the length
     add chartonum( it ) mod 256 to filesize
     read from file filename for 238 -- move to start of text in the file
     read from file filename for filesize-256 -- read in the text
     close file filename -- IT now contains the imported data.
  end if
end mouseup

Listing 1. Script to Import Text from a Microsoft Word File
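
The header arithmetic in Listing 1 can also be sketched in C. This is only an illustration, assuming the layout observed above (a 16-bit value at offset 16, stored high byte first, holding the text length plus the 256-byte header). The names WordTextExtent, WORD_HEADER_SIZE and WORD_LENGTH_OFFSET are hypothetical, and the function works on a buffer that has already been read into memory rather than on an open file.

```c
#include <stddef.h>

#define WORD_HEADER_SIZE   256 /* text starts here (observed, not documented) */
#define WORD_LENGTH_OFFSET 16  /* low-order word of "text length + 256" */

/* Hypothetical helper: returns the number of text bytes, or -1 if the
   buffer is too small or the stored length is inconsistent. On success
   *textStart receives the offset of the first text byte. */
long WordTextExtent(const unsigned char *file, size_t fileLen, size_t *textStart)
{
    unsigned long endOfText;

    if (file == NULL || textStart == NULL || fileLen < WORD_HEADER_SIZE)
        return -1;

    /* High byte first, exactly as the script multiplies by 256 and adds. */
    endOfText = ((unsigned long)file[WORD_LENGTH_OFFSET] << 8)
              |  (unsigned long)file[WORD_LENGTH_OFFSET + 1];

    if (endOfText < WORD_HEADER_SIZE || endOfText > fileLen)
        return -1;

    *textStart = WORD_HEADER_SIZE;
    return (long)(endOfText - WORD_HEADER_SIZE);
}
```

For the 100-character test file described earlier, bytes 16 and 17 would hold $01 and $64 (356), so the function would report 100 bytes of text starting at offset 256. Unlike the script, the sketch rejects a stored length that is smaller than the header or larger than the buffer.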

Not all file formats can be imported quite so simply. MacWrite uses a packed text format, storing one or two characters per byte using a simple compression scheme.

Because the text is compressed, we can’t just read the file into a container and return the result to Hypercard. We must first decompress the file a byte at a time. Such a process suggests looping, and loops, as we know, are not particularly fast in HyperTalk. Although the decompression can be performed in a HyperTalk script, we can write an XFCN that performs the decompression faster.

The key to reading in a MacWrite file is understanding that MacWrite stores its data by paragraph. Whereas Word files are clearly divided between the text and formatting information, MacWrite stores formatting information for each paragraph at the end of the text for that paragraph. Hypercard doesn’t do formatted text, so we want to ignore the formatting information at the end of each paragraph. Our algorithm then becomes a loop that reads in a paragraph at a time and decompresses the text for that paragraph, ignoring the formatting information. This process is repeated for each paragraph in the file.

One small “gotcha” to this approach stems from the fact that rulers and pictures are also considered paragraphs. When we encounter either of these objects, we just move on to the next paragraph.

Listing 2 depicts the code for this XFCN. I chose “C” because pointer arithmetic is easier to perform in “C” and because last month’s example was written in Pascal. I made every attempt to keep the “C” isomorphic to a Pascal program so that you can easily convert the code to Pascal.

Finding the paragraph information in the file requires a little arithmetic. Bytes 2-3 in the file tell us how many paragraphs the main document contains. (MacWrite makes a distinction between the main document, the header document and the footer document. For our purposes, we only want to read in the main body of text.) If bytes 2-3 contain a 5, then there are 5 paragraphs in the main document.

For each paragraph, MacWrite stores an information array. We start reading the information arrays at the file position pointed to in file offset $108. An information array is an array of 16-byte elements that tell us something about each paragraph. The first two bytes in the information array tell us whether the paragraph contains text, a ruler or a picture. If this value is positive, the paragraph contains text. If it is 0 the paragraph is a ruler, and if it is negative the paragraph is a picture; in either case we can ignore it.
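
As a rough sketch of that arithmetic, the C fragment below pulls the paragraph count and the information array position out of a file image held in memory, and classifies one information array element by the sign of its first two bytes. The function and constant names are hypothetical. It assumes only what is described above, plus the fact that a 68000 Macintosh stores multi-byte values high byte first.

```c
#include <stddef.h>

#define MW_PARA_COUNT_OFFSET 0x0002 /* bytes 2-3: main-document paragraph count */
#define MW_INFO_POS_OFFSET   0x0108 /* 4 bytes: file position of the info arrays */

typedef enum { MW_PICTURE, MW_RULER, MW_TEXT } MWParaKind;

/* Read a big-endian signed 16-bit value without relying on host byte order. */
static int ReadBE16(const unsigned char *p)
{
    int v = (p[0] << 8) | p[1];
    return (v >= 0x8000) ? v - 0x10000 : v;
}

/* Read a big-endian unsigned 32-bit value. */
static unsigned long ReadBE32(const unsigned char *p)
{
    return ((unsigned long)p[0] << 24) | ((unsigned long)p[1] << 16)
         | ((unsigned long)p[2] << 8)  |  (unsigned long)p[3];
}

/* The caller must supply at least 0x10C bytes of the file. */
int MWParagraphCount(const unsigned char *file)
{
    return ReadBE16(file + MW_PARA_COUNT_OFFSET);
}

unsigned long MWInfoArrayPos(const unsigned char *file)
{
    return ReadBE32(file + MW_INFO_POS_OFFSET);
}

/* Positive height: text. Zero: ruler. Negative: picture. */
MWParaKind MWClassify(const unsigned char *infoElement)
{
    int height = ReadBE16(infoElement);

    if (height > 0)
        return MW_TEXT;
    return (height == 0) ? MW_RULER : MW_PICTURE;
}
```

Reading the fields byte by byte keeps the sketch independent of the host's byte order and structure padding, which matters if the code is ever compiled somewhere other than a 68000 Macintosh.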

Offset 8 in the information array contains a status byte that provides some information about the text. If bit 3 is set, the text in this paragraph is compressed. Bytes 9-11 tell us the absolute file offset for the start of the data in the paragraph and bytes 12-13 contain the length of the data (paragraph addressing is 24 bits and each paragraph contains up to 64K of characters or data). The trick is to read in the number of characters indicated in the information array, determine if the paragraph contains text and, if so, decompress the text if it’s compressed.
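
The field layout just described can be sketched as a small parser over the 16 raw bytes of one element. The struct and function names here are hypothetical, and only the fields named in the text are extracted. Note that the 24-bit offset is assembled from unsigned bytes, so an offset such as $018000 is not disturbed by sign extension.

```c
#include <stddef.h>

#define MW_INFO_SIZE         16   /* size of one information array element */
#define MW_STATUS_COMPRESSED 0x08 /* bit 3 of the status byte */

typedef struct {
    int           height;     /* bytes 0-1: greater than 0 means text */
    int           compressed; /* nonzero if bit 3 of byte 8 is set */
    unsigned long dataPos;    /* bytes 9-11: absolute file offset (24 bits) */
    unsigned int  dataLen;    /* bytes 12-13: length of the paragraph data */
} MWParaInfo;

/* Hypothetical parser: e points at MW_INFO_SIZE raw bytes from the file. */
void MWParseInfo(const unsigned char *e, MWParaInfo *out)
{
    int h = (e[0] << 8) | e[1];

    out->height     = (h >= 0x8000) ? h - 0x10000 : h;
    out->compressed = (e[8] & MW_STATUS_COMPRESSED) != 0;
    out->dataPos    = ((unsigned long)e[9] << 16)
                    | ((unsigned long)e[10] << 8)
                    |  (unsigned long)e[11];
    out->dataLen    = ((unsigned int)e[12] << 8) | (unsigned int)e[13];
}
```

A caller would skip the element unless height is positive, then seek to dataPos and read the paragraph from there.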

Once we read in the paragraph, we get some more information. The first two bytes of the paragraph tell us how many characters of text will appear in the decompressed paragraph. Following the text on an even word boundary is the formatting information for the paragraph which we ignore in this example.

MacWrite’s text compression is based on a letter frequency scheme stored as STR resource #700 in MacWrite’s resource fork. For English, this string contains “ etnroaisdlhcfp”. MacWrite maps these characters onto the array [$0..$F]. The space character ($20) has a value of 0, letter “e” has a value of 1, “t” a value of 2 and so on. Since each of these values fits into a nibble, the word “eels” can be represented as “$11A8” rather than the byte-wide representation of $65656C73. In this example, we realize a 50% space saving (the best case for this algorithm).
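
To make the nibble packing concrete, here is a minimal sketch that packs a string made up only of the fifteen table characters, two per byte. It is not MacWrite’s own code: the function name is hypothetical, the table is the English string quoted above, and padding an odd trailing nibble with 0 is an assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* English frequency string from STR #700, as quoted in the text. */
static const char kMWTable[] = " etnroaisdlhcfp";

/* Hypothetical packer: writes the packed form of src into out and returns
   the number of bytes written, or -1 if src contains a character that is
   not in the table or if out is too small. */
long MWPackSimple(const char *src, unsigned char *out, size_t outCap)
{
    size_t n = strlen(src);
    size_t bytes = (n + 1) / 2;
    size_t i;

    if (bytes > outCap)
        return -1;
    memset(out, 0, bytes); /* assumption: an odd trailing nibble is 0 */

    for (i = 0; i < n; i++) {
        const char *hit = strchr(kMWTable, src[i]);
        unsigned char code;

        if (hit == NULL)
            return -1; /* would need the $F escape, not handled here */
        code = (unsigned char)(hit - kMWTable);
        if (i % 2 == 0)
            out[i / 2] = (unsigned char)(code << 4); /* high nibble first */
        else
            out[i / 2] |= code;
    }
    return (long)bytes;
}
```

Packing “eels” with this sketch yields the two bytes $11 and $A8, matching the example above.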

This compression scheme only works for lower-case letters since 4 bits is not enough information to encode both the letter and its case for the 14 most popular letters. This scheme also doesn’t compress non-alphabetic characters such as numerals and punctuation marks. In these cases, the 16th array element, $F, is used as a flag to indicate that the next 2 nibbles represent one character. Using the table above, “Then” would be coded as $F54B13: the flag $F, the two nibbles of “T” ($54), then “h” ($B), “e” ($1) and “n” ($3). Note that the letter “T” crosses byte boundaries; the top nibble is in byte 0 and the lower nibble is in byte 1. This is of no consequence to the algorithm.
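
One way to see why the byte boundary does not matter is to treat the input as a stream of nibbles. The sketch below does that; it is a simplified alternative to the byte-oriented DeCompress routine in Listing 2, not a copy of it. The names are hypothetical and the table is the English string quoted earlier.

```c
#include <stddef.h>

static const char kMWDecode[] = " etnroaisdlhcfp";

/* Fetch nibble number n, where nibble 0 is the high half of byte 0. */
static unsigned NibbleAt(const unsigned char *in, size_t n)
{
    unsigned char b = in[n / 2];
    return (n % 2 == 0) ? (unsigned)(b >> 4) : (unsigned)(b & 0x0F);
}

/* Hypothetical decoder: expands at most count characters from the inLen
   packed bytes at in. Returns the number of characters written to out,
   which is less than count only if the input runs out first. */
size_t MWUnpack(const unsigned char *in, size_t inLen, char *out, size_t count)
{
    size_t total = inLen * 2; /* nibbles available */
    size_t nib = 0;
    size_t done = 0;

    while (done < count && nib < total) {
        unsigned code = NibbleAt(in, nib++);

        if (code < 0x0F) {
            out[done++] = kMWDecode[code];   /* one of the 15 table characters */
        } else {
            unsigned hi, lo;

            if (nib + 2 > total)
                break;                       /* truncated escape sequence */
            hi = NibbleAt(in, nib);
            lo = NibbleAt(in, nib + 1);
            out[done++] = (char)((hi << 4) | lo); /* literal character */
            nib += 2;
        }
    }
    return done;
}
```

Fed the three bytes $F5 $4B $13 and a count of 4, the sketch produces “Then”. Because it stops as soon as count characters have been produced, a padding nibble at the end of a paragraph is never written to the output buffer.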

Armed with this information, you should have little trouble understanding the XFCN. In fact, I hope you find it useful and informative! (Next month: printing from XCMDs).

{2}
/*************************
* file:  MWRead.c
*
* an XFCN that imports text
* directly from a MacWrite file
* whose full pathname is passed
* as an input parameter.
*
* --------------------------------
* To Build this file:
*
*   C -q2 -g MWRead.c
*
*   link -sn Main=MWRead
*        -sn STDIO=MWRead
*        -sn INTENV=MWRead
*        -rt XFCN=301
*        -m MWREAD MWRead.c.o
*        "{CLibraries}"CInterface.o
*        -o "your stack name"
*
* --------------------------------
* By: Donald Koscheka
* Date: 2-July-1988
* ©1988, Donald Koscheka
* All Rights Reserved
*
* --------------------------------
*************************/

#include <Types.h>
#include <OSUtils.h>
#include <Memory.h>
#include <Files.h>
#include <Resources.h>
#include "HyperXCmd.h"

#define INFOPOS  0x00000108
#define PPOS     0x00000002
#define COMP     0x0008


/* -------------------------- */
/* Define the structure of an */
/* information array element*/
/* */
/* pHite is positive if this*/
/* info array points to text, */
/* ignored otherwise.*/
/* */
/* fPos is the absolute file*/
/* position of the start of */
/* the paragraph in the file*/
/* */
/* fLen is the total length of*/
/* the file including formats */
/* -------------------------- */

typedef struct infoArr {
 short  pHite;/* parag hite */
 short  pixels;/* ignore this */ 
 long pHand;/* ignore this*/
 char status;/* chk comprsn */
 char hiMark;/* msw of mark */
 short  loMark;/* lsw of mark */
 short  fLen;  /* parag. len*/
 short  fmat;  /* ignore this */ 
}infoArr;

/* ------------------------ */

short   ReadFile();
Handle  DeCompress();


pascal void MWRead( paramPtr )
 XCmdBlockPtr  paramPtr;
/*************************
* In:ParamPtr:
*pointer to XCMD param
*block. params[0] is the
*name of the macwrite file
*  to open.
*
* Out:ParamPtr->returnValue
*empty if data could not
*be read, text portion 
*of a   Macwrite document.
*************************/
{
short 
 ref,   /* file reference */
 err,   /* io error  */
 vRef,  /* vol reference  */
 pcnt,  /* #  paragraphs  */
 tSiz,  /* text length  */
 loop;  /* loop counter */
long  
 fSiz,  /* data size */
 iSiz,  /* out data size  */
 iMark,/* iarr file pos */
 fPos;  /* para. offset */
Handle
 ImportText,
 decomp,/* decompressed*/
 temp;
infoArr
 info;
char 
 *fName,
 vName[32];

ImportText = nil;

if( paramPtr->params[0] != nil ){
  HLock( paramPtr->params[0] );

  GetVol( vName, &vRef );
  fName = *(paramPtr->params[0]);
  err = FSOpen( fName, vRef, &ref );
  HUnlock( paramPtr->params[0] );

  if( err == noErr ){
    ImportText = NewHandle( 0 );

    /* get paragraph count */
    fSiz = sizeof( short );
    err = ReadFile( ref, fSiz, &pcnt, (long)PPOS );

    /* get infoArray position */
    fSiz = sizeof( long );
    err = ReadFile( ref, fSiz, &iMark, (long)INFOPOS );

    /* read in the paragraphs */
    for( loop = 0; loop < pcnt; loop++ ){
      fSiz = sizeof( infoArr );
      err = ReadFile( ref, fSiz, &info, iMark );

      if( info.pHite > 0 ){
        /* paragraph is text */

        /* calc text position: mask both halves of the */
        /* 24-bit mark so neither is sign-extended */
        fPos = (((long)info.hiMark & 0xFF) << 0x10)
             + ((long)info.loMark & 0xFFFF);

        /* get the char count */
        fSiz = sizeof( short );
        err = ReadFile( ref, fSiz, &tSiz, fPos );

        /* read in the text */
        temp = NewHandle( (long)tSiz );
        HLock( temp );
        fPos += 2;
        fSiz = (long)tSiz;
        err = ReadFile( ref, fSiz, *temp, fPos );

        if( info.status & COMP ){
          /* paragraph is compressed (temp is already locked) */
          decomp = DeCompress( *temp, tSiz );

          HUnlock( temp );
          DisposHandle( temp );

          temp = decomp;
          tSiz = (short)GetHandleSize( decomp );
        }/* if( info.status & COMP ) */

        iSiz = GetHandleSize( ImportText );
        fSiz = (long)tSiz;
        SetHandleSize( ImportText, iSiz+fSiz );
        BlockMove( *temp, (*ImportText)+iSiz, fSiz );

        HUnlock( temp );
        DisposHandle( temp );
      }/* if( info.pHite > 0 ) */

      iMark = iMark + sizeof( infoArr );

    }/* FOR paragraph count */

    iSiz = GetHandleSize( ImportText );
    SetHandleSize( ImportText, iSiz+1 );
    *((*ImportText)+iSiz) = '\0';

    FSClose( ref );
    FlushVol( nil, vRef );
  }/* if file opened ok */

  paramPtr->returnValue = ImportText;
}

}

short   ReadFile(ioRef,siz,buf,from)
short   ioRef;
long    siz;
char    *buf;
long    from;
/*************************
* read siz bytes from the file specified by ioRef and put 
* the data into the buffer pointed to by buf 
*
* ioRef = file reference number
* siz = number of bytes to read
* buf = where to read in to
* from  = where in file to read from
*
* from is the file mark relative to the start of the file from
* which the read is to start.
*************************/
{
 short  err;
 
 err = SetFPos( ioRef, fsFromStart, from );
 if( err == noErr )
 err = FSRead( ioRef, &siz, buf ); 
 return( err);
}

Handle DeCompress( inp, expcnt )
 char *inp;
 short  expcnt;
/***********************
* Decompress the input data (inp) and return the result in a
* new Handle (outH).  outH is sized to hold expcnt characters
* and we use the following scheme:
*
*0 1 2 3 4 5 6 7 8 9 A B C D E F
*_ e t n r o a i s d l h c f p !
*
* where _ = SPACE
*! = not compressed
* 
* Consult MacWrite resource str #700 for the decompression
* string in your file (different for other languages).
* cycle through until the decompressed string count 
* matches the expected count
***********************/
{
 short  chcnt;
 register char *op;
 register char hiNib;
 register char loNib;
 char   dc[16];
 Handle outH;
 
 outH = NewHandle( (long)expcnt );

 dc[0]  = 0x20;
 dc[1]  = 'e';
 dc[2]  = 't';
 dc[3]  = 'n';
 dc[4]  = 'r';
 dc[5]  = 'o';
 dc[6]  = 'a';
 dc[7]  = 'i';
 dc[8]  = 's';
 dc[9]  = 'd';
 dc[10] = 'l';
 dc[11] = 'h';
 dc[12] = 'c';
 dc[13] = 'f';
 dc[14] = 'p';
 
 HLock( outH );
 op = *outH;
 chcnt = 0;
 
 while( chcnt < expcnt ){
   hiNib = loNib = *inp++;
   hiNib = hiNib >> 0x04;
   hiNib &= 0x0F;
   loNib &= 0x0F;

   if( hiNib < 0x0F ){
     *op++ = dc[hiNib];
     chcnt++;
   }
   else{
     /* next 2 nibbles represent */
     /* a complete char which */
     /* is on odd-nibble bounds */

     *op = loNib << 0x04;
     hiNib = *inp++;
     loNib = hiNib & 0x0F;
     hiNib = hiNib >> 0x04;
     hiNib &= 0x0F;
     *op = *op | hiNib;
     op++;
     chcnt++;
   }

   /* handle the low nibble, unless the expected count has */
   /* been reached (writing more would overrun outH) */
   if( chcnt < expcnt ){
     if( loNib < 0x0F )
       *op++ = dc[loNib];
     else /* next BYTE is a char */
       *op++ = *inp++;
     chcnt++;
   }
 }
 HUnlock( outH );
 return( outH );
}

#include <XCmdGlue.inc.c>

Listing 2. XFCN to import text from a MacWrite Document

 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.