Support Multibyte Text

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Toolbox Techniques

Supporting Multi-byte Text in Your Application

by Nat McCully, Senior Software Engineer, Claris Corporation

How to make most efficient use of the Script Manager and International APIs in your text engine

Introduction

You have developed the next greatest widget or application and you want to distribute it on the Net. Your application has a text engine, maybe simple TextEdit, or one of the more sophisticated engines available for license, or even one you wrote yourself. One day, you get an e-mail from someone in Japan:

Dear Mr. McCully,
My name is Takeshi Yamamoto and I use your program, MagicBook, everyday. But, I have a problem using Japanese characters in it. When I hit the delete key, I get weird garbage characters. My friends and I wish to use both Japanese and English in your program, but it does not work properly. Please fix it!
T. Yamamoto

Suddenly, there are people on the other side of the world who want to use your application in their language, and you are faced with a dilemma. You have no first-hand knowledge of the language itself, but you may be somewhat familiar with the Macintosh's ability to handle multiple languages in a single document with ease. Simply cracking open Inside Macintosh: Text seems daunting. How will these new routines affect the performance of your program? Will you introduce unwanted instability and anger your existing user base? Where can you find information on how to use which routines best, not just a description of what each routine does?

This article attempts to address some of these issues, and in general familiarize you with some of the best things that make the Mac an excellent international computing environment. Intelligent use of the Macintosh's international routines, WorldScript, and the other managers in the Toolbox can be the difference between a US-only application and a truly "world-ready" tool that any user, anywhere, can utilize as soon as they download it to their hard disk. Although this paper deals primarily with Japanese language issues, the concepts outlined herein can be used with any multi-lingual environment.

What is WorldScript?

WorldScript is the set of patches to the system that enables the correct display and measurement of multi-lingual text. Over time, many of these patches have been rolled into the base system software, but even in Mac OS 7.6.1, you will find a set of WorldScript extensions in the Extensions folder when you install one of the Language Kits available from Apple. The concepts and code snippets in this paper will work equally well on, for example, the Japanese localized Mac OS, or on a standard U.S. system with the Japanese Language Kit (JLK). A good source of localized system software and Language Kits is the Apple Developer Mailing CD-ROM, available from the Apple Developer Catalog. WorldScript is one of the Apple-only technologies that makes multi-lingual computing possible far more easily than on other platforms. When it comes to having Chinese, Korean and Japanese all in the same document, WorldScript, on Mac OS, is the only thing out there.

Multi-byte Text on the Mac

OK, let's get to the meat, you say. How do you make your text engine handle two-byte characters? Well, before giving you a bunch of code, let's explain how the Mac handles two-byte text.

What is a Script?

Each language that the Mac supports is grouped into categories called "scripts." For example, English and the other Roman letter-based languages like French and German all belong to the Roman script. Japanese belongs to the Japanese script. Character glyphs in the Roman script are each represented by a single-byte character code. Japanese characters are represented by a 16-bit (2 byte) character code.
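To make "two-byte" concrete: the Japanese script uses the Shift-JIS encoding, in which the first byte of a two-byte character falls in the range 0x81-0x9F or 0xE0-0xFC. The sketch below hard-codes those published ranges purely for illustration; a real application should ask the Script Manager (via CharacterByteType() or FillParseTable(), both covered later) rather than hard-code encoding details.

```c
/* Sketch only: classify a byte as a possible Shift-JIS first byte.
   0x81-0x9F and 0xE0-0xFC are the published Shift-JIS first-byte
   ranges; real code should use the Script Manager instead of
   hard-coding them. */
static int IsShiftJISFirstByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}
```

Note that a byte in these ranges is only possibly a first byte; as the Hit Testing section explains, some byte values can be either a first or a second byte depending on context, which is why you must always start parsing from a known character boundary.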

Setting Up the Port: Pre-System 7 API's versus New API's

On the Mac, each font is also associated with a script. There are Roman script fonts like Helvetica and Palatino, and Japanese script fonts like Osaka and Heisei Minchou. As you know, when text is drawn into a QuickDraw grafPort, you first set up the port with the appropriate font, size and style, and then call DrawText() to draw the text. If you are using the old Script Manager API's (like CharByte() and GetEnvirons()), you need to set the port to a font in the script you are interested in processing. Once the port is set to a particular font, calls to the Script Manager will follow the rules of that font's script. So the port, by way of setting the font, also has an implicit script setting. This is the key to using the Script Manager routines so they will return the correct information to your application using the older API's. The new API's have a script parameter, so it is not necessary to set the font of the port before using them. Since the Script Manager doesn't have to call FontScript() to find out the script of the current font before passing the script to WorldScript, using the newer API's could speed up your application in certain cases.

Adding Script-savvy Features to your Application

First, you need to determine if the user's system has a non-Roman script system installed. Many programs boost performance by calling Script Manager routines only when absolutely necessary. One way to find out all the scripts installed is to loop through all 33 possible script codes (smRoman being 0 and smUninterp being 32) and call GetScriptVariable() with the selector smScriptEnabled. Roman script is always enabled.

Listing 1: Finding Which Scripts are Installed

ScriptCode   script;
for (script = smRoman + 1; script <= smUninterp; script++)
{
   if (GetScriptVariable(script, smScriptEnabled))
      InitInternalScriptInfo(script);
            // non-Roman script present...
}

InitInternalScriptInfo() refers to a routine you write that will initialize your internal data structures that deal with specific script systems, such as line-breaking tables, on a script-by-script basis.
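As a sketch of what such a routine might set up (the record layout and field names here are invented for illustration, not Toolbox types):

```c
#include <string.h>

/* Hypothetical per-script record, one per possible script code,
   filled in by InitInternalScriptInfo() when that script is found
   to be enabled. */
enum { kMaxScripts = 33 };          /* smRoman (0) .. smUninterp (32) */

typedef struct {
    int           enabled;          /* script is installed and enabled */
    int           twoByte;          /* script contains 2-byte characters */
    unsigned char parseTable[256];  /* cached FillParseTable() result */
} ScriptInfo;

static ScriptInfo gScriptInfo[kMaxScripts];

static void InitInternalScriptInfo(int script)
{
    memset(&gScriptInfo[script], 0, sizeof(ScriptInfo));
    gScriptInfo[script].enabled = 1;
    /* Real code would fill parseTable with FillParseTable() and set
       twoByte from GetScriptVariable(script, smScriptFlags). */
}
```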

Line Breaking

Most applications don't rely entirely on the Script Manager for line breaking, hit testing, or word selection, because using those routines is thought to be too slow. It is possible to optimize your text engine so that you incorporate the correct behaviors for each script system present, while maintaining the highest possible performance. The Toolbox call for finding line breaks is StyledLineBreak(). To use it, however, you must restrict the text you pass it to lengths of less than 32K (actually, this is true of the whole Script Manager, so tough) and text widths to whole pixel values (can you say 'rounding error?'), and if you are explicitly scaling the text, it won't work at all. You must also organize the text you pass to it in terms of script runs and style runs within them. Therefore, most applications that have word-processing functionality choose to implement their own line-breaking code that is customized for their own needs. Unfortunately, many of these private implementations break when used on WorldScript systems.
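A private implementation usually starts with the Roman case. The classic rule, break at the nearest space or else at the nearest byte, can be sketched in a few lines of plain C (this is the naive scan, not a replacement for StyledLineBreak()):

```c
#include <stddef.h>

/* Sketch of the simple Roman-script rule: given the byte offset
   nearest the graphic break, back up to just after the last space
   (0x20) if there is one, otherwise break right at the graphic
   break. A real engine would work within style runs. */
static size_t SimpleRomanBreak(const unsigned char *text,
                               size_t graphicBreak)
{
    size_t i = graphicBreak;
    while (i > 0) {
        if (text[i - 1] == 0x20)
            return i;           /* break just after the space */
        i--;
    }
    return graphicBreak;        /* no space: break on the byte */
}
```

The Japanese case, described in the next section, replaces the space scan with character-boundary and kinsoku checks.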

Line Breaking with Japanese Text

The simplest line-breaking algorithm for English text is to look for a space (ASCII 0x20) character in the line near the graphic break, and if there is none, to break on the byte-boundary nearest the graphic break. Japanese text is a bit more complicated. Japanese text has no spaces, so you must break at the character boundary nearest the graphic break. There is an additional wrinkle: Certain characters are not allowed to begin a line, and certain characters are not allowed to end a line. This set of line-breaking rules is referred to as Kinsoku shori. For example, you cannot begin a line with a two-byte period. You cannot end a line with a two-byte open parenthesis followed by more text on the next line. A list of kinsoku characters is available from the Japanese Standards Association in the form of a Japanese Industrial Standard (JIS) document. It is also in Ken Lunde's excellent book, Understanding Japanese Information Processing, in the section entitled "Japanese Hyphenation." While not all Japanese agree on the correct set of kinsoku characters, this set is a good default. Some applications allow the user to edit the kinsoku character set to their own liking.

Once you know that the current byte offset in your text is on or just before the graphic break, you need to see if that byte is part of a two-byte character. Then you need to see if the character is a character that can't end a line. Then you need to check the character after it to see if it is a character that can't begin a line. This can be repeated as necessary, for support of a string of kinsoku characters. For example, suppose the character on the graphic break (the break char) can end a line, but the character after it can't begin a line, causing the break char to wrap. However, the character before the break char is one that can't end a line, so you must then check the char before it, and so on, and so on, and... The example below is simplified to illustrate a particular case; actual code for an application would probably be organized differently.

Listing 2: Checking Graphic Break Char with Kinsoku Processing

CheckGraphicBreak
Check a text run's length against the pixel edge of the line
(the so-called graphic break), and then adjust the line break
if there is an illegal Kinsoku character on the break. 

UInt16 *   gStartLineKinsokuChars;   // chars that can't begin
                                     // a line.
UInt8      gNumStartKinsokuChars;    // number of chars above.
UInt16 *   gEndLineKinsokuChars;     // chars that can't end a line
UInt8      gNumEndKinsokuChars;      // number of chars above.

// This function will return FALSE if the char at offset
// is not a valid break point. It checks the char after it,
// but not the char before it, for kinsoku.
static Boolean CheckGraphicBreak(UInt8 *      textPtr,
                                 UInt16       offset,
                                 ScriptCode   script)
{
   SInt16 result;

   // The textPtr starts at a known 'good' character
   // boundary. In this case it is the beginning of the
   // line, but it could be the beginning of the
   // stylerun.

   // Find out if script only has 1-byte chars. If so,
   // we assume it's ok to break at this char.
   if (GetScriptVariable(script, smScriptFlags) &
               (1 << smsfSingByte))
      return TRUE;

   result = CharacterByteType((Ptr)textPtr,
                  offset,
                  script);
   if (result == smSingleByte)
      return TRUE;   // In real life, you're not done
                        // until you check the chars before
                        // and after this one for kinsoku.
   if (result == smFirstByte)
      return FALSE;
   if (result == smLastByte)
   {
      UInt8      index;
      UInt16   theChar = *(UInt16 *)&textPtr[offset - 1];

      // Now we have a valid break on a 2-byte char.
      // We need to check if it's a kinsoku character.
      // This code checks Japanese kinsoku only, but
      // with a little work this could be extended to
      // all 2-byte scripts that don't break on spaces.
      if (script != smJapanese)
         return TRUE;
      
      for (index = 0; index < gNumEndKinsokuChars; index++)
      {
         if (theChar == gEndLineKinsokuChars[index])
            return FALSE;
      }
      
      // Now we check the char after this one, in case it
      // is a char that can't start a line. First see if
      // it's a 1-byte char. In real life, there are 1-byte
      // kinsoku chars to check for.
      if (textPtr[offset + 1] == 0 || 
         CharacterByteType((Ptr)&textPtr[offset + 1],
                  0, script) == smSingleByte)
         return TRUE;

      theChar = *(UInt16 *)&textPtr[offset + 1];
      
      for (index = 0; index < gNumStartKinsokuChars;
                        index++)
      {
         if (theChar == gStartLineKinsokuChars[index])
            return FALSE;
      }
   }
   return TRUE;
}

Hit Testing

Hit testing is another area in your text engine that demands the highest possible performance. When the user clicks in the text, any delay in setting the insertion point there will be noticed. Drag selection is another example of the same code working hard to find the character boundaries and setting the correct hilite area.

Some applications use a locally allocated cache of possible first byte character codes that they use to test a particular character in the text stream for byteness. This is simple to create with the Toolbox call FillParseTable(). FillParseTable() returns in your pre-allocated 256 byte buffer all the bytes that can be a first byte of a two-byte character in the script you pass to it. Be aware that in some scripts, some character codes can be both the first byte of a two-byte character as well as the second byte of a two-byte character, depending on their context within the text stream. Therefore, you need more than just this information to successfully find out what kind of character the byte you're interested in is a part of. In a mixed stream of text with both one-byte and two-byte characters, using the parse table in a single pass over the text is much faster than calling CharacterByteType() for each byte. An example of this is below, in a sample function that goes through a text stream and counts the number of characters in it:

Listing 3: Counting the Chars in Mixed-byte Text

CountCharsInScriptRun
Counts the number of characters in a run of contiguous script
(font), checking for both one-byte and two-byte characters.

UInt32 CountCharsInScriptRun(UInt8 * textPtr, UInt32 length,
               ScriptCode script)
{
   UInt8      parseTable[256];
   UInt32   curByte, charCount;
   

   (void)FillParseTable((char *)parseTable, script);
   
   for (curByte = 0L, charCount = 0L; curByte < length;
                        curByte++)
   {
      if (parseTable[textPtr[curByte]] == 1)
         continue;
      charCount++;
   }
   return (charCount);
}

Notice that because we started at a known 'good' boundary, we were able to test only the first bytes of the two-byte characters in the stream as we counted along. This code would not work in all cases if we started at an arbitrary point in unknown text, because of the ambiguity of the byteness of some character codes in some scripts. Caching the parse tables for all installed scripts in the user's system at launch time would further speed up your processing, so you wouldn't have to call FillParseTable() every time.
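A sketch of such a launch-time cache follows; the filler callback stands in for FillParseTable() so the example stays self-contained, and the names are illustrative:

```c
#include <string.h>

/* Sketch: cache one parse table per script code so the inner text
   loops never call FillParseTable(). The fillProc parameter stands
   in for a wrapper around the real Toolbox call. */
enum { kNumScripts = 33, kTableSize = 256 };

static unsigned char gParseTables[kNumScripts][kTableSize];
static int gTableValid[kNumScripts];

static const unsigned char *CachedParseTable(int script,
    void (*fillProc)(unsigned char *table, int script))
{
    if (!gTableValid[script]) {
        fillProc(gParseTables[script], script);  /* FillParseTable() in real code */
        gTableValid[script] = 1;
    }
    return gParseTables[script];
}

/* Dummy filler for demonstration only; counts how often it runs so
   you can see the cache working. */
static int gFillCalls = 0;
static void DummyFill(unsigned char *table, int script)
{
    memset(table, 0, kTableSize);
    table[0x81] = 1;            /* pretend 0x81 is a first byte */
    gFillCalls++;
    (void)script;
}
```

The second lookup for the same script returns the cached table without calling the filler again.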

Measuring Two-byte Characters

On the Mac, all two-byte characters are the same width. In a future system software release, proportional two-byte characters will be supported, but up until now all two-byte-savvy applications assume mono-spaced two-byte characters, and even if proportional characters are supported, they will be mono-spaced by default so as not to break every application currently shipping.

Before the Mac OS supported measuring two-byte characters with TextWidth(), a special code point in the single-byte 256 char width table was reserved for the two-byte character width for that font. In the Japanese and both Chinese scripts, this code point is 0x83. In Korean script, it is 0x85. This code point still works, even though Apple now recommends you use TextWidth() for all measuring of multi-byte or mixed text. In the future for proportional measuring, TextWidth() will probably be what you will use.

Below is an example of a function that measures any text, and returns the amount in a Fixed variable. This is useful if you are measuring text and the user has Fractional Glyph Widths turned on (meaning you made a call to SetFractEnable()).

Listing 4: Measuring Mixed-byte Text

GetTextWidth
This function supports multiple styleruns in the text, and consists
of two loops: One for the styles and one for the characters in each
style. The port is set on each stylerun and then Macintosh APIs
are called to measure the text. 

#define JSCTCWidthChar   0x83
#define KWidthChar   0x85

typedef struct tagStyleRun {
   UInt32     styleStart;
   UInt16     font;
   UInt16     size;
   UInt8      face;
} StyleRun,   *StyleRunPtr;

Fixed GetTextWidth(UInt8 * textPtr, UInt32 length,
                           StyleRunPtr styleRuns, UInt32 numStyles)
{
   ScriptCode      curScript;
   UInt32          byteNum, styleNum;
   Fixed           totWidth = 0L;
   FMetricRec      curFontMetrics;
   WidthTable **   curWidthTable;
   UInt8           parseTable[256];
   
   // loop thru each stylerun, measure its characters
   for (styleNum = 0L; styleNum < numStyles; styleNum++)
   {
      // Set up the port (in real life, you'd restore the
      // old settings when you exit) 
      TextFont(styleRuns[styleNum].font);
      TextFace(styleRuns[styleNum].face);
      TextSize(styleRuns[styleNum].size);
      FontMetrics(&curFontMetrics);
      curWidthTable = (WidthTable **)curFontMetrics.wTabHandle;
      HLock((Handle)curWidthTable);
      curScript = FontScript();
      (void)FillParseTable((char *)parseTable, curScript);
      
      // loop thru each char in the stylerun
      for (byteNum = styleRuns[styleNum].styleStart;
            (styleNum + 1 < numStyles &&
               byteNum < styleRuns[styleNum+1].styleStart) ||
            (styleNum + 1 >= numStyles && byteNum < length);
               byteNum++)
      {
         if (parseTable[textPtr[byteNum]] == 1)
         {
            if (curScript == smJapanese ||
                  curScript == smTradChinese ||
                  curScript == smSimpChinese)
               totWidth +=
                  (*curWidthTable)->tabData[JSCTCWidthChar];
            else if (curScript == smKorean)
               totWidth +=
                  (*curWidthTable)->tabData[KWidthChar];
            else
               totWidth += (Fixed)
                  TextWidth(&textPtr[byteNum], 0, 2) << 16;

            byteNum++;
         }
         else
            totWidth += 
               (*curWidthTable)->tabData[textPtr[byteNum]];
      }
      HUnlock((Handle)curWidthTable);
   }

   return (totWidth);
}

The above function still makes expensive calls like FontMetrics(), FillParseTable() and TextWidth() on each stylerun. It would be an even better idea to have a local cache of the width tables and parse tables of fonts you know are in the document, so you don't have to rebuild them every time the user clicks or drags or types in the text.

So, now that you have a relatively fast way of measuring the text, you can use it to find the pixel value of any character in the text, and use that for your internal CharToPixel and PixelToChar logic. Or, you can use the Mac OS Toolbox calls CharToPixel() and PixelToChar(), which will always work on any script but may be slower.
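Here is a sketch of the private PixelToChar logic, in plain C with hard-coded stand-in widths (7 and 12 pixels) and a hard-coded Shift-JIS first-byte test in place of the cached width and parse tables discussed above:

```c
/* Sketch of a private PixelToChar: walk the text, adding a
   (hypothetical, monospaced) width per character, and return the
   byte offset of the glyph containing the target pixel. */
enum { kOneByteWidth = 7, kTwoByteWidth = 12 };

static int SketchIsFirstByte(unsigned char b)   /* Shift-JIS ranges */
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static unsigned long SketchPixelToChar(const unsigned char *text,
                                       unsigned long length,
                                       long pixel)
{
    unsigned long offset = 0;
    while (offset < length) {
        int  charBytes = SketchIsFirstByte(text[offset]) ? 2 : 1;
        long width = (charBytes == 2) ? kTwoByteWidth : kOneByteWidth;
        if (pixel < width)
            return offset;      /* hit lands inside this character */
        pixel -= width;
        offset += charBytes;
    }
    return length;              /* past the end of the line */
}
```

A real implementation would look up cached per-font widths and might round to the nearest character boundary rather than truncate, but the walk is the essential shape.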

Localizing Your Application for Japan

Now that we have reviewed a few of the basic text engine issues for handling two-byte text, there are a few things about Japan in particular that make localization a challenge.

Japan is possibly the most interesting major software market to localize for, if you are interested in text and text layout. It is a mature market, with a diverse number of products enjoying many millions of dollars in sales each year. The Macintosh has a larger market share there than in the U.S. or Europe. Text in Japan has traditionally been difficult to input and output using machines, and the use of text in graphic design requires that the text layout be extremely flexible. The characters are complex (so complex that bolding them may make them illegible), and emphasis or adornment has forms that use background shading, different types of lines around the text, and even dots or ticks above or to one side of each character. Condensed and extended faces have different results on PostScript printers than they do on QuickDraw. Bold and italic faces were not supported on the first PostScript Japanese printers. Underlines are not drawn by QuickDraw when the font is a Japanese font. These last two things might be fixed in future releases of the system software, but for now the application developer must work around them.

For underline, you must draw a line under the text. The reason QuickDraw doesn't draw it for you is that it usually uses the font's baseline as the underline location, but Japanese fonts' two-byte glyphs take up more room and descend below the baseline. Where you draw your underline is up to you, but take a look at how other Japanese programs do it and make it fairly consistent.

Vertical text is pretty much a checkbox item nowadays in Japanese word-processing programs. Most novels and many magazines are laid out vertically, but until recently computers were horizontal-only. While the Windows95® APIs support drawing text vertically, the Mac OS still does not, outside of using QuickDraw GX typography (which is excellent, by the way). In comparing vertical text to horizontal text, several things change about the line layout: The first line starts at the top right, and the text flows down to line-end, then wraps to the next line, which is to the left of the first line; the baseline is generally considered to be in the center of the line; underlines are drawn to the right of the text, as are emphasis dots; two-byte characters are not rotated, but single-byte characters are, 90° clockwise; certain characters have vertical text variants, like many punctuation characters. Where these variants are in the font can be found in the 'tate' table in the font ("tahteh" means "vertical" in Japanese).

Rubi are small annotation characters, placed above, below or to the side of the text they annotate. Usually they provide pronunciation guidance for unusual or hard-to-pronounce Kanji characters.

Date formats in Japan include the current year of the emperor's reign; again, supported on Windows95® but not on Mac OS. It is up to the application to support these formats if so desired. Also, date formats 2 and 3 produce identical results, due to the fact that the abbreviated month and the long month are the same thing in Japanese. Japanese applications may opt to substitute a different format in one of those formats' place.

Find and Replace needs to be expanded to include the different types of characters used in Japanese. Standard Japanese text may contain any of the following types of characters: one-byte Roman, two-byte Roman, one-byte numerals and symbols, two-byte numerals and symbols, one-byte katakana syllables, two-byte katakana syllables, two-byte hiragana syllables, and two-byte Kanji characters. The hiragana and katakana characters are equivalent in terms of the sounds they represent in Japanese, so a good Find/Replace function should include an option to find the search string in either syllabary.
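The kana-folding option is easiest to see in Unicode terms, where the katakana block sits exactly 0x60 above hiragana (in Shift-JIS the same mapping must be table-driven). A minimal fold for a Find comparison:

```c
/* Sketch of kana folding for Find, using Unicode code points:
   katakana U+30A1..U+30F6 maps onto hiragana U+3041..U+3096 by
   subtracting 0x60. Everything else passes through unchanged. */
static unsigned int FoldKanaToHiragana(unsigned int ch)
{
    if (ch >= 0x30A1 && ch <= 0x30F6)   /* katakana small-a .. ke */
        return ch - 0x60;               /* matching hiragana */
    return ch;
}
```

Folding both the search string and the document text through a function like this before comparing gives the either-syllabary match described above.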

Sorting in Japanese is difficult because the Kanji characters can have different pronunciations depending on their context. To sort Kanji correctly, you need a separate kana key field that indicates the pronunciation and you sort on that. Also, Mac OS CompareText() doesn't sort the long sound symbol correctly (that symbol changes sound depending on the character before it, but Mac OS always sorts it in the symbols area), so for linguistically correct sorting you need to write your own sorting routine.
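The separate-key approach can be sketched with standard C qsort(); the record layout and field names here are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: each record carries a separate phonetic (kana) key, and
   the sort compares that key, never the Kanji field itself. */
typedef struct {
    const char *kanji;      /* display form */
    const char *reading;    /* kana key used for sorting */
} SortRec;

static int CompareByReading(const void *a, const void *b)
{
    return strcmp(((const SortRec *)a)->reading,
                  ((const SortRec *)b)->reading);
}

static void SortByReading(SortRec *recs, size_t count)
{
    qsort(recs, count, sizeof(SortRec), CompareByReading);
}
```

A real implementation would replace strcmp() with a Japanese-aware comparison (this is where your own sorting routine handles cases like the long sound symbol), but the point stands: the Kanji field never participates in the ordering.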

If your application supports character tracking using the Color QuickDraw function CharExtra(), be aware that the CGrafPort member chExtra only uses 4 bits for signed integer values and the other 12 bits for the fraction. The value you pass to CharExtra() is a Fixed value of how many pixels you want to track out (or in) the text, and QuickDraw divides that by the current text size, to arrive at the chExtra value. This means that if the tracking value you pass to CharExtra() is greater than 8 times the text size, the chExtra field will go negative, and your text will be drawn incorrectly. Unfortunately, Japanese text is routinely tracked out beyond this limit in many applications. The only workaround is for you to draw the text one character at a time, and use the QuickDraw pen movement calls like MoveTo() to move the pen yourself. The same is true for SpaceExtra().
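The arithmetic behind that limit can be sketched directly: chExtra holds the tracking value divided by the text size as a signed 4.12 fixed-point number, so the quotient must stay below 8.0. A guard like the following (plain C, treating 16.16 Fixed values as longs) tells you when to fall back to drawing character-by-character with MoveTo():

```c
/* Sketch of the 4.12 overflow check. FixedFromLong builds a 16.16
   Fixed from an integer; the real computation inside QuickDraw uses
   Fixed division, approximated here with integer division. */
static long FixedFromLong(long n) { return n << 16; }

/* Returns 1 if CharExtra() can represent this tracking value for
   the given text size; 0 means chExtra would overflow. */
static int TrackingFitsInChExtra(long trackingFixed, long textSize)
{
    long perSizeFixed = trackingFixed / textSize;   /* 16.16 */
    /* signed 4.12 covers [-8.0, 8.0) */
    return perSizeFixed <  FixedFromLong(8) &&
           perSizeFixed >= -FixedFromLong(8);
}
```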

Inline Input

Inline input of Japanese, Chinese or Korean is a way of using an intermediate program (called an Input Method) to translate your keystrokes into the many thousands of possible characters in those languages, all in the same place on screen that you would normally see characters typed in the line. In Japanese, the Input Method changes your keystrokes into phonetic Japanese kana characters, then converts some of those characters into Kanji characters to form a mixed kana and Kanji sentence. Then the user hits the return key to confirm the text in the line, ending the inline input session. Inline input on the Mac on System 7.1 or later uses the Text Services Manager (TSM). If your application uses TextEdit as its main text engine, you can support inline input quite easily using TSMTE. If you have your own text engine, you will need to do more work to support TSM Inline Input.

TSM uses Apple events to send and receive data between your application and the Input Method. You must implement several AppleEvent handlers, the most complex of which is the kUpdateActiveInputArea. In that handler, you must draw the text in all its intermediate stages, as the user is composing and editing the Japanese sentence before s/he confirms it to the document. If there is text after the so-called 'inline hole,' you must actively reflow the text if such editing causes the length to change. Each time the user makes a change, the text in the inline hole is received from the Input Method in an Apple event. The application draws it in the text stream, along with special Inline styles that help the user tell which text in the inline hole is raw (unconverted) text, which is converted text, which is the active phrase, where the phrase boundaries are in the inline hole, and other information.

After implementing the TSM support in your application, it is imperative that you test it with third-party Input Methods. At the time TSM was introduced, the documentation for how to write an Input Method was still a little spotty. This resulted in each Input Method handling text slightly differently. Also, Kotoeri, Apple's Input Method, has fewer features than the leading third-party Input Methods. Be sure to test your application with all of them you can find, so you can verify that it won't crash or produce strange results. Some Input Methods have strange quirks, like always eating mouseDown events, or having different requirements about how large a buffer they can handle without crashing. This knowledge comes from testing, and sometimes can be found on the Internet in Usenet newsgroups (in Japanese).

What About Unicode?

Unicode is being billed as the latest panacea for the problems of internationalization. What does Unicode give you? Where does it fall short?

Unicode was designed to solve one problem: There are many incompatible, overlapping encoding schemes for different languages, and supporting all of these encodings is a complex problem. What if there was a single encoding scheme that supported all the writing systems of the world, and guaranteed that you could display text in all the languages Unicode supports if only you had the right Unicode font for each language? Unicode tries to be that encoding.

For Japanese text data, the Mac OS and Windows95 use Shift-JIS internally, while Rhapsody and WindowsNT® use Unicode. On the Internet, most Japanese text is encoded using the 7-bit ISO-2022-JP standard. Whether or not you use Unicode to represent text internally to your application, you will have to support all three standards for full file and data compatibility with the rest of the world. In Unicode, all characters are two bytes long. So, you no longer have to worry about testing for byteness in a Unicode stream. However, all ASCII characters are represented with a leading 0x00 in Unicode. So you can't have loops that look for a terminating NULL in a C-string. And, all your formerly one-byte text doubles in size unless you explicitly compress it (and then you lose the byteness testing advantage).
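The terminating-NULL problem is easy to demonstrate: the UTF-16 bytes for plain ASCII text (big-endian order shown; byte order is itself another Unicode wrinkle) are interleaved with 0x00 bytes, so any byte-oriented C-string routine stops at the first character:

```c
#include <string.h>

/* Sketch: running strlen() over the raw bytes of UTF-16 text.
   For big-endian "Hi" the bytes are 00 48 00 69 00 00, so the
   scan stops immediately at the leading 0x00. */
static size_t BrokenCStringLength(const unsigned char *utf16Bytes)
{
    return strlen((const char *)utf16Bytes);  /* stops at first 0x00 */
}
```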

Whether or not you think testing byteness is too complex or expensive to do, you should know that Unicode also does another controversial thing: For the so-called "Han" languages (Japanese, Chinese, Korean) that use characters that originated in China, it attempts to unify them into one codepoint for each character judged by the Unicode Consortium to be unique, even if it has variant forms in each language. The same is true for the Arabic script, which is shared by several languages (Arabic, Persian, Urdu). Because of this, you cannot tell what language a character is in just from its codepoint. Unicode was not designed to be a multi-lingual solution, in that representations of Chinese and Japanese in the same document will have overlapping character codes, requiring the OS to provide a parallel linguistically-coded data structure to render the glyph forms appropriately to each language. This might be another version of today's font/script/language relationship on Mac OS. As you can imagine, the Chinese, Japanese and Korean governments have each published competing encoding standards to Unicode, labeling the latter as something designed by foreigners who didn't understand the issues (both political and linguistic) involved in trying to make a worldwide encoding system.

Another issue about Unicode is that although it can represent 65,536 characters, there is not enough space for all the Han characters and their variants, plus all the other languages that Unicode currently supports. New languages are becoming computerized as more countries join the Digital Revolution and the Unicode Consortium cannot give space to all of them. Preferring the flat encoding model, they came up with another standard that uses four bytes per character (the ISO 10646 encoding standard). Given that on the Internet, where many languages need simultaneous support on computers, bandwidth is at a premium, I would prefer using the ISO 2022 standard of mixed-byte (7-bit and 14-bit characters) plus the escape codes that tell you what language the current stream is in to sending 32-bit characters through the wire. Since most web pages use this encoding, expect your OS to provide utilities for encoding conversion (like the Mac OS Encoding Converter debuting soon on a Mac near you).
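Tracking ISO 2022 escape state can be sketched in a few lines. This scanner recognizes only the two most common ISO-2022-JP escapes (ESC $ B enters two-byte JIS mode, ESC ( B returns to ASCII; the standard also defines ESC $ @ and ESC ( J, omitted here for brevity):

```c
#include <stddef.h>

/* Sketch: scan an ISO-2022-JP buffer and report whether it ends in
   two-byte JIS mode. ESC is 0x1B. Real code would also act on the
   mode while parsing the bytes between escapes. */
static int EndsInTwoByteMode(const unsigned char *text, size_t len)
{
    int twoByte = 0;
    size_t i;
    for (i = 0; i + 3 <= len; i++) {
        if (text[i] == 0x1B && text[i+1] == '$' && text[i+2] == 'B')
            twoByte = 1;                    /* entered JIS mode */
        else if (text[i] == 0x1B && text[i+1] == '(' && text[i+2] == 'B')
            twoByte = 0;                    /* back to ASCII */
    }
    return twoByte;
}
```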

Cross-Platform Development Issues

Going cross-platform is already complicated without having to think about internationalization. Should you have separate codebases for maximum use of each platform's unique features? Or should you have a single codebase and use an emulation layer for the other OS's APIs? Each has its advantages, but for this article I can speak to those of you who have a joint codebase, and tell you about some of the things that the Windows platform lacks that you have to write yourself for multi-byte support and internationalization.

Windows has no Script Manager. There is no Gestalt Manager. It cannot support multiple two-byte codepages at the same time. It uses totally separate fonts for vertical and horizontal text. It supports proportional kana in Japanese, so you can't assume all two-byte characters are the same width.

If your code uses the Script Manager routines heavily, then you will have to write them yourself on the Windows side. All the convenience of the Mac OS's international routines becomes very clear when you try the same things on a PC!

Also, Japan once again has its own special challenges. Until Windows came out in Japan, each computer manufacturer made its own proprietary OS and hardware. Even floppy disks were incompatible with each other. Now, most companies have adopted the Intel PC standard, but NEC continues to manufacture its own line of incompatible PCs. NEC has such a huge share of the market in Japan that it has teamed up with Microsoft to produce its own version of Windows95 for NEC. So when you buy Windows95 in Japan, you find there are three versions: MS Windows95 for Intel, MS Windows95 for NEC, and NEC Windows95 for NEC. All three versions are basically the same feature-for-feature, but the drivers are different and you need to test your application on each platform to verify compatibility.

On the hardware side, you will find that Japanese hardware is different: They use different displays, different keyboards, different printers, and different floppy formats. The drive lettering on NEC machines is different from Intel PCs: The hard disk drive is labeled 'A:' on one and 'C:' on the other. Make sure your installer isn't hard-coded to install on drive C:.

Conclusion

As we have seen, internationalizing your software on the Mac OS is not very difficult, and it is to your benefit to enable as many users as possible to enter text in their own language when using your program. We have also examined Japanese localization in more depth, and seen that Japanese-language applications usually require some new features designed specifically for that language's needs and conventions. As more markets around the world reach maturity, you can be sure there will be ample opportunity to differentiate your product by adding locale-specific features. It is these locale-specific features that tell your users they are valued customers, and that their needs are being addressed in a very specific way. For your own product, especially if you are in the initial design phase, I recommend making it as easily expandable as possible: design generic internationalization into the core modules, while leaving open the opportunity to add locale-specific features for markets like Japan as your product's market expands and its success grows.

Bibliography and Related Reading

  • Apple Computer, Inc. Inside Macintosh: Text, Menlo Park, CA: Addison-Wesley, March 1993.
  • Apple Computer, Inc. "Technote OV 20, Internationalization Checklist," Cupertino, CA: Apple Computer, Inc., November 1993.
  • Griffith, Tague. "Gearing Up for Asia With the Text Services Manager and TSMTE," Develop Issue 29. Cupertino, CA: Apple Computer, Inc., March 1997.
  • Apple Computer, Inc. "Technote TE 531, Text Services Manager Q&As," Cupertino, CA: Apple Computer, Inc., May 1993.
  • Lunde, Ken. Understanding Japanese Information Processing, Sebastopol, CA: O'Reilly & Associates, September 1993.

See also Ken Lunde's home page at http://jasper.ora.com/lunde/ for more information about multi-byte text processing on computers.


Nat McCully has been at Claris in the Japanese Development Group for the last 6 years. He has worked on numerous Japanese products, including MacWrite II-J, FileMaker Pro-J, Claris Impact-J, ClarisDraw-J, and ClarisWorks-J. He speaks, reads, and writes Japanese, and enjoys traveling in Japan. He is currently working as Development Lead on the next release of ClarisWorks-J.

 
