Support Multibyte Text

Volume Number: 14 (1998)
Issue Number: 1
Column Tag: Toolbox Techniques

Supporting Multi-byte Text in Your Application

by Nat McCully, Senior Software Engineer, Claris Corporation

How to make most efficient use of the Script Manager and International APIs in your text engine

Introduction

You have developed the next greatest widget or application and you want to distribute it on the Net. Your application has a text engine, maybe simple TextEdit, or one of the more sophisticated engines available for license, or even one you wrote yourself. One day, you get an e-mail from someone in Japan:

Dear Mr. McCully,
My name is Takeshi Yamamoto and I use your program, MagicBook, everyday. But, I have a problem using Japanese characters in it. When I hit the delete key, I get weird garbage characters. My friends and I wish to use both Japanese and English in your program, but it does not work properly. Please fix it!
T. Yamamoto

Suddenly, there are people on the other side of the world who want to use your application in their language, and you are faced with a dilemma. You have no first-hand knowledge of the language itself, but you may be somewhat familiar with the Macintosh's ability to handle multiple languages in a single document with ease. Simply cracking open Inside Macintosh: Text seems daunting. How will these new routines affect the performance of your program? Will you introduce unwanted instability and anger your existing user base? Where can you find information on how to use which routines best, not just a description of what each routine does?

This article attempts to address some of these issues, and in general familiarize you with some of the best things that make the Mac an excellent international computing environment. Intelligent use of the Macintosh's international routines, WorldScript, and the other managers in the Toolbox can be the difference between a US-only application and a truly "world-ready" tool that any user, anywhere, can utilize as soon as they download it to their hard disk. Although this paper deals primarily with Japanese language issues, the concepts outlined herein can be used with any multi-lingual environment.

What is WorldScript?

WorldScript is the set of patches to the system that enables the correct display and measurement of multi-lingual text. Over time, many of these patches have been rolled into the base system software, but even in Mac OS 7.6.1, you will find a set of WorldScript extensions in the Extensions folder when you install one of the Language Kits available from Apple. The concepts and code snippets in this paper will work equally well on, for example, the Japanese localized Mac OS, or on a standard U.S. system with the Japanese Language Kit (JLK). A good source of localized system software and Language Kits is the Apple Developer Mailing CD-ROM, available from the Apple Developer Catalog. WorldScript is one of the Apple-only technologies that makes multi-lingual computing possible far more easily than on other platforms. When it comes to having Chinese, Korean and Japanese all in the same document, WorldScript, on Mac OS, is the only thing out there.

Multi-byte Text on the Mac

OK, let's get to the meat, you say. How do you make your text engine handle two-byte characters? Well, before giving you a bunch of code, let's explain how the Mac handles two-byte text.

What is a Script?

Each language that the Mac supports is grouped into categories called "scripts." For example, English and the other Roman letter-based languages like French and German all belong to the Roman script. Japanese belongs to the Japanese script. Character glyphs in the Roman script are each represented by a single-byte character code. Japanese characters are represented by a 16-bit (2 byte) character code.
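To make "two-byte" concrete: the Japanese script uses the Shift-JIS encoding, in which the first byte of a two-byte character falls in the range 0x81-0x9F or 0xE0-0xFC. The sketch below hard-codes those published ranges purely for illustration; a real application should ask the Script Manager (via CharacterByteType() or FillParseTable(), both covered later) rather than hard-code encoding details.

```c
/* Sketch only: classify a byte as a possible Shift-JIS first byte.
   0x81-0x9F and 0xE0-0xFC are the published Shift-JIS first-byte
   ranges; real code should use the Script Manager instead of
   hard-coding them. */
static int IsShiftJISFirstByte(unsigned char b)
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}
```

Note that a byte in these ranges is only possibly a first byte; as the Hit Testing section explains, some byte values can be either a first or a second byte depending on context, which is why you must always start parsing from a known character boundary.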

Setting Up the Port: Pre-System 7 API's versus New API's

On the Mac, each font is also associated with a script. There are Roman script fonts like Helvetica and Palatino, and Japanese script fonts like Osaka and Heisei Minchou. As you know, when text is drawn into a QuickDraw grafPort, you first set up the port with the appropriate font, size and style, and then call DrawText() to draw the text. If you are using the old Script Manager API's (like CharByte() and GetEnvirons()), you need to set the port to a font in the script you are interested in processing. Once the port is set to a particular font, calls to the Script Manager will follow the rules of that font's script. So the port, by way of setting the font, also has an implicit script setting. This is the key to using the Script Manager routines so they will return the correct information to your application using the older API's. The new API's have a script parameter, so it is not necessary to set the font of the port before using them. Since the Script Manager doesn't have to call FontScript() to find out the script of the current font before passing the script to WorldScript, using the newer API's could speed up your application in certain cases.

Adding Script-savvy Features to your Application

First, you need to determine if the user's system has a non-Roman script system installed. Many programs boost performance by calling Script Manager routines only when absolutely necessary. One way to find out all the scripts installed is to loop through all 33 possible script codes (smRoman being 0 and smUninterp being 32) and call GetScriptVariable() with the selector smScriptEnabled. Roman script is always enabled.

Listing 1: Finding Which Scripts are Installed

ScriptCode   script;
for (script = smRoman + 1; script <= smUninterp; script++)
{
   if (GetScriptVariable(script, smScriptEnabled))
      InitInternalScriptInfo(script);
            // non-Roman script present...
}

InitInternalScriptInfo() refers to a routine you write that will initialize your internal data structures that deal with specific script systems, such as line-breaking tables, on a script-by-script basis.
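As a sketch of what such a routine might set up (the record layout and field names here are invented for illustration, not Toolbox types):

```c
#include <string.h>

/* Hypothetical per-script record, one per possible script code,
   filled in by InitInternalScriptInfo() when that script is found
   to be enabled. */
enum { kMaxScripts = 33 };          /* smRoman (0) .. smUninterp (32) */

typedef struct {
    int           enabled;          /* script is installed and enabled */
    int           twoByte;          /* script contains 2-byte characters */
    unsigned char parseTable[256];  /* cached FillParseTable() result */
} ScriptInfo;

static ScriptInfo gScriptInfo[kMaxScripts];

static void InitInternalScriptInfo(int script)
{
    memset(&gScriptInfo[script], 0, sizeof(ScriptInfo));
    gScriptInfo[script].enabled = 1;
    /* Real code would fill parseTable with FillParseTable() and set
       twoByte from GetScriptVariable(script, smScriptFlags). */
}
```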

Line Breaking

Most applications don't rely entirely on the Script Manager for line breaking, hit testing, or word selection, because using those routines is thought to be too slow. It is possible to optimize your text engine so that you incorporate the correct behaviors for each script system present, while maintaining the highest possible performance. The Toolbox call for finding line breaks is StyledLineBreak(). To use it, however, you must restrict the text you pass it to lengths of less than 32K (actually, this is true of the whole Script Manager, so tough) and text widths to whole pixel values (can you say 'rounding error?'), and if you are explicitly scaling the text, it won't work at all. You must also organize the text you pass to it in terms of script runs and style runs within them. Therefore, most applications that have word-processing functionality choose to implement their own line-breaking code that is customized for their own needs. Unfortunately, many of these private implementations break when used on WorldScript systems.
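A private implementation usually starts with the Roman case. The classic rule, break at the nearest space or else at the nearest byte, can be sketched in a few lines of plain C (this is the naive scan, not a replacement for StyledLineBreak()):

```c
#include <stddef.h>

/* Sketch of the simple Roman-script rule: given the byte offset
   nearest the graphic break, back up to just after the last space
   (0x20) if there is one, otherwise break right at the graphic
   break. A real engine would work within style runs. */
static size_t SimpleRomanBreak(const unsigned char *text,
                               size_t graphicBreak)
{
    size_t i = graphicBreak;
    while (i > 0) {
        if (text[i - 1] == 0x20)
            return i;           /* break just after the space */
        i--;
    }
    return graphicBreak;        /* no space: break on the byte */
}
```

The Japanese case, described in the next section, replaces the space scan with character-boundary and kinsoku checks.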

Line Breaking with Japanese Text

The simplest line-breaking algorithm for English text is to look for a space (ASCII 0x20) character in the line near the graphic break, and if there is none, to break on the byte-boundary nearest the graphic break. Japanese text is a bit more complicated. Japanese text has no spaces, so you must break at the character boundary nearest the graphic break. There is an additional wrinkle: Certain characters are not allowed to begin a line, and certain characters are not allowed to end a line. This set of line-breaking rules is referred to as Kinsoku shori. For example, you cannot begin a line with a two-byte period. You cannot end a line with a two-byte open parenthesis followed by more text on the next line. A list of kinsoku characters is available from the Japanese Standards Association in the form of a Japanese Industrial Standard (JIS) document. It is also in Ken Lunde's excellent book, Understanding Japanese Information Processing, in the section entitled "Japanese Hyphenation." While not all Japanese agree on the correct set of kinsoku characters, this set is a good default. Some applications allow the user to edit the kinsoku character set to their own liking.

Once you know that the current byte offset in your text is on or just before the graphic break, you need to see if that byte is part of a two-byte character. Then you need to see if the character is a character that can't end a line. Then you need to check the character after it to see if it is a character that can't begin a line. This can be repeated as necessary, for support of a string of kinsoku characters. For example, suppose the character on the graphic break (the break char) can end a line, but the character after it can't begin a line, causing the break char to wrap. However, the character before the break char is one that can't end a line, so you must then check the char before it, and so on, and so on, and... The example below is simplified to illustrate a particular case; actual code for an application would probably be organized differently.

Listing 2: Checking Graphic Break Char with Kinsoku Processing

CheckGraphicBreak
Check a text run's length against the pixel edge of the line
(the so-called graphic break), and then adjust the line break
if there is an illegal Kinsoku character on the break. 

UInt16 *   gStartLineKinsokuChars;   // chars that can't begin
                                     // a line.
UInt8      gNumStartKinsokuChars;    // number of chars above.
UInt16 *   gEndLineKinsokuChars;     // chars that can't end a line
UInt8      gNumEndKinsokuChars;      // number of chars above.

// This function will return FALSE if the char at offset
// is not a valid break point. It checks the char after it,
// but not the char before it, for kinsoku.
static Boolean CheckGraphicBreak(UInt8 *      textPtr,
                                 UInt16       offset,
                                 ScriptCode   script)
{
   SInt16 result;

   // The textPtr starts at a known 'good' character
   // boundary. In this case it is the beginning of the
   // line, but it could be the beginning of the
   // stylerun.

   // Find out if script only has 1-byte chars. If so,
   // we assume it's ok to break at this char.
   if (GetScriptVariable(script, smScriptFlags) &
               (1 << smsfSingByte))
      return TRUE;

   result = CharacterByteType((Ptr)textPtr,
                  offset,
                  script);
   if (result == smSingleByte)
      return TRUE;   // In real life, you're not done
                        // until you check the chars before
                        // and after this one for kinsoku.
   if (result == smFirstByte)
      return FALSE;
   if (result == smLastByte)
   {
      UInt8      index;
      UInt16   theChar = *(UInt16 *)&textPtr[offset - 1];

      // Now we have a valid break on a 2-byte char.
      // We need to check if it's a kinsoku character.
      // This code checks Japanese kinsoku only, but
      // with a little work this could be extended to
      // all 2-byte scripts that don't break on spaces.
      if (script != smJapanese)
         return TRUE;
      
      for (index = 0; index < gNumEndKinsokuChars; index++)
      {
         if (theChar == gEndLineKinsokuChars[index])
            return FALSE;
      }
      
      // Now we check the char after this one, in case it
      // is a char that can't start a line. First see if
      // it's a 1-byte char. In real life, there are 1-byte
      // kinsoku chars to check for.
      if (textPtr[offset + 1] == 0 || 
         CharacterByteType((Ptr)&textPtr[offset + 1],
                  0, script) == smSingleByte)
         return TRUE;

      theChar = *(UInt16 *)&textPtr[offset + 1];
      
      for (index = 0; index < gNumStartKinsokuChars;
                        index++)
      {
         if (theChar == gStartLineKinsokuChars[index])
            return FALSE;
      }
   }
   return TRUE;
}

Hit Testing

Hit testing is another area in your text engine that demands the highest possible performance. When the user clicks in the text, any delay in setting the insertion point there will be noticed. Drag selection is another example of the same code working hard to find the character boundaries and setting the correct hilite area.

Some applications use a locally allocated cache of possible first byte character codes that they use to test a particular character in the text stream for byteness. This is simple to create with the Toolbox call FillParseTable(). FillParseTable() returns in your pre-allocated 256 byte buffer all the bytes that can be a first byte of a two-byte character in the script you pass to it. Be aware that in some scripts, some character codes can be both the first byte of a two-byte character as well as the second byte of a two-byte character, depending on their context within the text stream. Therefore, you need more than just this information to successfully find out what kind of character the byte you're interested in is a part of. In a mixed stream of text with both one-byte and two-byte characters, using the parse table in a single pass over the text is much faster than calling CharacterByteType() for each byte. An example of this is below, in a sample function that goes through a text stream and counts the number of characters in it:

Listing 3: Counting the Chars in Mixed-byte Text

CountCharsInScriptRun
Counts the number of characters in a run of contiguous script
(font), checking for both one-byte and two-byte characters.

UInt32 CountCharsInScriptRun(UInt8 * textPtr, UInt32 length,
               ScriptCode script)
{
   UInt8      parseTable[256];
   UInt32   curByte, charCount;
   

   (void)FillParseTable((char *)parseTable, script);
   
   for (curByte = 0L, charCount = 0L; curByte < length;
                        curByte++)
   {
      if (parseTable[textPtr[curByte]] == 1)
         continue;
      charCount++;
   }
   return (charCount);
}

Notice that because we started at a known 'good' boundary, we were able to test only the first bytes of the two-byte characters in the stream as we counted along. This code would not work in all cases if we started at an arbitrary point in unknown text, because of the ambiguity of the byteness of some character codes in some scripts. Caching the parse tables for all installed scripts in the user's system at launch time would further speed up your processing, so you wouldn't have to call FillParseTable() every time.
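A sketch of such a launch-time cache follows; the filler callback stands in for FillParseTable() so the example stays self-contained, and the names are illustrative:

```c
#include <string.h>

/* Sketch: cache one parse table per script code so the inner text
   loops never call FillParseTable(). The fillProc parameter stands
   in for a wrapper around the real Toolbox call. */
enum { kNumScripts = 33, kTableSize = 256 };

static unsigned char gParseTables[kNumScripts][kTableSize];
static int gTableValid[kNumScripts];

static const unsigned char *CachedParseTable(int script,
    void (*fillProc)(unsigned char *table, int script))
{
    if (!gTableValid[script]) {
        fillProc(gParseTables[script], script);  /* FillParseTable() in real code */
        gTableValid[script] = 1;
    }
    return gParseTables[script];
}

/* Dummy filler for demonstration only; counts how often it runs so
   you can see the cache working. */
static int gFillCalls = 0;
static void DummyFill(unsigned char *table, int script)
{
    memset(table, 0, kTableSize);
    table[0x81] = 1;            /* pretend 0x81 is a first byte */
    gFillCalls++;
    (void)script;
}
```

The second lookup for the same script returns the cached table without calling the filler again.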

Measuring Two-byte Characters

On the Mac, all two-byte characters are the same width. In a future system software release, proportional two-byte characters will be supported, but up until now all two-byte-savvy applications assume mono-spaced two-byte characters, and even if proportional characters are supported, they will be mono-spaced by default so as not to break every application currently shipping.

Before the Mac OS supported measuring two-byte characters with TextWidth(), a special code point in the single-byte 256 char width table was reserved for the two-byte character width for that font. In the Japanese and both Chinese scripts, this code point is 0x83. In Korean script, it is 0x85. This code point still works, even though Apple now recommends you use TextWidth() for all measuring of multi-byte or mixed text. In the future for proportional measuring, TextWidth() will probably be what you will use.

Below is an example of a function that measures any text, and returns the amount in a Fixed variable. This is useful if you are measuring text and the user has Fractional Glyph Widths turned on (meaning you made a call to SetFractEnable()).

Listing 4: Measuring Mixed-byte Text

GetTextWidth
This function supports multiple styleruns in the text, and consists
of two loops: One for the styles and one for the characters in each
style. The port is set on each stylerun and then Macintosh APIs
are called to measure the text. 

#define JSCTCWidthChar   0x83
#define KWidthChar   0x85

typedef struct tagStyleRun {
   UInt32     styleStart;
   UInt16     font;
   UInt16     size;
   UInt8      face;
} StyleRun,   *StyleRunPtr;

Fixed GetTextWidth(UInt8 * textPtr, UInt32 length,
                           StyleRunPtr styleRuns, UInt32 numStyles)
{
   ScriptCode      curScript;
   UInt32          byteNum, styleNum;
   Fixed           totWidth = 0L;
   FMetricRec      curFontMetrics;
   WidthTable **   curWidthTable;
   UInt8           parseTable[256];
   
   // loop thru each stylerun, measure its characters
   for (styleNum = 0L; styleNum < numStyles; styleNum++)
   {
      // Set up the port (in real life, you'd restore the
      // old settings when you exit) 
      TextFont(styleRuns[styleNum].font);
      TextFace(styleRuns[styleNum].face);
      TextSize(styleRuns[styleNum].size);
      FontMetrics(&curFontMetrics);
      curWidthTable = (WidthTable **)curFontMetrics.wTabHandle;
      HLock((Handle)curWidthTable);
      curScript = FontScript();
      (void)FillParseTable((char *)parseTable, curScript);
      
      // loop thru each char in the stylerun
      for (byteNum = styleRuns[styleNum].styleStart;
            (styleNum + 1 < numStyles &&
               byteNum < styleRuns[styleNum+1].styleStart) ||
            (styleNum + 1 >= numStyles && byteNum < length);
               byteNum++)
      {
         if (parseTable[textPtr[byteNum]] == 1)
         {
            if (curScript == smJapanese ||
                  curScript == smTradChinese ||
                  curScript == smSimpChinese)
               totWidth +=
                  (*curWidthTable)->tabData[JSCTCWidthChar];
            else if (curScript == smKorean)
               totWidth +=
                  (*curWidthTable)->tabData[KWidthChar];
            else
               totWidth += (Fixed)
                  TextWidth(&textPtr[byteNum], 0, 2) << 16;

            byteNum++;
         }
         else
            totWidth += 
               (*curWidthTable)->tabData[textPtr[byteNum]];
      }
      HUnlock((Handle)curWidthTable);
   }

   return (totWidth);
}

The above function still makes expensive calls like FontMetrics(), FillParseTable() and TextWidth() on each stylerun. It would be an even better idea to have a local cache of the width tables and parse tables of fonts you know are in the document, so you don't have to rebuild them every time the user clicks or drags or types in the text.

So, now that you have a relatively fast way of measuring the text, you can use it to find the pixel value of any character in the text, and use that for your internal CharToPixel and PixelToChar logic. Or, you can use the Mac OS Toolbox calls CharToPixel() and PixelToChar(), which will always work on any script but may be slower.
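Here is a sketch of the private PixelToChar logic, in plain C with hard-coded stand-in widths (7 and 12 pixels) and a hard-coded Shift-JIS first-byte test in place of the cached width and parse tables discussed above:

```c
/* Sketch of a private PixelToChar: walk the text, adding a
   (hypothetical, monospaced) width per character, and return the
   byte offset of the glyph containing the target pixel. */
enum { kOneByteWidth = 7, kTwoByteWidth = 12 };

static int SketchIsFirstByte(unsigned char b)   /* Shift-JIS ranges */
{
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

static unsigned long SketchPixelToChar(const unsigned char *text,
                                       unsigned long length,
                                       long pixel)
{
    unsigned long offset = 0;
    while (offset < length) {
        int  charBytes = SketchIsFirstByte(text[offset]) ? 2 : 1;
        long width = (charBytes == 2) ? kTwoByteWidth : kOneByteWidth;
        if (pixel < width)
            return offset;      /* hit lands inside this character */
        pixel -= width;
        offset += charBytes;
    }
    return length;              /* past the end of the line */
}
```

A real implementation would look up cached per-font widths and might round to the nearest character boundary rather than truncate, but the walk is the essential shape.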

Localizing Your Application for Japan

Now that we have reviewed a few of the basic text engine issues for handling two-byte text, there are a few things about Japan in particular that make localization a challenge.

Japan is possibly the most interesting major software market to localize for, if you are interested in text and text layout. It is a mature market, with a diverse number of products enjoying many millions of dollars in sales each year. The Macintosh has a larger market share there than in the U.S. or Europe. Text in Japan has traditionally been difficult to input and output using machines, and the use of text in graphic design requires that the text layout be extremely flexible. The characters are complex (so complex that bolding them may make them illegible), and emphasis or adornment has forms that use background shading, different types of lines around the text, and even dots or ticks above or to one side of each character. Condensed and extended faces have different results on PostScript printers than they do on QuickDraw. Bold and italic faces were not supported on the first PostScript Japanese printers. Underlines are not drawn by QuickDraw when the font is a Japanese font. These last two things might be fixed in future releases of the system software, but for now the application developer must work around them.

For underline, you must draw a line under the text. The reason QuickDraw doesn't draw it for you is that it usually uses the font's baseline as the underline location, but Japanese fonts' two-byte glyphs take up more room and descend below the baseline. Where you draw your underline is up to you, but take a look at how other Japanese programs do it and make it fairly consistent.

Vertical text is pretty much a checkbox item nowadays in Japanese word-processing programs. Most novels and many magazines are laid out vertically, but until recently computers were horizontal-only. While the Windows95® APIs support drawing text vertically, the Mac OS still does not, outside of using QuickDraw GX typography (which is excellent, by the way). In comparing vertical text to horizontal text, several things change about the line layout: The first line starts at the top right, and the text flows down to line-end, then wraps to the next line, which is to the left of the first line; the baseline is generally considered to be in the center of the line; underlines are drawn to the right of the text, as are emphasis dots; two-byte characters are not rotated, but single-byte characters are, 90° clockwise; certain characters have vertical text variants, like many punctuation characters. Where these variants are in the font can be found in the 'tate' table in the font ("tahteh" means "vertical" in Japanese).

Rubi are small annotation characters, placed above, below or to the side of the text they annotate. Usually they provide pronunciation guidance for unusual or hard-to-pronounce Kanji characters.

Date formats in Japan include the current year of the emperor's reign; again, supported on Windows95® but not on Mac OS. It is up to the application to support these formats if so desired. Also, date formats 2 and 3 produce identical results, due to the fact that the abbreviated month and the long month are the same thing in Japanese. Japanese applications may opt to substitute a different format in one of those formats' place.

Find and Replace needs to be expanded to include the different types of characters used in Japanese. Standard Japanese text may contain any of the following types of characters: one-byte Roman, two-byte Roman, one-byte numerals and symbols, two-byte numerals and symbols, one-byte katakana syllables, two-byte katakana syllables, two-byte hiragana syllables, and two-byte Kanji characters. The hiragana and katakana characters are equivalent in terms of the sounds they represent in Japanese, so a good Find/Replace function should include an option to find the search string in either syllabary.
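The kana-folding option is easiest to see in Unicode terms, where the katakana block sits exactly 0x60 above hiragana (in Shift-JIS the same mapping must be table-driven). A minimal fold for a Find comparison:

```c
/* Sketch of kana folding for Find, using Unicode code points:
   katakana U+30A1..U+30F6 maps onto hiragana U+3041..U+3096 by
   subtracting 0x60. Everything else passes through unchanged. */
static unsigned int FoldKanaToHiragana(unsigned int ch)
{
    if (ch >= 0x30A1 && ch <= 0x30F6)   /* katakana small-a .. ke */
        return ch - 0x60;               /* matching hiragana */
    return ch;
}
```

Folding both the search string and the document text through a function like this before comparing gives the either-syllabary match described above.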

Sorting in Japanese is difficult because the Kanji characters can have different pronunciations depending on their context. To sort Kanji correctly, you need a separate kana key field that indicates the pronunciation and you sort on that. Also, Mac OS CompareText() doesn't sort the long sound symbol correctly (that symbol changes sound depending on the character before it, but Mac OS always sorts it in the symbols area), so for linguistically correct sorting you need to write your own sorting routine.
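The separate-key approach can be sketched with standard C qsort(); the record layout and field names here are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Sketch: each record carries a separate phonetic (kana) key, and
   the sort compares that key, never the Kanji field itself. */
typedef struct {
    const char *kanji;      /* display form */
    const char *reading;    /* kana key used for sorting */
} SortRec;

static int CompareByReading(const void *a, const void *b)
{
    return strcmp(((const SortRec *)a)->reading,
                  ((const SortRec *)b)->reading);
}

static void SortByReading(SortRec *recs, size_t count)
{
    qsort(recs, count, sizeof(SortRec), CompareByReading);
}
```

A real implementation would replace strcmp() with a Japanese-aware comparison (this is where your own sorting routine handles cases like the long sound symbol), but the point stands: the Kanji field never participates in the ordering.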

If your application supports character tracking using the Color QuickDraw function CharExtra(), be aware that the CGrafPort member chExtra only uses 4 bits for signed integer values and the other 12 bits for the fraction. The value you pass to CharExtra() is a Fixed value of how many pixels you want to track out (or in) the text, and QuickDraw divides that by the current text size, to arrive at the chExtra value. This means that if the tracking value you pass to CharExtra() is greater than 8 times the text size, the chExtra field will go negative, and your text will be drawn incorrectly. Unfortunately, Japanese text is routinely tracked out beyond this limit in many applications. The only workaround is for you to draw the text one character at a time, and use the QuickDraw pen movement calls like MoveTo() to move the pen yourself. The same is true for SpaceExtra().
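The arithmetic behind that limit can be sketched directly: chExtra holds the tracking value divided by the text size as a signed 4.12 fixed-point number, so the quotient must stay below 8.0. A guard like the following (plain C, treating 16.16 Fixed values as longs) tells you when to fall back to drawing character-by-character with MoveTo():

```c
/* Sketch of the 4.12 overflow check. FixedFromLong builds a 16.16
   Fixed from an integer; the real computation inside QuickDraw uses
   Fixed division, approximated here with integer division. */
static long FixedFromLong(long n) { return n << 16; }

/* Returns 1 if CharExtra() can represent this tracking value for
   the given text size; 0 means chExtra would overflow. */
static int TrackingFitsInChExtra(long trackingFixed, long textSize)
{
    long perSizeFixed = trackingFixed / textSize;   /* 16.16 */
    /* signed 4.12 covers [-8.0, 8.0) */
    return perSizeFixed <  FixedFromLong(8) &&
           perSizeFixed >= -FixedFromLong(8);
}
```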

Inline Input

Inline input of Japanese, Chinese or Korean is a way of using an intermediate program (called an Input Method) to translate your keystrokes into the many thousands of possible characters in those languages, all in the same place on screen that you would normally see characters typed in the line. In Japanese, the Input Method changes your keystrokes into phonetic Japanese kana characters, then converts some of those characters into Kanji characters to form a mixed kana and Kanji sentence. Then the user hits the return key to confirm the text in the line, ending the inline input session. Inline input on the Mac on System 7.1 or later uses the Text Services Manager (TSM). If your application uses TextEdit as its main text engine, you can support inline input quite easily using TSMTE. If you have your own text engine, you will need to do more work to support TSM Inline Input.

TSM uses Apple events to send and receive data between your application and the Input Method. You must implement several AppleEvent handlers, the most complex of which is the kUpdateActiveInputArea. In that handler, you must draw the text in all its intermediate stages, as the user is composing and editing the Japanese sentence before s/he confirms it to the document. If there is text after the so-called 'inline hole,' you must actively reflow the text if such editing causes the length to change. Each time the user makes a change, the text in the inline hole is received from the Input Method in an Apple event. The application draws it in the text stream, along with special Inline styles that help the user tell which text in the inline hole is raw (unconverted) text, which is converted text, which is the active phrase, where the phrase boundaries are in the inline hole, and other information.

After implementing the TSM support in your application, it is imperative that you test it with third-party Input Methods. At the time TSM was introduced, the documentation for how to write an Input Method was still a little spotty. This resulted in each Input Method handling text slightly differently. Also, Kotoeri, Apple's Input Method, has fewer features than the leading third-party Input Methods. Be sure to test your application with all of them you can find, so you can verify that it won't crash or produce strange results. Some Input Methods have strange quirks, like always eating mouseDown events, or having different requirements about how large a buffer they can handle without crashing. This knowledge comes from testing, and sometimes can be found on the Internet in Usenet newsgroups (in Japanese).

What About Unicode?

Unicode is being billed as the latest panacea for the problems of internationalization. What does Unicode give you? Where does it fall short?

Unicode was designed to solve one problem: There are many incompatible, overlapping encoding schemes for different languages, and supporting all of these encodings is a complex problem. What if there was a single encoding scheme that supported all the writing systems of the world, and guaranteed that you could display text in all the languages Unicode supports if only you had the right Unicode font for each language? Unicode tries to be that encoding.

For Japanese text data, the Mac OS and Windows95 use Shift-JIS internally, while Rhapsody and WindowsNT® use Unicode. On the Internet, most Japanese text is encoded using the 7-bit ISO-2022-JP standard. Whether or not you use Unicode to represent text internally to your application, you will have to support all three standards for full file and data compatibility with the rest of the world. In Unicode, all characters are two bytes long. So, you no longer have to worry about testing for byteness in a Unicode stream. However, all ASCII characters are represented with a leading 0x00 in Unicode. So you can't have loops that look for a terminating NULL in a C-string. And, all your formerly one-byte text doubles in size unless you explicitly compress it (and then you lose the byteness testing advantage).
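The terminating-NULL problem is easy to demonstrate: the UTF-16 bytes for plain ASCII text (big-endian order shown; byte order is itself another Unicode wrinkle) are interleaved with 0x00 bytes, so any byte-oriented C-string routine stops at the first character:

```c
#include <string.h>

/* Sketch: running strlen() over the raw bytes of UTF-16 text.
   For big-endian "Hi" the bytes are 00 48 00 69 00 00, so the
   scan stops immediately at the leading 0x00. */
static size_t BrokenCStringLength(const unsigned char *utf16Bytes)
{
    return strlen((const char *)utf16Bytes);  /* stops at first 0x00 */
}
```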

Whether or not you think testing byteness is too complex or expensive to do, you should know that Unicode also does another controversial thing: For the so-called "Han" languages (Japanese, Chinese, Korean) that use characters that originated in China, it attempts to unify them into one codepoint for each character judged by the Unicode Consortium to be unique, even if it has variant forms in each language. The same is true for the Arabic script, which is shared by several languages (Arabic, Persian, Urdu). Because of this, you cannot tell what language a character is in just from its codepoint. Unicode was not designed to be a multi-lingual solution, in that representations of Chinese and Japanese in the same document will have overlapping character codes, requiring the OS to provide a parallel linguistically-coded data structure to render the glyph forms appropriately to each language. This might be another version of today's font/script/language relationship on Mac OS. As you can imagine, the Chinese, Japanese and Korean governments have each published competing encoding standards to Unicode, labeling the latter as something designed by foreigners who didn't understand the issues (both political and linguistic) involved in trying to make a worldwide encoding system.

Another issue about Unicode is that although it can represent 65,536 characters, there is not enough space for all the Han characters and their variants, plus all the other languages that Unicode currently supports. New languages are becoming computerized as more countries join the Digital Revolution and the Unicode Consortium cannot give space to all of them. Preferring the flat encoding model, they came up with another standard that uses four bytes per character (the ISO 10646 encoding standard). Given that on the Internet, where many languages need simultaneous support on computers, bandwidth is at a premium, I would prefer using the ISO 2022 standard of mixed-byte (7-bit and 14-bit characters) plus the escape codes that tell you what language the current stream is in to sending 32-bit characters through the wire. Since most web pages use this encoding, expect your OS to provide utilities for encoding conversion (like the Mac OS Encoding Converter debuting soon on a Mac near you).
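Tracking ISO 2022 escape state can be sketched in a few lines. This scanner recognizes only the two most common ISO-2022-JP escapes (ESC $ B enters two-byte JIS mode, ESC ( B returns to ASCII; the standard also defines ESC $ @ and ESC ( J, omitted here for brevity):

```c
#include <stddef.h>

/* Sketch: scan an ISO-2022-JP buffer and report whether it ends in
   two-byte JIS mode. ESC is 0x1B. Real code would also act on the
   mode while parsing the bytes between escapes. */
static int EndsInTwoByteMode(const unsigned char *text, size_t len)
{
    int twoByte = 0;
    size_t i;
    for (i = 0; i + 3 <= len; i++) {
        if (text[i] == 0x1B && text[i+1] == '$' && text[i+2] == 'B')
            twoByte = 1;                    /* entered JIS mode */
        else if (text[i] == 0x1B && text[i+1] == '(' && text[i+2] == 'B')
            twoByte = 0;                    /* back to ASCII */
    }
    return twoByte;
}
```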

Cross-Platform Development Issues

Going cross-platform is already complicated without having to think about internationalization. Should you have separate codebases for maximum use of each platform's unique features? Or should you have a single codebase and use an emulation layer for the other OS's APIs? Each has its advantages, but for this article I can speak to those of you who have a joint codebase, and tell you about some of the things that the Windows platform lacks that you have to write yourself for multi-byte support and internationalization.

Windows has no Script Manager. There is no Gestalt Manager. It cannot support multiple two-byte codepages at the same time. It uses totally separate fonts for vertical and horizontal text. It supports proportional kana in Japanese, so you can't assume all two-byte characters are the same width.

If your code uses the Script Manager routines heavily, then you will have to write them yourself on the Windows side. All the convenience of the Mac OS's international routines becomes very clear when you try the same things on a PC!

Also, Japan once again has its own special challenges. Until Windows came out in Japan, each computer manufacturer made its own proprietary OS and hardware. Even floppy disks were incompatible with each other. Now, most companies have adopted the Intel PC standard, but NEC continues to manufacture its own line of incompatible PCs. NEC has such a huge share of the market in Japan that it has teamed up with Microsoft to produce its own version of Windows95 for NEC. So when you buy Windows95 in Japan, you find there are three versions: MS Windows95 for Intel, MS Windows95 for NEC, and NEC Windows95 for NEC. All three versions are basically the same feature-for-feature, but the drivers are different and you need to test your application on each platform to verify compatibility.

On the hardware side, you will find that Japanese hardware is different: They use different displays, different keyboards, different printers, and different floppy formats. The drive lettering on NEC machines is different from Intel PCs: The hard disk drive is labeled 'A:' on one and 'C:' on the other. Make sure your installer isn't hard-coded to install on drive C:.

Conclusion

As we have seen, internationalizing your software on the Mac OS is not very difficult, and it is to your benefit to enable as many users as possible to enter text in their own language when using your program. We have also examined Japanese localization in more depth, and seen that Japanese-language applications usually require some new features designed specifically for that language's needs and conventions. As more markets around the world reach maturity, you can be sure there will be ample opportunity to differentiate your product by adding locale-specific features. It is these locale-specific features that tell your users they are valued customers, and that their needs are being addressed in a very specific way. For your own product, especially if you are in the initial design phase, I recommend making it as easily expandable as possible: design generic internationalization into the core modules, while leaving open the opportunity to add locale-specific features for markets like Japan as your product's market expands and its success grows.

Bibliography and Related Reading

  • Apple Computer, Inc. Inside Macintosh: Text, Menlo Park, CA: Addison-Wesley, March 1993.
  • Apple Computer, Inc. "Technote OV 20, Internationalization Checklist," Cupertino, CA: Apple Computer, Inc., November 1993.
  • Griffith, Tague. "Gearing Up for Asia With the Text Services Manager and TSMTE," Develop Issue 29. Cupertino, CA: Apple Computer, Inc., March 1997.
  • Apple Computer, Inc. "Technote TE 531, Text Services Manager Q&As," Cupertino, CA: Apple Computer, Inc., May 1993.
  • Lunde, Ken. Understanding Japanese Information Processing, Sebastopol, CA: O'Reilly & Associates, September 1993.

See also Ken Lunde's home page at http://jasper.ora.com/lunde/ for more information about multi-byte text processing on computers.


Nat McCully has been at Claris in the Japanese Development Group for the last 6 years. He has worked on numerous Japanese products, including MacWrite II-J, FileMaker Pro-J, Claris Impact-J, ClarisDraw-J, and ClarisWorks-J. He speaks, reads, and writes Japanese, and enjoys traveling in Japan. He is currently working as Development Lead on the next release of ClarisWorks-J.

 
