March 96 - Country Stringing: Localized Strings for the Newton
Maurice Sharp
Newton products are currently available localized for English, French, German,
and Swedish. Thus, to take full advantage of the market, Newton applications
must be developed for four languages. As of Newton Toolkit version 1.5, there's
a mechanism for localizing strings at compile time but no built-in support for
organizing all the categories of strings across the different languages (unlike
on the Macintosh, where you can use resources). This article presents a couple
of ways to organize localized strings in your Newton application.
Until Newton Toolkit 1.5, developing an application for English, French,
German, and Swedish required four different application projects or many skanky
contortions. This was tedious, to say the least, but necessary for those who
wanted to take full advantage of the worldwide market for Newton products.
Newton Toolkit 1.5 provides support (with the SetLocalizationFrame and LocObj
calls) for localizing your applications from just one project. But this is
useful only at compile time, and it doesn't provide an infrastructure for
organizing and categorizing the localized objects. In other words, you can have
different strings for four locales, but how you keep track of what strings you
have and which ones need localizing is up to you. Macintosh developers don't
have this problem because all strings can reside in resources; changing the
strings in the resources changes them in the application.
This article presents two ways to organize your localized strings. Both methods
are meant to be used at compile time, but there's also information on changing
strings at run time. Before reading this article, you should be familiar with
the information in the Newton Programmer's Guide on localizing Newton
applications.
STRINGING YOU ALONG WITHOUT RESOURCES
In a Macintosh application you can keep localized strings in the 'STR#'
resource of the resource fork. This isn't an option in a Newton application for
two reasons: ResEdit doesn't directly support Unicode strings, and, more
important, a Newton application doesn't have a resource fork. All your strings
have to reside somewhere in your application package.
A first cut at a solution to the problem of how to organize localized strings
in your Newton application would be to have a viewSetupFormScript or TextSetup
method (where applicable) that sets a particular string based on some
application-global setting. This solution has several disadvantages, such as
spreading localized strings throughout the code (resulting in multiple copies
of strings) and requiring all strings for all countries to be included.
If you've programmed the Newton for a while, you might think of taking
advantage of dead code stripping and using an if statement that switches on a
compile-time constant. This would eliminate unused localized strings but is
still awkward.
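As a rough sketch of that if-statement approach, assume two hypothetical Project Data constants (kBuildFrench and kBuildGerman) and a button template called myButton; none of these names come from the Newton Toolkit:

```newtonscript
// Project Data: hypothetical build flags. Set at most one to true;
// leaving both nil builds the English version.
constant kBuildFrench := nil;
constant kBuildGerman := nil;

// In a hypothetical button template ...
myButton.viewSetupFormScript := func()
begin
   // Only the branch selected by the constants should survive
   // dead code stripping.
   if kBuildFrench then
      self.text := "Annuler"
   else if kBuildGerman then
      self.text := "Absagen"
   else
      self.text := "Cancel";
end;
```

Every view that displays text needs its own copy of this ladder, which is where the awkwardness comes from.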
The best idea is a technique that lets you keep all your strings together. You
can do this by defining a frame in your Project Data with one slot per string
that you want to localize. You can even use nested frames. For example:
constant kUSStrings := '{
   AppName: "World Ready!",
   ExtrasName: "World!",
   HelloWorld: "Hello World",
   Dialogs: {
      OK: "OK",
      Cancel: "Cancel",
      Yes: "Yes",
      No: "No",
   },
};
constant kFrenchStrings := ...
In Newton Toolkit 1.5 and later, you can use this frame with SetLocalizationFrame.
Unfortunately, there's no specification for how to build up the frame, which is
essential to organizing your strings in a sane way. Also, SetLocalizationFrame
is meant only for compile-time localizations. With some extra effort you can
organize the strings in a way that allows them to be localized at run time as
well. As the next section shows, the key is using the Load command in
combination with a few constant functions.
LINGUA FRAMA -- CREATING THE LANGUAGES FRAME
In the previous section, we defined a frame that can be used for each target
language. Each of those target language frames can be nested into an outer
frame, called the languages frame. Each target language subframe contains the
localized strings in that language. These subframes can in turn contain other
subframes, enabling you to group strings into logical categories such as
strings used in filing, strings used in searching, and so on. Each of the
frames at the top level of the languages frame must have the same structure. If
you have a path in the USEnglish frame of Entries.Names.Phones.Home, that path
will also need to exist in French, German, and any other languages your
application supports.
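Because a missing path is easy to overlook, it can help to check the structure at compile time. The following is only a sketch: FindMissingStrings is a hypothetical helper, not a Newton Toolkit call, and it assumes that IsFrame is available in your compile-time environment and that a slot holding a frame in one language also holds a frame in the other.

```newtonscript
// Hypothetical compile-time helper. Appends to "missing" the path of
// every slot that exists in refFrame but not in otherFrame, then
// returns "missing".
global FindMissingStrings(refFrame, otherFrame, prefix, missing)
begin
   foreach slot, value in refFrame do
      if not HasSlot(otherFrame, slot) then
         AddArraySlot(missing, prefix & slot)
      else if IsFrame(value) then
         // Recurse into nested categories such as Dialogs.
         FindMissingStrings(value, otherFrame.(slot),
                            prefix & slot & ".", missing);
   missing;
end;

// Example use in Project Data, with the frames defined earlier:
Print(FindMissingStrings(kUSStrings, kFrenchStrings, "", []));
```

An empty array means every path in the English frame also exists in the French one. Running the check in both directions would also catch paths that exist only in the translation.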
The overall structure of the languages frame is as follows:
{
   USEnglish: {
      AppName: "World Ready!",
      Dialogs: {
         Cancel: "Cancel",
         OK: "OK",
         // ... and so on
      },
      // ... and so on
   },
   French: {
      AppName: "Prêt pour le Monde!",
      Dialogs: {
         OK: "OK",
         Cancel: "Annuler",
         // ... and so on
      },
      // ... and so on
   },
   German: {
      AppName: "Welt Ready!",
      Dialogs: {
         OK: "OK",
         Cancel: "Absagen",
         // ... and so on
      },
      // ... and so on
   },
   // ... and so on
}
This is the format of the frame you would pass to SetLocalizationFrame as well as of
a constant that can be used in runtime localization. Typically, the languages
frame would be kept in a text file or in your Project Data. The problem with
this is that the frame is rather large, and adding or changing an entry in a
language subframe can be difficult. Also, several entries are identical (such
as the string for OK).
A better solution is to separate the localized strings by category. This
article uses the target languages as the categories, though you could also
employ similar techniques with other categories. Once the strings are split,
you can use the Load command to assemble the languages frame.
There are two main schemes for organizing the strings. One uses simple text
files and works on both the Mac OS and Windows platforms. The other uses
compile-time functions to read the strings from some other format; on the
Macintosh platform, this method can be used to construct the languages frame
from a resource file. We'll look at each of these methods in turn.
LOADING FROM TEXT FILES
In the first scheme, you separate each language into a different text file.
Remember that Load will return the result of the last statement it executes in
the specified file. This means that each text file will specify one frame. For
example, the contents of your French text file might look like this:
{
   AppName: "Prêt pour le Monde!",
   Dialogs: {
      OK: "OK",
      Cancel: "Annuler",
      // ... and so on
   }
};
You could then modify your Project Data to build the localization frame:
SetLocalizationFrame({French: Load(HOME & "FrenchStrings.f"), ...
It's also helpful to have some string constants that can be used in multiple places.
A good example is the string for OK, which is the same in some languages. To do
this, you should load some general constants before constructing the individual
languages that make up the languages frame. So the overall process for building
the languages frame would be as follows:
- Load a file of string constants.
- Construct an empty languages frame.
- For each language, build the individual target language frame and add
it to the languages frame.
You only need predefined constants if you aren't using object combination.
Object combination, a feature that exists as of Newton Toolkit version 1.6,
would solve the problem of multiple instances of a single string (such as
"OK").
The above description smells of an algorithm. Since you can run NewtonScript at
compile time, you can call a function to load a languages frame from text files
(see Listing 1). The main trick of this function is that it uses the language
symbol to create a pathname for Load.
Listing 1. CreateLanguagesFrameFromText
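global CreateLanguagesFrameFromText(GlobalsFilePath,
                                    LanguagesSymArray)
begin
   // Load the shared string constants first, if a file was given.
   if GlobalsFilePath then
      Load(GlobalsFilePath);
   local langFrame := {};
   // Each language symbol names both the slot and the text file.
   foreach sym in LanguagesSymArray do
      langFrame.(sym) := Load(HOME & sym & "Strings.f");
   langFrame;
end;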
You can define this function in a text file (say, WorldStrings.f) that you add to
your project. Note that you must compile this file before you load your
international strings.
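To make the pieces concrete, here is a sketch of how the files might fit together. The constant kOKString and the exact file contents are assumptions for illustration; the file names follow the pattern that Listing 1 builds from each language symbol.

```newtonscript
// --- StringsCommon.f (hypothetical contents) ---
// Strings shared by several languages.
constant kOKString := "OK";

// --- FrenchStrings.f ---
// The last statement is an unquoted frame, so kOKString is
// evaluated when the file is loaded.
{
   AppName: "Prêt pour le Monde!",
   Dialogs: {
      OK: kOKString,
      Cancel: "Annuler"
   }
};

// --- Project Data ---
// Loads StringsCommon.f, then EnglishStrings.f and FrenchStrings.f.
DefConst('kLangFrame,
   CreateLanguagesFrameFromText(HOME & "StringsCommon.f",
                                '[English, French]));
```

Note that the language files use a plain frame rather than a quoted one; a quoted frame would not evaluate the constant.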
You could use the languages frame directly as the argument to
SetLocalizationFrame; however, as we'll see later in this article, there are
better ways to use the frame.
LOADING FROM RESOURCES
The second scheme creates the languages frame from a resource file. You can
apply the methodology to other non-text file sources as well. To take advantage
of the code below, you'll need Newton Toolkit 1.6 or later. One important
point: all of this code works only for Roman-based languages.
To make life easier, we'll define a template in ResEdit that shows all the
localized versions of a particular string. The template defines a resource of
type 'LOC#', which is loosely based on the 'STR#' resource (see Table 1).
Because we're using a template, the number of languages must be defined in
advance; we'll choose 5 as a nice arbitrary number. You can find the 'LOC#'
template in the sample code on this issue's CD.
You can now use the 'LOC#' resource to enter all of your strings, grouped into
categories that make sense to you. The advantage of this resource is that the
path expression in the languages frame and all localized strings for that path
expression are grouped together.
You may be wondering why the 'LOC#' template contains an English string. If you
use LocObj, the first argument is a string that's taken as the English
localization. For the case where you're only localizing at compile time, the
English string is redundant. But if you want to localize at run time, you'll
need the English string around.
If you're familiar with the resource calls in the Newton Toolkit, you will have
spotted a potential problem: there's no way to query for the available resource
IDs of a particular resource. The basic solution to this problem is to try
reading a resource and to catch the exception that the Newton Toolkit throws if
the resource isn't present. Unfortunately, iterating through all possible
resource IDs while catching exceptions takes several minutes.
So we impose these restrictions: there can be any number of 'LOC#' resources
but they must be numbered consecutively, and the first resource ID must be
either 0 (because programmatically generated resources are likely to start with
0) or 128 (because those created in ResEdit will start with 128). The code in
Listing 2 generates an array of resources of a given type based on these
criteria.
Listing 2. GetAllResources
global GetAllResources(ResType, NewtType)
begin
   local result := [];
   local atID := 0;
   // See if we can read in resource ID 0. If so, increment the
   // next resource ID; if not, set the ID to 128.
   try
      AddArraySlot(result, GetResource(ResType, atID, NewtType));
      atID := 1;
   onexception |evt.ex.msg| do
      atID := 128;
   // Start at the current resource ID (either 1 or 128) and
   // continue reading in resources until an exception occurs.
   loop
   begin
      try
         AddArraySlot(result, GetResource(ResType, atID, NewtType));
         atID := atID + 1;
      onexception |evt.ex.msg| do
         break;
   end;
   result;
end;
Once you have an array of 'LOC#' resources, you need to parse these resources into
NewtonScript path expressions and strings. The code in Listing 3 gets all the
'LOC#' resources and generates a languages frame.
Listing 3. CreateLanguagesFrameFromRsrc
global CreateLanguagesFrameFromRsrc(ResFilePath, LanguagesSymArray)
begin
   // Throw if there aren't exactly 5 languages.
   if Length(LanguagesSymArray) <> 5 then
      Throw('|evt.ex.msg|,
         "The LanguagesSymArray must be exactly 5 elements long.");
   // The languages frame that will be returned
   local langFrame := {};
   foreach sym in LanguagesSymArray do
      langFrame.(sym) := {};
   // Could use a constant since currently must be exactly 5
   // languages.
   local numLanguages := Length(LanguagesSymArray);
   local r := OpenResFileX(ResFilePath);
   local locResourceArray := GetAllResources("LOC#", 'binaryObject);
   /* Process the LOC# resources. The format of the resource is:
         16-bit count of number of string sets
         string set 1
         string set 2...
         string set n
      string set:
         pathexpression as C string
         English as C string
         French as C string
         German as C string
         other1 as C string
         other2 as C string
   */
   local numStringSets;
   local pathExpr;
   local tempString;
   local atIndex;
   foreach locResource in locResourceArray do
   begin
      // Get the number of string sets.
      numStringSets := ExtractWord(locResource, 0);
      atIndex := 2;
      // Grab each string set.
      for stringSet := 1 to numStringSets do
      begin
         // Grab the C string that is the path.
         pathExpr := ExtractCString(locResource, atIndex);
         // Update index counter.
         atIndex := atIndex + StrLen(pathExpr) + 1;
         // Create path expression for following strings.
         pathExpr := call Compile("'" & pathExpr) with ();
         // Get the language strings and jam them.
         // WARNING: This code will ignore zero-length strings.
         // There are rare cases where you actually want an empty
         // string for a particular translation; in this case, you
         // could modify the code to throw an evt.ex.msg with the
         // appropriate error.
         foreach langSym in LanguagesSymArray do
         begin
            tempString := ExtractCString(locResource, atIndex);
            if StrLen(tempString) > 0 then
               langFrame.(langSym).(pathExpr) := tempString;
            // Advance by the character count plus the terminator.
            // StrLen counts characters; Length would count the bytes
            // of the Unicode string object.
            atIndex := atIndex + StrLen(tempString) + 1;
         end;
      end;
   end;
   CloseResFileX(r);
   langFrame;
end;
Unlike the text method, the resource method has to assume a certain number of base
languages. The first thing the code does is to check that there are exactly
five language symbols. If not, the code throws an exception. The result is a
typical Newton Toolkit error dialog with the string specified in the code.
In reality, we could be a bit more forgiving. The code won't create entries in
the languages frame for items that are empty strings. So if a developer were
careful not to fill out entries for particular languages, the restriction could
be relaxed to no more than five languages. You could also make the code a bit
more complex and just not add strings for undefined languages. This is left as
an exercise for the masochistic reader.
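For the less masochistic, one possible shape for that change is sketched below. ReadStringSet and kNumTemplateLanguages are hypothetical names, not part of the sample code; the sketch assumes the same 'LOC#' layout and the same ExtractCString and StrLen calls that Listing 3 uses.

```newtonscript
// Hypothetical constant: the number of language strings that the
// 'LOC#' template stores in each string set.
constant kNumTemplateLanguages := 5;

// Hypothetical helper that could stand in for the inner foreach loop
// of Listing 3. It reads one string set starting at atIndex and
// returns the index just past that set.
global ReadStringSet(locResource, atIndex, pathExpr, langFrame,
                     LanguagesSymArray)
begin
   local tempString;
   for i := 0 to kNumTemplateLanguages - 1 do
   begin
      tempString := ExtractCString(locResource, atIndex);
      // Always step past the string, even if it isn't stored.
      atIndex := atIndex + StrLen(tempString) + 1;
      // Store only non-empty strings for languages that were
      // actually requested.
      if (i < Length(LanguagesSymArray)) and (StrLen(tempString) > 0) then
         langFrame.(LanguagesSymArray[i]).(pathExpr) := tempString;
   end;
   atIndex;
end;
```

With a helper like this, the length check at the top of Listing 3 would only need to reject arrays longer than kNumTemplateLanguages, and the caller would assign the returned value back to its own atIndex.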
An even better approach would be to create some other resource (say 'LOCi')
that contains information on how many languages are defined by the 'LOC#'
template and the language symbols. It would require slightly more complex code
for CreateLanguagesFrameFromRsrc, but it would provide more flexibility later
on. The CD contains modified code that uses an 'LOCi' resource.
As you can see, this is considerably more complex than the function used for
text files. Also note that this methodology can't use constants for common
strings. There are ways to massage the data to use constants, but that's left
as another exercise for the reader.
PUTTING IT ALL TOGETHER
Once you've created the languages frame, you can use SetLocalizationFrame and
LocObj in your project to localize your strings. The sample on this issue's CD
(Compile Time Strings) uses the code shown in Listing 4. This code is more
general than you may need, in that it creates the frame from either text files
or resources. The last line sets up a constant for the English (that is, the
default) language frame. You can use the constant English strings as part of
the first argument to LocObj.
The LocObj mechanism can be used with any object, not just strings. This
article looks only at strings, though the text-based method will work for most
types of objects.
Listing 4. Calling SetLocalizationFrame
// Create the languages frame, either by text or by resource.
constant kFromText := nil;
// Create the kLanguagesArray constant for the languages.
// The text method requires only as many languages as there are
// text files; the resource method requires a 5-element array.
DefConst('kLanguagesArray,
   call func(isText)
      if isText then
         '[English, French, German];
      else
         '[English, French, German, Other1, Other2]
   with (kFromText));
if kFromText then
   DefConst('kLangFrame,
      CreateLanguagesFrameFromText(
         HOME & "StringsCommon.f", kLanguagesArray));
else
   DefConst('kLangFrame,
      CreateLanguagesFrameFromRsrc(
         HOME & "strings.rsrc", kLanguagesArray));
SetLocalizationFrame(kLangFrame);
// Define a constant for the English language frame.
constant kStrings := kLangFrame.English;
You're probably wondering why we don't create a wrapper function to generate the
correct LocObj call. Unfortunately, LocObj is a special type of call in the
Newton Toolkit; it's evaluated as soon as the compiler hits it and it must
return a constant value.
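In practice this means writing each LocObj call out in full wherever a localized string is needed. A minimal sketch, assuming the kStrings constant from Listing 4 and a hypothetical button template named myButton:

```newtonscript
// In a hypothetical button template ...
myButton.viewSetupFormScript := func()
begin
   // The first argument is the English default; the second is the
   // path to the same string in the localization frame. The compiler
   // resolves the call at build time for the project's language.
   self.text := LocObj(kStrings.Dialogs.Cancel, 'Dialogs.Cancel);
end;
```

Using kStrings for the first argument keeps the English text in one place, so the default and the path can't drift apart.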
CHANGING STRINGS AT RUN TIME
The LocObj mechanism is designed for compile-time customization of your
application. In other words, the LocObj function exists only in the
compile-time environment of the Newton Toolkit; you can use it only in places
that will be evaluated at compile time. In some circumstances you may want to
change localized strings at run time. One example would be a language
translator application where you want the interface strings to be displayed in
the current source language.
The raw data for the runtime strings exists in the languages frame. The frame
can be included in your package so that you have access to all the localized
strings. This will add a significant amount of space to your package; at worst,
it will take up two bytes per character in the unique strings, plus the storage
occupied by the symbols and frame structure.
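One way to get at those strings at run time is a small lookup method in the application base view. This is a sketch under assumptions: currentLanguage and GetString are hypothetical names, and every language subframe is assumed to contain the requested path.

```newtonscript
// In application base view ...
// Hypothetical slot naming the language to display.
myApp.currentLanguage := 'English;

// Hypothetical lookup method. Referring to kLangFrame (from
// Listing 4) in runtime code is what pulls the whole languages
// frame into the package.
myApp.GetString := func(pathExpr)
begin
   kLangFrame.(currentLanguage).(pathExpr);
end;
```

A child view could then call :GetString('Dialogs.Cancel), finding the method through parent inheritance.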
You'll need to add some runtime support for switching language elements of the
interface. The main task is to decide what views need to be updated when a
language is switched. The simplest way to do this is to recursively propagate a
conditional message send through the application's view children:
// In application base view ...
myApp.PropagateLanguageChange := func()
begin
   // ... conditionally recur through all the kids.
   foreach child in :ChildViewFrames() do
      // "x.y exists" only checks for y using proto inheritance.
      if child.PropagateLanguageChange exists then
         child:PropagateLanguageChange();
end;
This code won't send to all children. To do that you would remove the exists test
and just send the message, which will always be found since the top-level
parent defines it. If you make this change, you should add some sort of
conditional check for a message that does the real work of updating (like "if
child.DoLanguageChange exists then ...").
An alternative is to keep track of which views need updates. How you do it
depends on your application's structure. Typically, you would maintain an array
of the declared views that need updating. If the views that need updating are
well known, you're better off using the latter method.
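A sketch of the array-based version follows. The languageViews slot and the view names in it are hypothetical, and each listed view is assumed to be declared to the base view and to implement the DoLanguageChange message mentioned above.

```newtonscript
// In application base view ...
// Hypothetical list of declared views that display localized text.
myApp.languageViews := '[okButton, cancelButton, titleText];

myApp.PropagateLanguageChange := func()
begin
   foreach viewSym in languageViews do
   begin
      // Declared views appear as slots in the base view.
      local theView := self.(viewSym);
      // Skip any view that isn't currently open.
      if theView then
         theView:DoLanguageChange();
   end;
end;
```

This avoids walking the whole view hierarchy, at the cost of keeping the array in step with the layout.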
Each view that requires an update will need to perform three tasks: change the
text based on the source language; usually change the viewBounds based on the
new text; and redraw or refresh based on the new viewBounds and text. Since
it's very likely that the viewBounds will change, most of the work can be done
in the viewSetupFormScript method of the view. Remember that redisplaying with
a new viewBounds requires sending a SyncView, which has the side effect of
sending all viewSetup messages.
This means that you can use the SyncView call as your message to indicate that
the source language has changed. When a view opens by normal means it will also
use the correct source language. Note that in some cases you may want to use
RedoChildren, which has the same basic effect as SyncView sent to all
children.
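Put together, a view that follows this pattern might look like the sketch below. It assumes the kLangFrame constant from Listing 4 is included in the package, a hypothetical currentLanguage slot in the application base view, and a hypothetical button template named myButton.

```newtonscript
// In a hypothetical button template ...
myButton.viewSetupFormScript := func()
begin
   // currentLanguage is found through parent inheritance.
   self.text := kLangFrame.(currentLanguage).Dialogs.Cancel;
   // Recompute self.viewBounds here if the new text needs more room.
end;

myButton.DoLanguageChange := func()
begin
   // SyncView re-sends the viewSetup messages, so the method above
   // runs again with the new language.
   :SyncView();
end;
```

Because the same viewSetupFormScript runs when the view first opens, there is no separate code path for the initial language.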
One caveat is that both SyncView and RedoChildren are expensive calls. You
should limit the places where the language can change. An example of runtime
customization (Run Time Strings) is provided on the CD.
READY TO ROCK AND ROLL
With the code from this article, you can now make all your applications world
ready. If you're just starting an application, take the time and use LocObj
where you should. If you already have a project, retrofit it. Then take the
code samples, customize them to your heart's content, and code away. Today
English, tomorrow the world.
Maurice Sharp is a truly multinational person. He was born in England,
naturalized to Canada, and now lives in California. He hopes to visit the
United States someday as well. His multinational background makes him a bit
psychotic when it comes to beer. He's never sure if he should order it warm or
cold, or just have water. This is why he prefers sake. Maurice is one of the
original members of Newton Developer Technical Support and is still there
(remember, we said he was a bit psychotic).
Thanks to our technical reviewers Bob Ebert, Mike Engber, David Fedor, and
Martin Gannholm.