December 93 - INTERNATIONAL NUMBER FORMATTING
INTERNATIONAL NUMBER FORMATTING
NORBERT LINDENBERG
Have you ever wondered how to get your program to display numbers in a way that
satisfies Macintosh users all around the world? This article tells you what users
expect and shows you how to use the Macintosh Toolbox to correctly format numbers,
taking the needs of both your program and the user into account. It also shows how to
interpret numbers entered by the user.
When you develop an application, you usually have some opinion about the format in which numbers
should be presented to the user. However, number formatting standards differ from country to
country (and sometimes even within a country), and users also may have their own ideas on the
subject. Macintosh system software provides support to format numbers in ways that accommodate
both the needs of your application and local standards, and -- starting with System 7.1 -- also lets
the user control some aspects of number formatting using the Numbers control panel.
This article shows two different ways to format numbers: using a default format for simple number
display, and following the user's specification for more sophisticated number display. It also shows
how to interpret numeric input correctly. This issue's CD contains an application called Numbers
Test that lets you try out these two different methods of formatting numbers and enter numbers for
interpretation. The CD also contains BuildNumbers, an MPW script that builds an MPW tool that's
functionally equivalent to the application.
WHAT USERS EXPECT
Users expect to see numbers in a format that makes sense to them. This challenges the programmer
to accommodate the variations on number formatting that occur around the world.
The most common system for writing numbers is the decimal system, where numbers are formed
from ten different numerals, with the position of each digit within a number defining a multiplier for
it: 123 = 1*100 + 2*10 + 3*1. However, there are many local variations on this scheme, and there are
some writing systems that prefer a different style of writing numbers, in which case decimal numbers
may or may not be an acceptable alternative. Systems besides the decimal system that users may
require include Roman numerals (used in many languages to number topics or title pages) and
hexadecimal numbers (familiar to everybody who's ever dropped into MacsBug), as well as the
Japanese and Chinese systems. For details on how these number formatting systems differ from one
another, see "Number Formatting Variations."
Computers may complicate matters even more by providing multiple character codes for the same
digit. For example, the Macintosh Japanese character set provides both 1-byte and 2-byte encodings
of the Latin characters (which are called "Romaji" in Japanese). They can easily be distinguished on
the screen: the 1-byte versions are narrower than the 2-byte versions, which take up the same widthas Kanji characters. For interpretation as numbers, however, these different encodings should be
considered equivalent.
Another example is the Macintosh Arabic character set, which defines a set of Arabic digits with
right-to-left orientation in addition to the ASCII digits, which have left-to-right orientation and are
usually displayed with Arabic glyphs when an Arabic font has been chosen. The right-to-left digits
are intended only for text that doesn't have a numeric meaning, such as software version strings and
part numbers, and are needed to obtain proper line layout in these cases. However, users may not be
aware of this intention and may try to enter numbers using these digits. Later we'll discuss how to
deal with this.
WHAT YOUR APPLICATION NEEDS
Depending on how sophisticated your application is with regard to numbers, you'll need to support
variations on number formatting in three different situations: simple number display, number display
in a user-specified format, and numeric input.
For simple number display, your application needs to show a given number in a default format that
makes sense to the user. This kind of formatting may suffice for many applications and is commonly
used for dialogs.
For other number-display situations, your application might need to format numbers according to
the user's specification. The user might specify which representation to use for the number (for
example, decimal or traditional Chinese; Thai, Arabic, or Latin glyphs), the number of digits after
the decimal separator, how to indicate negative numbers, whether to use thousands separators, which
currency symbol to use, and where to place it. This kind of formatting is needed, for example, for
spreadsheets, databases, and page layout applications.
Numeric input is needed in almost any application -- for example, to specify the width of a page, the
number of a page to jump to, or the size of a font. Ideally, your application should be able to
interpret a numeric string in any format that might make sense to the user, independent of the
display formats you use.
NUMBER FORMATTING VARIATIONS
Local variations on the decimal system include variations on the shapes of the digits, representation of
negative numbers, the decimal separator, and the thousands separator.
- The shapes of the digits: The glyphs used with the Latin writing system differ from those used with the
Arabic writing system, and several other writing systems come with their own glyphs.
Latin
Arabic
Thai
- Representation of negative numbers: The minus sign can be used before or after the number, or the
number can be parenthesized.
- The decimal separator: Either a period or a comma can be used to mark off the integer part of the
number from the fractional part.
- The thousands separator: A space, a comma, a period, or some other character can be used to mark off
the thousands place from the hundreds place, the millions place from the hundred thousands place, and
so on. Sometimes the thousands separator isn't used
at all.
Many other variations exist, especially for noninteger numbers. Here's a sample of local variations on how
one negative number is represented in the decimal system:
Arabic
French -1 234,56
German -1.234,56 Greek (1 234.56)
Japanese (1,234.56)
Swiss French -1'234.56
Thai
U.S. -1,234.56
In the Roman system, numbers are formed from letter digits representing the numbers shown below.
M | 1000
|
D | 500
|
C | 100
|
L | 50
|
X | 10
|
V | 5
|
I | 1
|
Originally the digits of the number were simply added up to arrive at the value of the number, and digits
were sorted in decreasing order within the number (so 9 = VIIII). Later a convention was added that
positioning one of the digits C, X, or I before a higher-valued digit means that its value is to be subtracted
instead of added (so 9 =IX).
The Japanese and Chinese systems represent numbers in various ways. In horizontal writing, the decimal
system with Latin glyphs is commonly used. Ten thousands separators were once used instead of thousands
separators and are still used in some very traditional quarters, but accountants in Japan now use thousands
separators instead. In the traditional vertical writing preferred by native speakers, however, Chinese
characters are used without separators. A mapping of decimal numbers to Chinese digits is acceptable;
however, a direct representation of the numbers as they are spoken is preferred. The number 45000, for
example, is represented in the decimal style on the left and in the traditional style on the right:
4
4
5
ten thousand
0
5
0
thousand
0
WHAT MACINTOSH SYSTEM SOFTWARE PROVIDES
Macintosh system software supports number formatting with international resources, the Numbers
control panel (in System 7.1), and the Text Utilities routines. Unfortunately, the functionality
provided doesn't cover all the needs just described -- it's limited to decimal numbers and a maximum
of two encodings per script. This means that, for instance, Chinese vertical numbers aren't
supported; with the advent of QuickDraw GX, which supports vertical text, this problem is becoming
more urgent. There are some interesting details you'll have to understand to make the best use of the
functionality provided.
INTERNATIONAL RESOURCES
International resources of two types, 'itl0' and 'itl4', provide data that helps in formatting numbers.
- Resources of type 'itl0' contain separator symbols (decimal separator and
thousands separator) and information about a simple default format. These
resources allow for 1-byte characters only and don't support more sophisticated
layout.
- Resources of type 'itl4' contain a number parts table used by the Text Utilities
routines to interpret format specification strings entered or selected by the user.
They also contain a table of alternate digits that can be used instead of the default
ASCII digits and that may be 2-byte characters. If there are no alternate digits for
the script, the ASCII digits are repeated in this table. A system file can contain multiple resources of either type. Each regional version of system software
comes with a default resource of each type, as well as the U.S. versions of the resources; more
resources can be added.
If multiple scripts are installed on one machine, each script has at least one resource of each type and
designates one resource of each type as the default for the script. The default resources for the
system script (the script that supports the language your system is localized for) define the
systemwide default. If GetIntlResource (IUGetIntl) is used to access a resource, the script whose
resources are returned depends on the font in the current graphics port and the settings of the
international resources selection flag. To avoid surprises, it's usually better to ask for resources of
specific scripts; the InitializeDefaultNumberSeparators routine, discussed later, does this.
All Macintosh scripts support the use of the ASCII digits ($30-$39), and some scripts provide an
additional set of digits in an alternate numeral table. The Japanese 'itl4' resource contains the 2-byte
Romaji digits; the Arabic 'itl4' resource, the right-to-left digits; and the Thai 'itl4' resource, the Thai
digits. Because only one alternate numeral table is allowed per 'itl4' resource, you won't find in the
Japanese 'itl4' resource the Chinese numerals used in the Japanese script. Unfortunately, not all
scripts that have multiple sets of digits define them in the 'itl4' resource; for instance, the Chinese
versions of System 7.1 don't make use of the alternate numeral table but only support the ASCII
digits.
THE NUMBERS CONTROL PANEL
The Numbers control panel (in System 7.1) lets users select the default number format and define
customized decimal and thousands separators, as well as the currency symbol. In earlier systems, the
International control panel (which was shipped only with certain localized versions of system
software) allowed the user to select the default number format but didn't provide for customization.
(See Figure 1.)
Figure 1The International and Numbers Control Panels
To correctly access the international resources and interpret their contents, it helps to know how the
control panels affect the resources. The behavior of the control panels has changed significantly from
system software versions 7.0 and 7.0.1 to version 7.1. The International control panel in versions 7.0
and 7.0.1 lets the user select only the language whose number formatting rules apply; it does not
allow modification of the rules. Selecting a language makes the corresponding region's 'itl0' resource
the default resource for its script, so that all its features take effect. The 'itl4' resources are not
affected.
The Numbers control panel in System 7.1 lets the user select a predefined regional version or define
a custom version of the number format. The first time the control panel is opened after installing
System 7.1, it creates a new 'itl0' resource in the System file based on the predefined default 'itl0'
resource of this version of system software and makes this new resource the default for the system
script. From then on it keeps the user's format definition in this personalized 'itl0' resource, whether
it's selected from predefined formats or defined as a custom format. When the user selects a different regional version, all items of that region's 'itl0' resource that are
represented in the control panel are copied into the personalized 'itl0' resource; other features
defined in the 'itl0' resource are ignored. This means that the decision, for example, whether to show
negative numbers with a minus sign or in parentheses is not affected by the selection. The default
selection of the 'itl4' resource isn't changed; however, the default 'itl4' resource is modified to use the
personalized 'itl0' resource's decimal and thousands separators in its number parts table.
There's one problem with the Numbers control panel that you have to be aware of:
it doesn't impose any constraints on the selections for the decimal and thousands separators, other
than not allowing the user to enter 2-byte characters. The user can, for example, select a digit, the
minus sign, no character at all, or a character that conflicts with the inner workings of the Text
Utilities routines for interpreting format specifications. To make sure that your application functions
correctly, you have to check whether the separators make sense before using them.
The sample code discussed in this article assumes that you don't check for changes of the resources
while your application is running, so it gets all necessary information at launch time and caches it.
This way, changes made with the control panel will not be immediately reflected in your application,
but you also avoid the problem of inconsistent updates. This problem can arise if you always use the
most current information, and the user changes, say, the decimal separator while your application is
displaying numbers in a window; in this case, it could happen that after redrawing a part of your
window you display one decimal separator in the updated region and another one in the rest of the
window.
THE TEXT UTILITIES ROUTINES
The Text Utilities routines format or interpret numbers according to format specifications that are
given by format strings and that can be quite sophisticated so that context-dependent variations can
be taken into account. Format strings are used in some spreadsheet and database applications and
look like this:
'###.###,##;(###.###,##);0.##'
By default, the Text Utilities routines assume that numbers are encoded by ASCII digits ($30-$39)
and displayed using Latin glyphs. However, we'll see that there's a way to have the routines support
the set of digits defined in the alternate numeral table in the 'itl4' resource, in addition to ASCII
digits.
The Text Utilities routines assume a localized format string. There's only limited support for
automatically adjusting a generic format string to local customs or the user's preferences: the routines
can replace characters by using a different number parts table but cannot convert to a different
structure of the number format. For example, the indicator of negative numbers can't be switched
automatically from the minus sign to parentheses. This is a problem if your application isn't localized
for all the regions supported by Macintosh system software.
Now you know what users expect, what your application's needs are, and what support Macintosh
system software offers when it comes to number formatting. You've seen that the Toolbox doesn't
provide a solution for all your needs, so you'll have to extend it in some cases. We'll now look at how
you can make the best use of whatis provided to do simple number display using a default format, to
display numbers in a user-specified format, and to interpret numeric input. The sample code
presented here uses the international resources either directly or in combination with the Text
Utilities routines to make up for at least some of the shortcomings of the Text Utilities routines.
SIMPLE NUMBER DISPLAY
First I'll show you how to use the 'itl0' resources in conjunction with the Text Utilities or Standard
Apple Numerics Environment (SANE) routines to display numbers in the default number format.
This method provides a simple solution for cases of simple number display.
The following code takes the localized or user-defined decimal and thousands separators into
account. It assumes that numbers are written as integer or fixed-point decimal numbers in the ASCIIcharacter set and displayed in a font of the system script. It doesn't support Roman numerals, full-
width Romaji, Chinese numbers, Thai digits, or the like. Negative numbers are written with a
leading minus sign; parentheses aren't supported.
In some cases you won't want to use the default number formatting definition but instead will want
to use the definition for a specific language. This case isn't taken into account in this version.
We start by defining the variables used to cache the default decimal and thousands separators. They
must be initialized by calling InitializeDefaultNumberSeparators when the application is launched.
PROGRAM Numbers;
USES Script, Resources, Memory, Errors, GestaltEqu, Packages, SANE,
UFailure;
VAR gDefaultDecimalSeparator: Char;
gDefaultThousandsSeparator: Char;
We call the procedure InitializeDefaultNumberSeparators in the application's initialization sequence
to initialize both gDefaultDecimalSeparator and gDefaultThousandsSeparator from the default 'itl0'
resource of the system script. If your application tracks changes in the Script Manager state, you can
reinitialize the variables by calling InitializeDefaultNumberSeparators again. Because the Numbers
control panel lets the user select any characters as the separators, we verify that the selection doesn't
conflict with our use of the separators. If no character was specified in the control panel, the 'itl0'
resource contains Char(0). It's OK not to have a thousands separator, but you can't display floating-
point numbers without a decimal separator. We don't use GetIntlResource (IUGetIntl), so the
outcome of this routine doesn't depend on the font in the current graphics port or the international
resources selection flag.
PROCEDURE InitializeDefaultNumberSeparators;
VAR theItl0Handle: Handle;
BEGIN
theItl0Handle := GetResource('itl0', GetScript(smSystemScript,
smScriptNumber));
FailNILResource(theItl0Handle);
WITH Intl0Hndl(theItl0Handle)^^ DO BEGIN
IF (decimalPt IN ['0'..'9', Char(0), '-']) OR (thousSep IN
['0'..'9', '-']) OR (decimalPt = thousSep) THEN
FailOSErr(paramErr);
gDefaultDecimalSeparator := decimalPt;
gDefaultThousandsSeparator := thousSep;
END;
END;
The FailNILResource, FailNIL, FailOSErr, and FailResError routines check for errors and initiate
error handling if necessary; they were originally introduced in MacApp. In this sample code, I don't
provide complete error handling, but only call these routines to indicate where a real application
would have to be prepared to handle errors.
The procedure LocalizeNumberString takes a string representing a number as it's produced by a
nonlocalizable conversion routine and localizes it by adjusting the decimal separator (if there is one)
and inserting thousands separators.
PROCEDURE LocalizeNumberString(VAR theString: Str255);
VAR boundary: Integer;
separatorString: String[1];
minusOffset: Integer;
BEGIN
separatorString := ',';
separatorString[1] := gDefaultThousandsSeparator;
First, we find the boundary between the integer and fractional parts. If there's a period, that's the
boundary (and we fix the decimal separator right away); otherwise it's the end of the string.
boundary := Pos('.', theString);
IF boundary <> 0 THEN
theString[boundary] := gDefaultDecimalSeparator
ELSE
boundary := Length(theString) + 1;
Second, we insert as many thousands separators as necessary, if the user has specified one. We take
into account that we don't want to insert a thousands separator right after a minus sign.
IF gDefaultThousandsSeparator <> Char(0) THEN BEGIN
IF theString[1] = '-' THEN
minusOffset := 1
ELSE
minusOffset := 0;
WHILE boundary > 4 + minusOffset DO BEGIN
theString := Concat(Copy(theString, 1, boundary - 4),
separatorString,
Copy(theString, boundary - 3,
Length(theString) - boundary + 4));
boundary := boundary - 3;
END;
END;
END;
And now we finally come to the two routines that an application will call directly to format numbers
into strings. The first one is intended for integer numbers, the second one for floating-point
numbers.
IntegerToLocalString converts the given integer into a string representation using the thousands
separator specified by localization or by the user. It calls NumToString, a Text Utilities routine.
PROCEDURE IntegerToLocalString(theNumber: LongInt;
VAR theString: Str255);
BEGIN
NumToString(theNumber, theString);
LocalizeNumberString(theString);
END;
ExtendedToLocalString converts the number into a fixed-point representation using the decimal
separator specified by localization or by the user. The number of digits to be used after the decimal
separator is specified in decimalDigits. DecForm and Num2Str are defined by SANE.
PROCEDURE ExtendedToLocalString(theNumber: Extended;
decimalDigits: Integer; VAR theString: Str255);
VAR theDecForm: DecForm;
BEGIN
WITH theDecForm DO BEGIN
style := fixedDecimal;
digits := decimalDigits;
END;
Num2Str(theDecForm, theNumber, DecStr(theString));
LocalizeNumberString(theString);
END;
That's all there is to the simple case.
NUMBER DISPLAY IN A USER-SPECIFIED FORMAT
Now I'll show you how to use the Text Utilities routines and 'itl4' resources to format numbers
according to the user's specification. The idea is that your application comes with a range of
predefined format strings, from which the user can pick one. The application might also let users
enter their own format strings. Of course, these strings aren't exactly the most user-friendly way to
define a number format, so if your application is intended for novice users you should hide them
behind a friendlier user interface. Before we dive into the code, let's look at a few obstacles that the
Text Utilities routines provide for us and consider how we can work around them.
First of all, the Text Utilities routines expect to work with localized format specifications. They
aren't able to take, for example, the standard number format used in the United States and translate
it into the standard number format used in Greece, which uses parentheses to indicate negative
numbers. This will be a problem if your application doesn't get localized for all regions for which
Macintosh system software is localized and if some versions of your application get used in regions
for which they aren't localized. To work around this problem, the range of format strings that a
given version of the software offers should include all formats commonly used in any of the regions
in which this version might be used, and your application should also let users enter their own format
strings.
Second, the format strings are interpreted with reference to the characters defined by the number
parts table that you pass into the Text Utilities StringToFormatRec routine. You have to be sure to
use a number parts table whose characters are compatible with the strings you provide. Currently,
only the characters defined by the U.S. 'itl4' resource are documented. To deal with this situation,
we'll take advantage of the fact that the U.S. 'itl4' resource is always available and will use its
characters as a stable reference point. We'll define all format strings using the U.S. characters, and
use the number parts table in the U.S. 'itl4' resource to interpret them.
Third, if you match a format string against the number parts table of the default 'itl4' resource, you'll
have to make sure that your application doesn't break if the user defines a custom number format and
the Numbers control panel patches the new decimal and thousands separators into the 'itl4' resource.
Therefore, we'll have to undo all changes that the user may have made with the control panel before
we can use an 'itl4' resource to interpret our format strings.
Unfortunately, there's no guarantee that other parts of the number parts table won't be modified by
the Numbers control panel in the future. This means that a format string that can be converted
under the current mechanism may become unconvertible in the future, just as a valid 7.0 format
string may no longer be recognized by the 7.1 StringToFormatRec routine. There's not much you
can do about this until you know it's happened.
An alternative approach that avoids this problem is to store internal representations of format
specifications instead of format strings in the application's resources. The internal representations are
created by a separate tool that's run on unmodified U.S. system software during the development
process. This approach, however, makes it more difficult for localizers to look at the format strings
and to create new ones, and also makes it slightly more difficult to use an additional feature that we'll
discuss later, in the "Converting Format Strings" section. I therefore prefer to keep format strings in
the application and convert them at run time.
So, to display numbers in a user-specified format, we do the following:
- Define all format strings using the characters given by the default U.S. 'itl4'
resource and documented inInside Macintosh: Text, pages 5-39 to 5-43.
- Set up two separate number parts tables: one "reference" table that will be used to
interpret the predefined format strings and one "user" table that reflects the user's
formatting needs.
- Using the reference number parts table, convert the predefined format strings into
the Text Utilities' internal numeric representation.
- Using the user number parts table, convert the internal numeric representation
into format strings that can be displayed to users (to let them select a preferred
format, for example).
- Using the user number parts table, convert a format string that the user has
entered into an internal numeric representation.
- Format numbers using the user number parts table.
Inside Macintosh: Text , pages 5-35 to 5-44, goes into great detail about how to format numbers
according to the user's specifications. The approach I describe here differs somewhat from that
approach. Instead of assuming that all format strings are localized for the language supported by the
default 'itl4' resource, we prepare the application to support format strings from two different
sources, the application resources and the user, by using separate number parts tables for them. This
is shown in Figure 2, which essentially replaces the upper left portion of the data flow diagram on
page 5-37 of Inside Macintosh: Text. In all other places where the diagram inInside Macintosh: Text shows a number parts table, we use the user number parts table.
Figure 2 Number Formatting Using Two Different Number Parts Tables
DEFINING FORMAT STRINGS
Which format strings you provide with your application depends on the countries you want to target
and the specific needs of your users. Here are some sample strings that you may want to use:
* '###,###.##;-###,###.##;0.##' can be used for floating-point numbers with an
absolute value of less than one million, with a thousands separator and the minus
sign for negative numbers, and without padding.
* '###,###.##;(###,###.##);0.##' is similar, but with parentheses to represent
negative numbers, as is customary in some countries.
* '+^^^;-^^^;^^^^' can be used for integer numbers with an absolute value of less
than one thousand, with signs for both positive and negative numbers, and with
padding to four places with a space character as may be necessary for alignment.
When you define the format strings, there are a few things you have to watch out for. Most
important, the predefined format strings shouldn't include any literal text, as this text is almost
guaranteed to be inappropriate for the regions the application isn't localized for. (Note that it may be
inappropriate even for the target region; for example, you shouldn't embed currency symbols,
because many users deal with foreign currency.)
Also, conversion from U.S. to localized characters really only works for characters that are listed as
separate tokens inInside Macintosh: Text , page B-56, not for unquoted characters or other text. For
example, parentheses are unquoted characters and don't get converted to the special right-to-leftparentheses used by the Arabic and Hebrew script systems. As a result, neither a format string that
contains parentheses nor numbers formatted with it display correctly on Arabic, Persian, or Hebrew
system software. To avoid problems like this, make sure to test your software on the system software
for all countries that you want to target.
Finally, the format string specifies the maximum number of predecimal digits in a formatted number,
and the formatting routine will report an error if the number doesn't fit into the format. This means
that your program has to ensure that the format strings have enough predecimal digits to
accommodate all numbers that
may need to be formatted. If the predefined strings you need get too long, you
may want to use a simplified version that you can show to users without scaring
them.
SETTING UP THE NUMBER PARTS TABLES
Now we'll set up the number parts tables that we'll use. The reference table is based on the U.S. 'itl4'
resource, but we'll undo all changes that the user may have made with the control panel. The user
table is based on the system's default number parts table and the user's selections in the Numbers or
International control panel.
Again, we assume that your application doesn't check for changes in the Script Manager state, and
therefore we cache the number parts tables that we need at launch time. The tables are initialized by
calling InitializeNumberPartsTables. If the user number parts table is an unmodified U.S. table, only
one table is allocated, and both pointers reference this table.
VAR gUserNumberPartsTable: NumberPartsPtr;
gReferenceNumberPartsTable: NumberPartsPtr;
VAR gSystemVersion: LongInt;
The function GetUserItl4 gets the 'itl4' resource that matches the user's selection in the Numbers or
International control panel. This isn't necessarily the default 'itl4' resource.
FUNCTION GetUserItl4: Handle;
VAR theItl4Handle: Handle;
systemScript: ScriptCode;
tableOffset, tableLength: LongInt;
theItl0Handle: Handle;
theResID: Integer;
theResType: ResType;
theResName: Str255;
System 7 provides a new routine, GetIntlResourceTable (IUGetItlTable), that returns the number
parts table of the default 'itl4' resource. However, the effect of the International control panel on the
default 'itl4' resource differs from that of the Numbers control panel, as explained earlier, and we
take some extra steps to achieve the behavior that best matches the control panel's behavior. The
International control panel selects an 'itl0' resource but doesn't affect the 'itl4' resource. If we
continued using the default 'itl4' resource, the user wouldn't see any effect from the control panel
selection. To make up for this, we'll try to find an 'itl4' resource that matches the 'itl0' that the user
selected, and use it instead of the default 'itl4'. The Numbers control panel, on the other hand,
updates the decimal and thousands separators in the 'itl4' resource, and changes in the default 'itl0'
resource are limited to the features visible in the control panel. Therefore, the best solution in this
case is to use the default 'itl4' resource.
BEGIN
IF gSystemVersion >= $0710 THEN BEGIN
systemScript := GetEnvirons(smSysScript);
IUGetItlTable(systemScript, iuNumberPartsTable, theItl4Handle,
tableOffset, tableLength);
FailNILResource(theItl4Handle);
END
ELSE BEGIN
The workaround used here is to ask the system for the 'itl0' resource and then try to find an 'itl4'
resource with a matching number. Some countries, however, have multiple 'itl0' resources and only
one 'itl4' resource, or they don't have any 'itl4' resource of their own (they use the U.S. version). To
cover these cases, we have to go through an exception table.
theItl0Handle := IUGetIntl(0);
FailNILResource(theItl0Handle);
GetResInfo(theItl0Handle, theResID, theResType, theResName);
theItl4Handle := GetResource('itl4', theResID);
IF ResError = resNotFound THEN BEGIN
The exceptions in system software versions 7.0 and 7.0.1 are as follows:
- the Netherlands: 'itl0' IDs 5 and 6; 'itl4' only ID 5.
- Czechoslovakia: 'itl0' IDs 30776, 30777, 56, 57; 'itl4' only ID 30776.
Note that 'itl0' 56 occurs in both the Czechoslovakian and Polish versions. For all other cases, we try
the U.S. resource.
CASE theResID OF
6: theResID := 5; { Netherlands }
30777, 56, 57: theResID := 30776; { Czechoslovakia }
OTHERWISE theResID := verUS;
END;
theItl4Handle := GetResource('itl4', theResID);
FailNILResource(theItl4Handle);
END
ELSE
FailNILResource(theItl4Handle);
END;
GetUserItl4 := theItl4Handle;
END;
In the procedure InitializeDefaultNumberSeparators we've verified that the characters the user has
specified as decimal and thousands separators don't conflict with the use of these separators for
default formatting. Here, we have to take one additional step: the Text Utilities routines for user-
specified formatting don't work if the same character is used for different purposes in the number
parts table. For instance, a character can't be used both as the decimal separator and to represent
digits in a format string. If the routines find a number parts table they don't like, they return the
fBadPartsTable result. The procedure CheckDefaultNumberSeparators uses this to check for
problems in the user number parts table (which contains the default separators) at application
initialization time. In real life, your application should produce a more meaningful message
explaining to the user what went wrong and then should quit.
PROCEDURE CheckDefaultNumberSeparators(userNumberPartsTable:
NumberPartsPtr);
CONST testString = '0';
VAR formatRecord: NumFormatString;
result: FormatStatus;
BEGIN
result := Str2Format(testString, userNumberPartsTable^,
formatRecord);
IF FormatResultType(result) <> fFormatOK THEN
FailOSErr(paramErr);
END;
The procedure ExtractNumberPartsTable is used by the InitializeNumberPartsTables routine to
extract a number parts table from an 'itl4' resource.
FUNCTION ExtractNumberPartsTable(theItl4Handle: Handle):
NumberPartsPtr;
VAR tableOffset: LongInt;
tableLength: LongInt;
theTable: Ptr;
BEGIN
WITH NItl4Handle(theItl4Handle)^^ DO BEGIN
tableOffset := defPartsOffset;
tableLength := defPartsLength;
END;
theTable := NewPtr(tableLength);
FailNIL(theTable);
LoadResource(theItl4Handle);
{ Might have been purged since we got hold of it }
FailResError;
BlockMove(Ptr(LongInt(theItl4Handle^) + tableOffset), theTable,
tableLength);
ExtractNumberPartsTable := NumberPartsPtr(theTable);
END;
The procedure InitializeNumberPartsTables initializes gUserNumberPartsTable and
gReferenceNumberPartsTable by copying the tables from the respective 'itl4' resources into
nonrelocatable blocks in the heap and cleaning the reference table if necessary.
PROCEDURE InitializeNumberPartsTables;
VAR userItl4, usItl4: Handle;
BEGIN
userItl4 := GetUserItl4;
usItl4 := GetResource('itl4', verUS);
FailNILResource(usItl4);
gUserNumberPartsTable := ExtractNumberPartsTable(userItl4);
CheckDefaultNumberSeparators(gUserNumberPartsTable);
We check whether the user number parts table is an unmodified U.S. table, so we can use it as the
reference table as well.
IF (usItl4 = userItl4) AND ((gSystemVersion < $0710) OR
((gDefaultDecimalSeparator = '.') AND
(gDefaultThousandsSeparator = ','))) THEN
gReferenceNumberPartsTable := gUserNumberPartsTable
ELSE BEGIN
We need to get the U.S. number parts table and undo any changes that the user may have made with
the Numbers control panel.
gReferenceNumberPartsTable := ExtractNumberPartsTable(usItl4);
gReferenceNumberPartsTable^.data[tokDecPoint].a[1] := '.';
gReferenceNumberPartsTable^.data[tokThousands].a[1] := ',';
END;
END;
The procedure DisposeNumberPartsTables disposes of the global number parts tables.
PROCEDURE DisposeNumberPartsTables;
BEGIN
IF gReferenceNumberPartsTable <> gUserNumberPartsTable THEN
DisposPtr(Ptr(gReferenceNumberPartsTable));
DisposPtr(Ptr(gUserNumberPartsTable));
gReferenceNumberPartsTable := NIL;
gUserNumberPartsTable := NIL;
END;
CONVERTING FORMAT STRINGS
Now that we have the two number parts tables, we're going to use them to do some conversions.
We're going to convert our predefined format strings into the Text Utilities' internal numeric
representation, convert this representation into format strings that can be displayed to the user, and
convert format strings entered by the user into internal representations.
But first, remember the alternate numerals table mentioned earlier? It's time now to reveal a
previously undocumented feature: if a number parts table used for converting a format string to its
internal numeric representation contains a character other than "#" as the no-leader format marker,
the resulting internal numeric representation will specify using the alternate numerals.
We can use this knowledge to write a wrapper around the StringToFormatRec (Str2Format) routine
that temporarily replaces the no-leader format marker, adjusts the format string to use the
replacement character as well, calls StringToFormatRec, and reverts the number parts table to its
original state. A convenient replacement character is "1," because digits are very unlikely to be used
for any other purpose in any version of the number parts table. As the character code for "#" is in a
range that isn't used for the bytes of 2-byte characters, we don't have to check for 2-byte characters
here.
Here's what the wrapper routine looks like:
PROCEDURE StringToFormatRecord(formatString: Str255;
useAlternateNumerals: Boolean;
theNumberPartsTable: NumberPartsPtr;
VAR formatRecord: NumFormatString);
VAR result: FormatStatus;
oldChar: WideChar;
i: Integer;
BEGIN
IF useAlternateNumerals THEN BEGIN
oldChar := theNumberPartsTable^.data[tokNonLeader];
theNumberPartsTable^.data[tokNonLeader].b := Ord('1');
FOR i := 1 TO Length(formatString) DO
IF formatString[i] = '#' THEN
formatString[i] := '1';
END;
result := Str2Format(formatString, theNumberPartsTable^,
formatRecord);
IF useAlternateNumerals THEN
theNumberPartsTable^.data[tokNonLeader] := oldChar;
IF FormatResultType(result) <> fFormatOK THEN
FailOSErr(paramErr);
END;
When do you use alternate numerals? First you have to find out whether the user 'itl4' resource
you're using supports alternate numerals. You can use the following routine to do this. If it returns
TRUE, you should let the user make the final decision whether to use the alternate numerals for
output -- you can't take for granted that they're always preferred over the ASCII digits. For input, it
probably makes sense to accept them without bothering the user first. An exception is the alternate
numerals in bidirectional scripts, where the internal representation of the number won't match what
the user sees on the screen. You probably shouldn't accept these digits.
FUNCTION HasAlternateNumerals
(aNumberPartsTable: NumberPartsPtr): Boolean;
BEGIN
HasAlternateNumerals :=
aNumberPartsTable^.altNumTable.data[0].b <> Ord('0');
END;
Obviously, the decision whether to use alternate numerals has to be made at run time. This is the
second reason I recommended storing format strings and not internal representations in the
application: with internal representations, you would have to store both versions and select the right
one at run time; with format strings, you only store one version and decide at run time how to
convert it.
Given this preparation and the two number parts tables, the remaining steps are straightforward. The
following routines do no more than call StringToFormatRecord and FormatRecToString with the
appropriate number parts table and check for errors that might occur.
PredefinedStringToFormatRecord converts a predefined format string using the standard U.S.
number parts table into an internal numeric representation.
PROCEDURE PredefinedStringToFormatRecord
(predefinedFormatString: Str255;
useAlternateNumerals: Boolean;
VAR formatRecord: NumFormatString);
BEGIN
StringToFormatRecord(predefinedFormatString,
useAlternateNumerals,
gReferenceNumberPartsTable, formatRecord);
END;
FormatRecordToUserString converts an internal numeric representation into a format string that
can be displayed to the user.
PROCEDURE FormatRecordToUserString(formatRecord: NumFormatString;
VAR userFormatString: Str255);
VAR result: FormatStatus;
positions: TripleInt;
BEGIN
result := Format2Str(formatRecord, gUserNumberPartsTable^,
userFormatString, positions);
IF FormatResultType(result) <> fFormatOK THEN
FailOSErr(paramErr);
END;
UserStringToFormatRecord converts a format string entered by the user into an internal numeric
representation.
PROCEDURE UserStringToFormatRecord(userFormatString: Str255;
useAlternateNumerals: Boolean;
VAR formatRecord: NumFormatString);
BEGIN
StringToFormatRecord(userFormatString, useAlternateNumerals,
gUserNumberPartsTable, formatRecord);
END;
FORMATTING NUMBERS
After all the preparations, the formatting itself is trivial. FormatNumber formats theNumber into a
string, using the internal numeric representation given and the user number parts table.
PROCEDURE FormatNumber(theNumber: Extended;
theFormatRecord: NumFormatString;
VAR theString: Str255);
VAR result: FormatStatus;
BEGIN
result := FormatX2Str(theNumber, theFormatRecord,
gUserNumberPartsTable^, theString);
IF FormatResultType(result) <> fFormatOK THEN
FailOSErr(paramErr);
END;
NUMERIC INPUT
Now let's look at conversions in the opposite direction. When the user enters a number, your
application receives a numeric string that it has to convert into a number. This could be quite a
difficult task, given that a user may pick a rather arbitrary format (and remember, Macintosh users
are generally inclined to do things their own way). Unfortunately, the Toolbox doesn't provide a
routine that simply converts an arbitrary numeric string to a number; your application always has to
specify the acceptable format.
We can reasonably make some simplifying assumptions: If your application doesn't support output in
formats other than ASCII digits, it's probably acceptable to apply the same restriction to the input
formats that can be used. And if your application uses only the default format for display, you can
also get away with allowing input in this format only, although it would be nicer to accept input in
any format the user prefers. If your application supports user-specified number formats, however, it
should be prepared to accept input in any currently defined format, and probably in some variations
of them, plus the default format.
The Toolbox provides three routines that convert numeric strings into numbers. StringToNum can
parse integer numbers but can't deal with anything that goes beyond a sequence of decimal digits
that's possibly preceded by a sign. The SANE routine Str2Num can parse floating-point numbers
and does detect erroneous input, but it assumes the period as the decimal separator and doesn't
support thousands separators. Finally, StringToExtended is supposed to support input of numeric
strings in a localized format, but it can deal with only one format at a time and has several other
shortcomings that we'll look at later.
We'll take the same approach as with number display and provide separate routines that can deal
with the default number format and with user-specified formats.
INTERPRETING NUMBERS IN THE DEFAULT NUMBER FORMAT
This solution restricts user input to something that's very close to the default number format. Then
we can simply reverse the "localization" process done for the default format -- that is, strip
thousands separators and modify the decimal separator -- and call the StringToNum routine or
SANE directly to interpret the number.
The procedure UnlocalizeNumberString checks whether theString is a legal number string, strips all
thousands separators, and replaces the default decimal separator with a period. As the default number
format consists only of ASCII digits, the minus sign, and thousands and decimal separators (which
are limited to one byte by the 'itl0' resource), we don't have to worry about 2-byte characters here if
we walk the string and check each character (we would have to worry if we used a string search
routine).
PROCEDURE UnlocalizeNumberString(VAR theString: Str255;
allowDecimal: Boolean);
VAR delta: Integer;
i: Integer;
theChar: Char;
BEGIN
delta := 0;
FOR i := 1 TO Length(theString) DO BEGIN
theChar := theString[i];
IF (theChar >= '0') & (theChar <= '9') THEN
theString[i - delta] := theChar
ELSE IF (theChar = '-') & (i = 1) THEN
theString[i - delta] := theChar
ELSE IF theChar = gDefaultThousandsSeparator THEN
delta := delta + 1
ELSE IF theChar = gDefaultDecimalSeparator THEN BEGIN
IF allowDecimal THEN BEGIN
allowDecimal := FALSE; { one is enough }
theString[i - delta] := '.';
END
ELSE
FailOSErr(paramErr)
END
ELSE
FailOSErr(paramErr);
END;
theString[0] := Char(Length(theString) - delta);
IF Length(theString) = 0 THEN
FailOSErr(paramErr);
END;
And here we finally come to the two routines that your application will call directly to interpret
numeric strings. The first one is intended for integer numbers, the other one for floating-point
numbers.
LocalStringToInteger converts to an integer a numeric string that we're hoping represents an
integer. The string may contain localized thousands separators.
PROCEDURE LocalStringToInteger(theString: Str255;
VAR theNumber: LongInt);
BEGIN
UnlocalizeNumberString(theString, FALSE);
StringToNum(theString, theNumber);
END;
LocalStringToExtended converts to an equivalent floating-point number a numeric string that we're
hoping represents a fixed-point number. The string may contain localized thousands and decimal
separators.
PROCEDURE LocalStringToExtended(theString: Str255;
VAR theNumber: Extended);
BEGIN
UnlocalizeNumberString(theString, TRUE);
theNumber := Str2Num(theString);
END;
INTERPRETING NUMBERS IN USER-SPECIFIED NUMBER FORMATS
Now we can benefit from some of the work that we did to display a number in a user-specified
number format: we use the same number parts tables and the same format conversion routines.
However, because we can't really constrain users to any given format for input, we have to allow
them at least to use any currently defined format, including the default format. To do this, your
application should try the default format and all currently defined format strings until it finds one for
which the conversion is successful.
The core of this conversion is the StringToExtended (FormatStr2X) routine, and it helps to
understand how this routine works. You get the most reliable results from StringToExtended if users
enter numbers exactly in one of the three possible formats given by a format string (for positive and
negative numbers and 0). In this case it returns fFormatOK. If the input string doesn't conform to
any of the three formats, StringToExtended tries to guess: it replaces the default decimal separator in
the
string with a period, moves the minus sign to the beginning of the string, and strips some other
punctuation. If the resulting string can be interpreted by SANE, StringToExtended returns
fBestGuess and the number found by SANE; otherwise StringToExtended returns one of several
other values that indicate why the string cannot be interpreted.
You can control how liberal your application is by either allowing fBestGuess as a result of
StringToExtended or treating it as an error. If you don't allow fBestGuess, the input string can
deviate from the specified format in only the following two ways:
- If you specify parentheses to indicate negative numbers, StringToExtended
recognizes parenthesized numbers as negative but still also accepts numbers with a
minus sign. This helps in many countries where negative numbers can be written
with either a minus sign or parentheses.
- If you set useAlternateNumerals to TRUE when you converted your format
string, StringToExtended accepts ASCII digits in addition to the alternate
numerals.
If you allow fBestGuess, the following deviations (and more) are also allowed:
- The minus sign for negative numbers can occur anywhere in the number. That
may be nice for some people who prefer to write the minus at the end, but
interpreting "12-3" as -123 doesn't make sense.
- Thousands separators can be missing or in the wrong place.
- A fractional part can be present while the format string doesn't provide for it.
- A decimal separator and fractional part specified in the format string can be
missing.
- More predecimal digits than specified by the format can be present.
Because the guessing process ignores some of the information entered by the user, it may behave
inconsistently with the behavior shown when the format is OK. For example, if your format string
uses parentheses to indicate negative numbers and also specifies thousands separators, "(123)" will be
read as -123 (with fFormatOK), but "(1234)" will be read as 1234 (with fBestGuess). The missing
thousands separator causes StringToExtended to go into the guessing pass, and there the parentheses
are stripped instead of being interpreted as the indicator for "negative."
This situation is somewhat unfortunate. On one hand, checking for fFormatOK alone doesn't give
users much freedom to type numbers in their preferred format -- for instance, where a fractional
part is specified, they must at least enter the decimal separator. On the other hand, allowing
fBestGuess may mean that some pretty bizarre strings are accepted and may lead to inconsistent
interpretation of parentheses. It seems that for integer strings allowing fBestGuess is never a good
solution, while for real values it might be OK as long as you don't support expressing negative
numbers with parentheses. To play it safe, I don't allow fBestGuess in the code that follows.
To provide the appropriate handling and return the desired type of number, here are two different
routines that interpret numeric strings representing floating-point and integer numbers. They both
return Booleans to tell the caller whether theString was successfully converted.
InterpretExtended interprets a numeric string and converts it to an Extended number.
FUNCTION InterpretExtended(theString: Str255;
theFormatRecord: NumFormatString;
VAR theNumber: Extended): Boolean;
VAR result: FormatStatus;
BEGIN
result := FormatStr2X(theString, theFormatRecord,
gUserNumberPartsTable^, theNumber);
InterpretExtended := FormatResultType(result) = fFormatOK;
END;
InterpretInteger interprets a numeric string that (we hope) represents an integer and converts it to
the equivalent LongInt.
FUNCTION InterpretInteger(theString: Str255;
theFormatRecord: NumFormatString;
VAR theNumber: LongInt): Boolean;
VAR result: FormatStatus;
theExtended: Extended;
CONST
minLongInt = -2147483648;
maxLongInt = 2147483647;
BEGIN
result := FormatStr2X(theString, theFormatRecord,
gUserNumberPartsTable^, theExtended);
IF (FormatResultType(result) = fFormatOK) &
(theExtended >= minLongInt) &
(theExtended <= maxLongint) THEN BEGIN
theNumber := Num2LongInt(theExtended);
InterpretInteger := TRUE;
END
ELSE
InterpretInteger := FALSE;
END;
TESTING THE CODE
The Numbers Test application on this issue's CD tests all the routines that we've seen so far. It's a
simple, text-based application that spits out a few numbers in the default format, converts a few
format strings and displays them in the localized version, and then runs a test that reads in numbers
and formats them according to format strings. In a real program, you would of course get the format
strings and the messages from resources. In case you prefer to play with an MPW tool, you can use
the BuildNumbers script on the CD to build a tool that's equivalent to the application.
One final warning in case you don't check out the source code: Before using the routines described in
this article, your application has to check that it runs on system software version 7.0 or higher,
because the section "Setting Up the Number Parts Tables" makes some assumptions about this.
OFF YOU GO
If you compare the user expectations described at the beginning of this article and what we've
actually achieved with the Toolbox, you'll find that we've been only partially successful. The code
handles the default number format and some user-specified customizations, but it's still limited to
decimal numbers expressed in ASCII digits and to additional characters supported by the 'itl4'
resource.
You may consider adding code of your own to make your number handling more flexible. There's an
obvious need for support for nondecimal numbers. Many specialized applications already support
Roman numerals or hexadecimal numbers; your application could similarly support traditional
Chinese numbers or decimal numbers using Chinese digits.
When you add your own number support, you should always design it as an extension to the facilities
provided by system software, not as a replacement. The Macintosh is being localized into ever more
languages to reach ever more customers, and you'll be more successful if your application is at least as
good as system software worldwide.
REFERENCES
- Apple Numerics Manual , 2nd ed. (Addison-Wesley, 1988).
- Inside Macintosh: Text (Addison-Wesley, 1993). See Chapter 1, "Introduction to Text on the Macintosh";
Chapter 5, "Text Utilities"; Chapter 6, "Script Manager"; and Appendix B, "International Resources."
NORBERT LINDENBERG is an import from Germany who still wonders why there are offices in California that don't have
windows and need artificial light while the sun is shining outside, and how Americans can survive on 12 days of vacation
a year instead of the normal six weeks. While a student at the University of Karlsruhe, Norbert developed the Literate
Programming Workshop, which he claims is the best environment for doing literate programming. (Literate programming is
similar to a develop article in that you combine source code and documentation in one document, but different because
you compile the source code right out of this integrated source document.) Now he actually gets paid for writing literate
programs for Apple's International Software Support Group. *For background information on number formatting, see Chapter 5, "Text Utilities," in Inside Macintosh: Text . For definitions
of basic terms used in this article, see Chapter 1, "Introduction to Text on the Macintosh," in the same volume. *
The number in international resource types isn't related to functionality. Resources of type 'itl1' contain long date strings;
resources of type 'itl2' deal with text handling (sorting, uppercase and lowercase, word boundaries); and resources of
type 'itl5' define rendering and character encoding. There are no resources of type 'itl3'. *
GetIntlResource or IUGetIntl? With the new edition of Inside Macintosh , many Toolbox routines have been renamed.
However, interface files defining the new names are not yet available for all Macintosh programming environments.
Therefore, I use the new names in the documentation, followed by the old names in parentheses, and I use the old names
in the source code.*
GetIntlResource (IUGetIntl) is described in Inside Macintosh: Text , pages 6-90 to 6-91. The international resources selection
flag is described in Inside Macintosh: Text , pages 6-21 to 6-26.*
SANE is the set of routines for floating-point calculations in Apple computers. It's documented in the Apple Numerics
Manual .*
NumToString is described in Inside Macintosh: Text , page 5-92.*
DecForm and Num2Str are described in the Apple Numerics Manual , second edition, pages 26 to 27. *
Number parts tables are described in more detail in Inside Macintosh: Text , pages B-55 to
B-57.*
GetIntlResourceTable (IUGetItlTable) is described in Inside Macintosh: Text , pages 6-91 to 6-92.*
StringToFormatRec (Str2Format) is described in Inside Macintosh: Text , pages 5-95 to 5-96.*
FormatRecToString (Format2Str) is described in Inside Macintosh: Text , pages 5-96 to 5-98.*
StringToNum and StringToExtended (FormatStr2X) are described in Inside Macintosh: Text . See pages 5-93 to 5-94 for
StringToNum, and pages 5-98 to 5-99 for StringToExtended.*
Str2Num is described in the Apple Numerics Manual , second edition, pages 25 to 26. *
Result values for StringToExtended are described in Inside Macintosh: Text , page 5-38.*
Num2LongInt is described in the Apple Numerics Manual, second edition, page 22.*
THANKS TO OUR TECHNICAL REVIEWERS Jeannette Cheng, Peter Edberg, Bryan K. ("Beaker") Ressler, Kenny Tung *