December 94 - Newton Q & A: Ask the Llama
Newton Developer Technical Support
Q I could really use some help with speeding up my Newton application. Have you got any tips on
performance?
A You're not the only one who wants this; my llama senses have recently been overwhelmed by a
call for information on performance. All the questions in this issue's column will relate to
performance in some way. Take a look and see if there's something here that will help you.
There are two important points to remember:
- None of these tips will work by themselves; you must measure your code. Use Ticks, use the
trace global (see below), use Print. Find out where your code is slow, or where your
application is bloated.
- There is no silver bullet for a problem; you must experiment with different solutions.
In the words of my wise programming master: "When is a llama not a llama? . . . When it is a
guanacos." Or, "When you can snatch these coconuts from my hand, then it will be time for me
to leave."
Q I'm building an application that has a large set of static data. I search on a key term (a string) and get
all the data associated with that string. Mike Engber's "Lost In Space" article (in the May 1994 issue of
PIE Developers magazine) says that I should include this data in my package and things will be fast. But
this doesn't seem to be the case. I have thousands of frames of data. Each frame contains one or more slots
with strings that contain the key terms. I use FindStringInFrame to find all references to a key term but
this takes a long time. Am I doing something wrong?
A This may seem like a simple question, but it isn't. The root of the problem is that you've made
an assumption that functions provided in the ROM are fast, so they'll solve your problem. In
this case, you assumed that FindStringInFrame would be fast. You're both right and wrong.
FindStringInFrame is fast, but it still has to linearly search every slot in every frame recursively.
That means that if you have thousands of entries, it's checking thousands of frames. You can
talk about how long something will take by calculating the worst case. FindStringInFrame has
to search all your data frames (thousands of items), and for each frame it has to check each slot
to see if it's a string. If so, it then has to check whether that string matches the one you're
looking for (step by step down the string). So if you had n strings (not just data items), and
the average length of a string was m characters, that's n*m checks. In computer science terms,
you would say that FindStringInFrame is an O(n*m) operation; this is called Big-Oh notation
and, in its simplest form, refers to the worst-case time.
This means you should think about other data structures and methods of accessing them. In
your case, a simple change of data representation would result in a massive speedup. The idea is
to make the expression in the Big-Oh notation have the smallest possible value. One way to do
this is to reduce the search time for your key phrases. Since you have a fixed set of data, you can
sort them and use a binary search algorithm. You can store the actual data in arrays and store
indexes along with the key items.
The nice thing about a binary search is that you're always cutting your search space in half. On
average, you only have to check about log to the base 2 of n items. In Big-Oh notation, that's
O(log n). Of course you still have to do the individual string comparisons, so you end up with
O(m log n). So, taking both n and m as 1000, FindStringInFrame takes about 1,000,000 time
units, but the modified method takes about 10,000 (1000 times log to the base 2 of 1000, which
is roughly 10), a speedup of about 100 times! It's unlikely that a function implemented at a low
level performs 100 times faster than custom NewtonScript code, so the better algorithm wins
even when it's written in NewtonScript.
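As a rough illustration of the sorted-key idea, here is a minimal binary search sketch. The names FindKey, sortedKeys, and records are hypothetical, and the sketch assumes the keys were sorted at build time in the same order that StrCompare reports. It is not code from the CD.

```newtonscript
// Sketch only. Assumes sortedKeys is an array of key strings already
// sorted in StrCompare order, and that a parallel array (records, say)
// holds the data for each key at the same index.
FindKey := func(sortedKeys, key)
begin
    local low := 0;
    local high := Length(sortedKeys) - 1;
    local mid, cmp;
    while low <= high do
    begin
        mid := (low + high) div 2;
        cmp := StrCompare(key, sortedKeys[mid]);
        if cmp = 0 then
            return mid;             // found: index into the parallel array
        if cmp < 0 then
            high := mid - 1         // key sorts before the middle entry
        else
            low := mid + 1;         // key sorts after the middle entry
    end;
    nil;                            // key is not in the table
end;
```

If one key can refer to several data frames, each entry in the parallel array could itself be an array of indexes into the data, so a single search still returns every reference.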
This excursion into computer science should make you think about your data structures and
how you access them. Of course an academic exercise can take you only so far. You also have to
get your feet wet and test the code. You can use Ticks to get rough estimates of time, and Stats
(after a GC) to get estimates of memory.
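A small timing helper along these lines can be typed into the Inspector. TimeIt is a hypothetical name, and the numbers are only rough, since a tick is a sixtieth of a second and the loop itself adds overhead.

```newtonscript
// Sketch only: run a zero-argument function several times and report
// the elapsed ticks. TimeIt is a made-up helper, not a ROM function.
TimeIt := func(label, fn, runs)
begin
    local i;
    local start := Ticks();
    for i := 1 to runs do
        call fn with ();
    local elapsed := Ticks() - start;
    Print(label && elapsed && "ticks for" && runs && "runs");
    elapsed;
end;

// Example use: time 100 clones of a small array.
call TimeIt with ("Clone:", func() Clone([1, 2, 3]), 100);

// Rough memory check: collect garbage first, then look at the heap.
GC();
Stats();
```

Running the same helper before and after a change gives a like-for-like comparison, which matters more than the absolute tick count.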
Q The following is a viewClickScript from a pickList button in my application. Why does it take so long to
execute?
viewClickScript.func(unit)
begin
    currentPickItems := [];
    for i := 0 to Length(defaultPickItems) - 1 do
        if i = currentSelectedItem then
            AddArraySlot(currentPickItems,
                {item: defaultPickItems[i], mark: kCheckMarkChar});
        else
            AddArraySlot(currentPickItems, defaultPickItems[i]);
    if :TrackHilite(unit) then
        DoPopUp(currentPickItems, :LocalBox().right+3,
            :LocalBox().top, self);
end
A There are several possible reasons why your code would execute slowly. Since they potentially
apply to lots of code out there, I'll go through each one separately. At the end is a rewritten
function that should execute considerably faster.
- Lookup costs. Assuming that currentPickItems, currentSelectedItem, and defaultPickItems
are slots somewhere in your view hierarchy, at best they're slots in the pick button, at worst
they're in your base application view. Remember that each access to a variable requires an
inheritance lookup: check locals, then globals, then current context, then the _proto chain,
then the _parent chain. This cost isn't high for single references but can be deadly in loops.
Every cycle through your loop, you're doing three lookups; that's a lot of overhead. The
solution is to use local variables for faster access.
- Unnecessary object creation. The AddArraySlot call will grow, and potentially copy, the
array on the NewtonScript heap, resulting in a lot of unnecessary memory movement. Since
you know the length of the currentPickItems array in advance, you should preallocate the
array and use the array accessor (that is, [n]) to add array elements. You can use the Array
function call to allocate the array:
local pickItems := Array(Length(defaultPickItems), nil);
- Unnecessary execution. You need to create a new pick list only if the call to TrackHilite
succeeds. You should make the TrackHilite conditional be the outer conditional:
if :TrackHilite(unit) then
begin
    // construct pick list and DoPopUp
    ...
end;
- Inefficient variable initialization. It's inefficient to use a loop for initializing
currentPickItems from defaultPickItems, because currentPickItems has only minor
differences. It's better to use Clone for initialization. This way you get a new array whose
elements are references back to the array items in defaultPickItems. All you need to do is
replace the individual references in currentPickItems with their new or modified values. It's
the difference between an O(n) operation (traversing all the array items in defaultPickItems)
and an O(1) operation (accessing only the changed item). In other words, expect about an
order of magnitude difference.
- Unnecessary slot. In this case you don't need to have a currentPickItems slot since its value
is recreated each time the viewClickScript is executed. You're better off using a local
variable.
The modified code is shown below. To illustrate the savings, I ran a brief test using a
defaultPickItems array of ten elements. Each function was called 100 times (note that TrackHilite
always returned true). I found the following code to be over six times faster than the original code.
viewClickScript.func(unit)
begin
    if :TrackHilite(unit) then
    begin
        local pickItems := Clone(defaultPickItems);
        local selectedItem := currentSelectedItem;
        local l := :LocalBox();
        if selectedItem then
            pickItems[selectedItem] :=
                {item: pickItems[selectedItem], mark: kCheckMarkChar};
        DoPopUp(pickItems, l.right+3, l.top, self);
    end;
end
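Clone helps here because only one element changes. When every element has to be rebuilt, the first two tips (local variables and preallocation) can be combined instead. The following is a sketch under that assumption. BuildLabels and the "> " prefix are hypothetical and only stand in for whatever per-item work your code does.

```newtonscript
// Sketch only: build a new array from every element of items.
// Pass the inherited slot in as an argument (for example,
// call BuildLabels with (defaultPickItems)) so it is looked up once.
BuildLabels := func(items)
begin
    local count := Length(items);       // evaluated once, kept in a local
    local result := Array(count, nil);  // preallocate; no AddArraySlot growth
    local i;
    for i := 0 to count - 1 do
        result[i] := "> " & items[i];   // fill by index
    result;
end;
```

Inside the loop, every name is either a local or an argument, so no inheritance lookups happen per iteration.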
Q I've written my own IsASCIIAlpha, IsASCIINumeric, etc. functions. They seem to be really slow. Why
is that? Here's my IsASCIIAlpha:
// returns true if s is an alpha string (i.e., between a..z or A..Z)
IsASCIIAlpha.func(s)
begin
    local c := Upcase(Clone(s));
    local i;
    for i := 0 to StrLen(c) - 1 do
        if (StrCompare(SubStr(c, i, 1), "A") < 0) or
           (StrCompare(SubStr(c, i, 1), "Z") > 0) then
            return nil;
    true;
end;
A The main source of the slowness is that you're using string functions when character functions
would be faster. The distinction is subtle but important. In the code above, you loop through
each length-1 substring of the target string to determine whether it's an alpha character. All this
takes time. The Upcase call is O(n), as are the SubStr and StrCompare. Of course, the
StrCompare isn't really that slow, but it's still slower than you need.
The SubStr call is returning a single character at a time, but in the form of a string. That means
there is a memory allocation for at least two characters (the content and the null terminator) for
each call to SubStr. A better way is to compare each character of the string. In certain
circumstances you can access a character at a time with the array accessor (that is, []). An
example of a function that does this is IsASCIIAlpha3 (see the code on this issue's CD). In
general, when you need either a single character from a string or character-by-character access,
the array-like syntax is faster.
Note that the final version of the code doesn't do any preprocessing of the string; instead it
uses a lookup in a pregenerated array of valid alphabetic ASCII characters. That gives it a
significant speed advantage. Since timing in the Inspector is a useful technique, the code to do
the timings and print results is included on the CD. Also note that this function is specifically
for ASCII characters, so characters like é and ß would fail. Something else to note: Newton is a
Unicode-based device. ASCII is a subset of Unicode (from 0x0000 to 0x007F), but Unicode
characters up to 0xFFFD are documented. Your routine is checking only some of the characters
on page 0 (that is, characters of the form 0x00nn), but it must deal with all characters.
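To make the character-at-a-time idea concrete, here is a minimal sketch that uses the array accessor and Ord with simple range checks. It is not the IsASCIIAlpha3 from the CD (that version uses a pregenerated lookup array), and the name IsASCIIAlphaSketch is made up for this example.

```newtonscript
// Sketch only: true if every character of s is in A..Z or a..z.
// No Clone, no Upcase, no SubStr, so no per-character allocation.
IsASCIIAlphaSketch := func(s)
begin
    local i, code;
    for i := 0 to StrLen(s) - 1 do
    begin
        code := Ord(s[i]);                          // character code as an integer
        if not ((code >= 65 and code <= 90) or      // $A..$Z
                (code >= 97 and code <= 122)) then  // $a..$z
            return nil;
    end;
    true;
end;
```

Any character outside those two ranges, including every non-ASCII Unicode character, makes the function return nil, which matches the ASCII-only intent of the original.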
Q I'm trying to use the trace global to get information on what methods are called. But I get lots of output
that doesn't start or end where I want. What can I do?
A There are really two questions here: how to use trace effectively, and how to use the output.
Usually you would turn tracing on inside a method, then turn it off later on in the code.
Unfortunately, you need to do more than just set the value of trace; you also have to force the
interpreter to notice that trace has changed. The PIE Developer Technical Support
NewtonScript Q&A on debugging (on this issue's CD, among other places) tells you how to do
this.
// to turn tracing on for functions
trace := 'functions;
// force interpreter to notice change in state of trace variable
Apply(func () nil, []);
// to turn tracing off
trace := nil;
Apply(func () nil, []);
Once you have the trace output, you should cut and paste it into a text processor. There are
three main bits of information you can get from a trace:
- You can look at how many messages are generated from an apparently simple call. You can
use trace in conjunction with function call timings made using Ticks to see why a particular
call takes so long. Using the find feature of your text processor, you can jump to the
function call you're looking at.
- You can look at the values passed in and returned by function calls.
- Perhaps most useful of all, you can use the text processor to strip away all the extraneous
information (things like the lines specifying return values -- that is, lines that contain the
string "=>" as the first non-whitespace entry) so that you're left with the messages sent.
Then you can sort the messages and get a histogram of the results. This process is easier if
you have a text processor that supports grep-like text substitution (regular expressions) and
sorts.
Q I'm using the Newton Toolkit layout editor to organize my data object classes in my application. I have
20 classes with one layout per object type. To access the objects, I declare each class layout to the main
application. This gives me the benefits of parent inheritance. Unfortunately, even my test applications are
memory hogs. I would expect a time penalty, but why is there such a large space penalty?
A The space penalty is much larger than it needs to be. You're using a layout editor to edit your
classes so that you can graphically edit the classes' slots. But this has the disadvantage that you
have to specify each class as some sort of view class or prototype, perhaps a simple clView. That's
the cause of your space problem, because you also carry all the memory and runtime allocation
that goes with a view. Since your layouts are declared to your base application view, and since
the default for a clView is visible, each of your classes is also a full runtime view. That can take
a large amount of space on the NewtonScript heap. For a clView, the penalty is roughly 40
bytes, so that's an extra 800 bytes of NewtonScript heap that you can free.
A better solution is to avoid using the NewtonScript heap for your class (after all, that's one of
the advantages of prototype inheritance). You can do this in one of two ways:
- If you still want to use a layout editor to edit your class, you can use a user prototype instead
of a layout. At run time, you'll have access to the data class using the PT_<filename>
syntax documented in the Newton Toolkit User's Guide (page 4-25). Remember that the user
prototype will be read-only.
- The other option is to textually define the class. You can do this in your Project Data file,
or use the Load command to read in a different text file. See the PIE Developer Technical
Support NewtonScript Q&A document on this issue's CD for more information.
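As a rough sketch of the textual approach, a data class can be an ordinary frame with no view behind it. The names accountClass, balance, Deposit, and myAccount are hypothetical, and how you make the class frame visible to the rest of your project depends on your Newton Toolkit setup.

```newtonscript
// Sketch only: a data class defined as a plain frame, not a view template.
accountClass := {
    balance: 0,
    Deposit: func(amount)
        begin
            balance := balance + amount;
        end
};

// An instance is a small frame whose _proto slot points at the class.
myAccount := {_proto: accountClass};

// The assignment inside Deposit creates a balance slot in myAccount;
// the class frame itself is left untouched.
myAccount:Deposit(10);
```

Only the instance frame and the slots that actually change live on the NewtonScript heap. The methods and default values stay in the class frame.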
The llama is the unofficial mascot of the Developer Technical Support group in Apple's Personal Interactive Electronics (PIE)
division.
Send your Newton-related questions to NewtonMail DRLLAMA or AppleLink DR.LLAMA. The first time we use a question
from you, we'll send you a T-shirt.
Thanks to our PIE Partners for the questions used in this column, and to jXopher, Bob Ebert, Mike Engber, Kent Sandvik,
Jim Schram, and Maurice Sharp for the answers.
Have more questions? Need more answers? Take a look at PIE Developer Info on AppleLink.