
Inline Code
Volume Number:1
Issue Number:9
Column Tag:Forth Forum

"Inline Code for MacForth"

By Jörg Langowski, Chemical Engineer, Fed. Rep. of Germany, MacTutor Editorial Board

Speeding up Forth with Inline Code

When you use your computer for applications that require a lot of data shuffling and calculations, work with large arrays and matrices and so on, you tend to become a little paranoid about speed. Although Forth code is very compact through its threaded structure, and word execution (i.e. subroutine calling) is reasonably well optimized in MacForth (see MacTutor V1 No2), I have always felt uncomfortable with the overhead that goes into the execution of a simple word like DROP, whose 'active part' consists of one 16-bit word of machine code.

Just as a reminder: when the Forth system executes the token for DROP in a definition, it calls a subroutine that looks like this:

DROP  ADDQ.L  #4,A7
      JMP     (A4)

So the DROP job itself is done by a simple 4-byte increment of the stack pointer. But then the next token has to be fetched and executed by jumping to the NEXT routine, whose address is held in A4, the base pointer. Compared with the increment itself, this amounts to an overhead of several hundred percent. The overhead is less dramatic for other words, but it is always there: all in all, the Sieve benchmark needs 21 seconds to run in MacForth, compared with 9 seconds in compiled C (Consulair).
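To make the cost structure concrete, here is a toy token-threaded interpreter in Python. It is only a model under stated assumptions: run and PRIMITIVES are hypothetical names, a Python list stands in for the data stack, and the dispatch counter stands in for the fetch-and-jump through NEXT. It says nothing about actual 68000 cycle counts; the point is that every primitive, however small its active part, pays one dispatch.

```python
# Toy model of a token-threaded inner interpreter (an assumption for
# illustration, not MacForth's actual NEXT). Each primitive does its
# "active" work, then control returns to NEXT, which fetches the next token.

def run(thread, stack, primitives):
    """Execute a list of token names; return the number of NEXT dispatches."""
    ip = 0                    # plays the role of A3, the next-token pointer
    dispatches = 0
    while ip < len(thread):
        token = thread[ip]    # NEXT: fetch the token ...
        ip += 1
        dispatches += 1       # ... and jump to the primitive it names
        primitives[token](stack)
    return dispatches

PRIMITIVES = {
    "DROP": lambda s: s.pop(),          # the whole 'active part' of DROP
    "DUP": lambda s: s.append(s[-1]),
}

stack = [1, 2, 3, 4]
n = run(["DROP", "DROP", "DUP"], stack, PRIMITIVES)
print(stack, n)   # [1, 2, 2] 3
```

Inline code removes exactly this per-token dispatch: the same three operations would run back to back with no return to NEXT in between.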

How can we speed up the code? After all, we have complete control over what goes into the dictionary and could put the machine code that we need right in there, no need for time-expensive subroutine calling. This is what the Forth 2.0 assembler enables you to do. However, if you create a piece of code in Forth assembler, it tends to look much more cryptic than 'normal' assembler, which after all is readable with adequate documentation.

It would be much nicer if we had a means to create the assembly code that corresponds to a DROP by writing a similar word, such as %DROP: something like a macro. No need to worry about which registers to use, and you could use 'almost normal' Forth code for writing your routine.

It shouldn't be that difficult to persuade the Forth system to execute machine code that is embedded in a definition. Every Forth word starts with at least one executable piece of machine code: a trap call for Forth-defined words such as colon definitions, and 'real' 68000 code for machine code definitions. However, this gives you either machine code or Forth, not both. Our goal is to define words that allow switching between 68000 and Forth code within one definition. Similar words do exist in the Forth 2.0 assembler, but it lacks a set of macros that would let you write inline Forth code instead of assembly code. Furthermore, it does not let you define control structures that easily.

Assume we have Forth code that looks like this:

 ...
 <token 1>
 <token 2>
 <token X>
 <machine instruction 1>
 <machine instruction 2>
 ...

This sequence of instructions will execute just fine if <token X> is a word that transfers execution to the machine code immediately following it. We'll call this word >CODE and define it as follows:

: >CODE 
    here 2+ make.token w, [compile] [ ;
    immediate

This word, which is executed during compilation, takes the next free address in the dictionary, adds 2 (the length of the token about to be compiled, so the result is the address where execution of the machine code is to start) and compiles this address as a token into the dictionary. Since a token just tells the Forth interpreter 'jump to the address that I refer to', machine code execution will start at the address immediately following the >CODE token.

This is what happens at execution time. At compilation time, the words following >CODE in the input stream are executed, not compiled (this is what the [COMPILE] [ does). Therefore, if the words following >CODE are macros that stuff assembly code into the dictionary, you have your inline code right there.
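The compile-time effect of >CODE can be sketched in Python. The Dictionary class, its methods and the 0x1000 base address are hypothetical; in particular, make_token() assumes a token is simply an offset from a base address, which may not match MacForth's exact token format. The sketch only shows that the compiled token refers to the cell right after itself, which is where the first inline instruction lands.

```python
# Hypothetical model of the dictionary as a growing list of 16-bit cells.
# Addresses are byte addresses; make_token() is ASSUMED to turn an absolute
# address into a base-relative token, roughly as MacForth's MAKE.TOKEN does.

class Dictionary:
    def __init__(self, base=0x1000):
        self.base = base
        self.cells = []              # each entry is one 16-bit word

    @property
    def here(self):                  # HERE: next free byte address
        return self.base + 2 * len(self.cells)

    def w_comma(self, value):        # W,
        self.cells.append(value & 0xFFFF)

    def make_token(self, addr):      # assumption: token = offset from base
        return addr - self.base

    def word_at(self, addr):
        return self.cells[(addr - self.base) // 2]


def compile_code_switch(d):
    """>CODE: compile a token that points just past itself."""
    d.w_comma(d.make_token(d.here + 2))


d = Dictionary()
d.w_comma(0x0123)                    # some earlier token in the definition
compile_code_switch(d)               # >CODE
d.w_comma(0x588F)                    # %DROP: ADDQ.L #4,A7
token_addr = d.here - 4              # where the >CODE token was stored
target = d.base + d.word_at(token_addr)
print(hex(target), hex(d.word_at(target)))   # 0x1004 0x588f
```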

We'll get to those macros in a minute. First, there remains the problem of how to get out of the machine code. You might recall that all machine-level Forth definitions finish with a

 JMP  (A4)

and the NEXT routine, pointed to by A4, fetches the next token from the Forth code. The pointer to the next token is kept in register A3. Unfortunately, executing >CODE leaves A3 unchanged, so it still points to the word following the >CODE token. That word is 68000 code, certainly nothing that the interpreter will swallow. Therefore we have to reset A3 before we jump back into the Forth interpreter. This is what the word >FORTH does:

: >FORTH 47fa0004 , 4ed4 w, [compile] ] ;

 LEA  4(PC),A3
 JMP  (A4)

Remember, when >FORTH appears in the input stream, we are still in execution mode from the preceding >CODE (unless we mixed things up). So >FORTH is executed when it is used in a definition: it assembles code that, at run time, loads A3 with the address following the JMP and then jumps to NEXT. After that, >FORTH switches back to regular Forth compilation.
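The two instructions compiled by >FORTH can be checked with a few lines of Python. The helper names are hypothetical; the encodings follow the 68000 formats for LEA d16(PC),An and JMP (An). The check relies on the fact that for PC-relative addressing the 68000 uses the address of the extension word (instruction address + 2) as the PC.

```python
# Encode the two instructions >FORTH compiles and check where LEA 4(PC),A3
# points. For d16(PC) addressing the PC is the address of the extension
# word, i.e. the address of the LEA opcode + 2.

def lea_pc_rel(an, disp):
    """LEA d16(PC),An -> list of 16-bit words."""
    return [0x41FA | (an << 9), disp & 0xFFFF]

def jmp_indirect(an):
    """JMP (An) -> list of 16-bit words."""
    return [0x4ED0 | an]

code = lea_pc_rel(3, 4) + jmp_indirect(4)
print([f"{w:04X}" for w in code])        # ['47FA', '0004', '4ED4']

start = 0x2000                           # arbitrary address of the LEA
a3 = (start + 2) + 4                     # PC (extension word) + displacement
end_of_jmp = start + 2 * len(code)       # first byte after JMP (A4)
print(a3 == end_of_jmp)                  # True
```

So the threaded Forth code resumes at the first token compiled after >FORTH, which is exactly what NEXT expects to find through A3.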

Between >CODE and >FORTH we can now place our macros that generate inline machine code corresponding to Forth primitives. The code for any of the primitives is found very easily by disassembling from the original Forth system. Of course, you may define your own code, use different registers than the MacForth definitions do or optimize the code. For instance, the built-in multiplication routine is a prime candidate for removing overhead. The routine *, which calls the multiplication primitive, M*, always does a 32- by 32-bit multiply and then drops the upper 32 bits of the double precision product. Some sloppiness on the part of Creative Solutions, I presume. Of course, a direct 16- by 16-bit multiply would be much faster.
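The difference between the two multiplication strategies can be illustrated in Python. muls_w() models the 68000's signed 16- by 16-bit MULS.W, and star_via_32x32() models keeping only the low 32 bits of a full product; both names are hypothetical. The two agree whenever the operands fit in 16 signed bits, which is the condition under which a faster 16-bit multiply would be a safe replacement.

```python
# Compare the two ways of multiplying discussed above.

MASK32 = 0xFFFFFFFF

def to_signed32(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def star_via_32x32(a, b):
    """Full product, upper 32 bits dropped, read as a signed 32-bit value."""
    return to_signed32(a * b)

def muls_w(a, b):
    """Signed 16x16 -> 32 multiply: only the low 16 bits of each operand count."""
    def s16(v):
        v &= 0xFFFF
        return v - 0x10000 if v & 0x8000 else v
    return s16(a) * s16(b)

for a, b in [(3, 4), (-7, 1000), (32767, 32767), (-32768, 2)]:
    assert star_via_32x32(a, b) == muls_w(a, b)

print(muls_w(70000, 2), star_via_32x32(70000, 2))   # 8928 140000
```

The last line shows what happens when an operand does not fit: MULS.W sees only the low 16 bits of 70000.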

I have written the macros in hex code, so that they'll work without the assembler, in case you are using Forth 1.1. The machine code is given as a comment in the program text.
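Hand-assembling hex is error-prone, so a small Python MOVE encoder can help check the constants. It covers only the addressing modes the macros use and the helper names are hypothetical; the bit layout (size, destination register and mode, source mode and register) is the standard 68000 MOVE format.

```python
# Assemble 68000 MOVE/MOVEA instructions into 16-bit opcodes to check the
# hand-coded words in the listing (extension words are not produced).

SIZE = {"B": 0b01, "L": 0b10, "W": 0b11}   # MOVE's size field, bits 13-12

# Effective addresses as (mode, register) pairs
def Dn(n): return (0b000, n)          # Dn
def An(n): return (0b001, n)          # An (destination makes it MOVEA)
def ind(n): return (0b010, n)         # (An)
def postinc(n): return (0b011, n)     # (An)+
def predec(n): return (0b100, n)      # -(An)
def disp(n): return (0b101, n)        # d16(An), extension word follows
IMM = (0b111, 0b100)                  # #data, immediate data follows

def move(size, src, dst):
    smode, sreg = src
    dmode, dreg = dst
    return (SIZE[size] << 12) | (dreg << 9) | (dmode << 6) | (smode << 3) | sreg

assert move("L", ind(7), predec(7)) == 0x2F17      # %dup:  MOVE.L (A7),-(A7)
assert move("L", disp(7), predec(7)) == 0x2F2F     # %over: MOVE.L 4(A7),-(A7)
assert move("L", postinc(7), predec(6)) == 0x2D1F  # %>r:   MOVE.L (A7)+,-(A6)
assert move("L", postinc(6), predec(7)) == 0x2F1E  # %r>:   MOVE.L (A6)+,-(A7)
assert move("L", ind(7), An(0)) == 0x2057          # %c@:   MOVE.L (A7),A0
assert move("B", ind(0), Dn(0)) == 0x1010          # %c@:   MOVE.B (A0),D0
assert move("L", Dn(0), ind(7)) == 0x2E80          # %swap: MOVE.L D0,(A7)
assert move("L", IMM, predec(7)) == 0x2F3C         # %lit:  MOVE.L #xxxx,-(A7)
print("all encodings match")
```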

Literals

The %LIT and %WLIT macros put constants and addresses on the stack. They compile a long move or a word move instruction, respectively, with the number that is on the stack at compilation time compiled as the data word(s) following the instruction. So the way to put the address of a variable on the stack in inline code is simply to write: <variable> %LIT.
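The bytes that %LIT and %WLIT leave in the dictionary can be sketched in Python with the struct module, packing big-endian as on the 68000. lit() and wlit() are hypothetical names. Note that MOVE.W #n,-(A7) moves the stack pointer by only two bytes, so %WLIT leaves a 16-bit item on the stack.

```python
import struct

# What %LIT and %WLIT compile, as big-endian bytes (the 68000 is big-endian).

def lit(n):
    """%LIT: MOVE.L #n,-(A7) -> opcode 2F3C + 32-bit immediate."""
    return struct.pack(">HI", 0x2F3C, n & 0xFFFFFFFF)

def wlit(n):
    """%WLIT: MOVE.W #n,-(A7) -> opcode 3F3C + 16-bit immediate."""
    return struct.pack(">HH", 0x3F3C, n & 0xFFFF)

print(lit(8192).hex())   # 2f3c00002000
print(wlit(-1).hex())    # 3f3cffff
```

For <variable> %LIT, n is simply the variable's address as it is known at compilation time.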

Control Structures

The goal was to speed up the Sieve benchmark (as an example). Of course, the code would be far from optimal if we still had to use the Forth control structures; they should be coded inline, too. This means we have to keep track of addresses that we want to branch to.

The program below provides two examples, %IF...%ELSE...%THEN and %DO...%LOOP. The other control structures are not included, since they weren't necessary for this particular example. But after reading through, you should be able to write your own code for them.

%IF compiles a branch which is taken when the number on top of the stack is zero. This branch has a zero displacement when first compiled. At the same time, the address of the displacement word (HERE) is pushed on the compilation-time stack. When %THEN is encountered, the branch displacement is calculated and stored at that address. %ELSE works the same way, except that it first compiles an unconditional branch at the end of the %IF part, so that the %IF branch lands just after it and the %IF part skips the %ELSE part. This unconditional branch is resolved at the %THEN.
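The bookkeeping of %IF, %ELSE and %THEN can be modeled in Python. The dic list, the 0x3000 base address and the pct_* function names are hypothetical; the opcodes and the rule that a 16-bit branch displacement is measured from the displacement word itself follow the listing. The sketch compiles the equivalent of "%if %drop %else %dup %then".

```python
# Model of how %IF, %ELSE and %THEN resolve their forward branches.
# `dic` holds compiled 16-bit words; `cstack` stands in for the
# compile-time stack.

BASE = 0x3000
dic = []
cstack = []

def here():
    return BASE + 2 * len(dic)

def w_comma(v):                      # W,
    dic.append(v & 0xFFFF)

def w_store(v, addr):                # W!
    dic[(addr - BASE) // 2] = v & 0xFFFF

def pct_if():
    w_comma(0x4A9F)                  # TST.L (A7)+
    w_comma(0x6700)                  # BEQ, 16-bit displacement follows
    cstack.append(here())            # address of the displacement word
    w_comma(0)

def pct_then():
    addr = cstack.pop()
    w_store(here() - addr, addr)     # displacement relative to that word

def pct_else():
    w_comma(0x6000)                  # BRA, 16-bit displacement follows
    cstack.append(here())
    w_comma(0)
    cstack[-1], cstack[-2] = cstack[-2], cstack[-1]   # SWAP
    pct_then()                       # resolve the %IF branch to here

pct_if()
w_comma(0x588F)                      # true part: %drop
pct_else()
w_comma(0x2F17)                      # false part: %dup
pct_then()
print([f"{w:04X}" for w in dic])
# ['4A9F', '6700', '0008', '588F', '6000', '0004', '2F17']
```

The BEQ displacement 0008 skips the %drop and the BRA, landing on the %dup; the BRA displacement 0004 skips the %dup.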

The code compiled by %DO takes the initial and final values from the stack and puts them on the return stack. During compilation, HERE is put on the stack as a reference for the backward branch taken by %LOOP. %LOOP compiles code that increments the loop counter by one and tests it against the limit; if it is still below the limit, the backward branch is taken (its displacement is calculated at compilation time). %+LOOP behaves just like %LOOP, except that the increment is the number on top of the stack. Note that there is one difference between %+LOOP and the usual Forth +LOOP: while the latter works with both positive and negative loop increments, ours works only with positive ones. I did this in the interest of speed.
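The run-time behaviour of %DO ... %LOOP and %+LOOP can be sketched in Python, including why %+LOOP handles only positive increments: the exit test is "continue while limit > index", which a decreasing index fails after the first pass. pct_do_loop() and loop_displacement() are hypothetical names, and Python integers stand in for 32-bit values (fine for small counts).

```python
# Run-time behaviour of %DO ... %LOOP / %+LOOP. The return stack is a list
# whose end is the top: after %DO, rstack[-1] is the index (A6) and
# rstack[-2] the limit (4(A6)).

def pct_do_loop(limit, start, step, body):
    rstack = [limit, start]          # MOVE.L 4(A7),-(A6); MOVE.L (A7)+,-(A6)
    while True:
        body(rstack[-1])             # the body sees I = (A6)
        rstack[-1] += step           # ADDQ.L #1,(A6)  or  ADD.L D0,(A6)
        if not rstack[-2] > rstack[-1]:   # CMPM.L; BGT loops while limit > index
            break
    del rstack[-2:]                  # ADDA.L #8,A6 drops index and limit
    return rstack

def loop_displacement(do_addr, disp_word_addr):
    """16-bit BGT displacement as %LOOP compiles it: (HERE at %DO) - HERE."""
    return (do_addr - disp_word_addr) & 0xFFFF

seen = []
pct_do_loop(10, 0, 3, seen.append)
print(seen)                          # [0, 3, 6, 9]

seen = []
pct_do_loop(0, 10, -2, seen.append)  # negative step: only one pass
print(seen)                          # [10]

print(hex(loop_displacement(0x4000, 0x4010)))   # 0xfff0
```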

The Sieve Benchmark

With all these macros available, we can now recode the Sieve of Eratosthenes prime number benchmark into inline machine code. Only minor changes to the Forth code are needed. At the point where the inline code is to start, we insert >CODE; all Forth words after it are inline macros, distinguished from the regular Forth words by the preceding percent sign. Where the inline part ends, we write >FORTH to jump back into threaded Forth code.

The resulting code works (!!) and executes in 9.7 seconds, as compared to 21 seconds for the Forth code.
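For reference, here is the benchmark's algorithm transcribed into Python, following the standard Forth version below line by line. It is a sketch for checking results, not a timing comparison, and primes() is a hypothetical name. flags[i] represents the odd number 2i + 3.

```python
# The sieve as coded in the Forth PRIMES word; comments give the Forth words.

SIZE = 8192

def primes(size=SIZE):
    flags = bytearray([1]) * size        # flags size 01 fill
    count = 0
    for i in range(size):                # 0 size 0 do
        if flags[i]:                     # flags i+ c@ if
            prime = i + i + 3            # 3 i+ i+
            k = prime + i                # dup i+
            while k < size:              # size < if ... do 0 ic! dup +loop
                flags[k] = 0
                k += prime
            count += 1                   # drop 1+
    return count

print(primes(), "primes")                # 1899 primes
```

With size = 8192 the flags cover the odd numbers from 3 to 16385, and the count is 1899.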

Inline compiler definitions ( 060585 jl )

(c) June 1985 MacTutor by J. Langowski

This code is meant as an example for speeding up time-critical Forth code through the insertion of inline machine code. The words defined here are by no means a complete Forth compiler. No attempt was made to use the same words as standard Forth and do context switching; I felt that this would have been a) more complicated and b) actually confusing, because you tend to lose track of when you are in inline mode and when in interpreted Forth mode. Therefore, all inline words are compiled into the standard Forth vocabulary and have the names of the corresponding Forth words preceded by a '%'. The only control structures are %IF...%ELSE...%THEN and %DO...%LOOP/%+LOOP, where the %+LOOP works only for positive increments. You are encouraged to build other control structures, using the same principles.

( inline assembly macros)  ( 060285 jl )
hex
: >code here 2+ make.token w, [compile] [ ;  
immediate

: >forth 47fa0004 , 4ed4 w, [compile] ] ;
{LEA  4(PC),A3 }
{JMP  (A4)} 
: %swap 202f0004 , 2f570004 , 2e80 w, ;
{MOVE.L 4(A7),D0 }
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }
 
: %drop 588f w, ; { ADDQ.L  #4,A7  }
: %dup 2f17 w, ;  { MOVE.L  (A7),-(A7) }
: %over 2f2f0004 , ; { MOVE.L 4(A7),-(A7) }

: %+! 205f201f , d190 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A0)  }

: %rot 202f0008 , 2f6f0004 , 0008 w,
       2f570004 , 2e80 w, ;
{MOVE.L 8(A7),D0 }
{MOVE.L 4(A7),8(A7)}
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }

: %+ 201fd197 , ;  
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A7)  }
  
: %- 201f9197 , ;
{MOVE.L (A7)+,D0 }
{SUB.L  D0,(A7)  }

: %i 2f16 w, ;     { MOVE.L   (A6),-(A7) }
: %j 2f2e0008 , ;  { MOVE.L  8(A6),-(A7) }
: %k 2f2e0010 , ;  { MOVE.L 16(A6),-(A7) }
{ %k is a word that does not exist in 
  MacForth, but is very useful to extract 
  a loop index one level further down    }

: %i+ 2017d096 , 2e80 w, ;
{MOVE.L (A7),D0  }
{ADD.L  (A6),D0  }
{MOVE.L D0,(A7)  }

: %c@ 42802057 , 10102e80 , ;
{CLR.L  D0}
{MOVE.L (A7),A0  }
{MOVE.B (A0),D0  }
{MOVE.L D0,(A7)  }
  
: %w@ 20574257 , 3f500002 , ;
{MOVE.L (A7),A0  }
{CLR.W  (A7)}
{MOVE (A0),2(A7) }

: %@ 20572e90 , ;
{MOVE.L (A7),A0  }
{MOVE.L (A0),(A7)}
  
: %c! 205f201f , 1080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE.B D0,(A0)  }

: %w! 205f201f , 3080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE D0,(A0)  }
   
: %! 205f209f , ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,(A0) }

: %>r 2d1f w, ;  { MOVE.L  (A7)+,-(A6)  }  
: %r> 2f1e w, ;  { MOVE.L  (A6)+,-(A7)  }

: %ic!  201f2056 , 1080 w, ;
{MOVE.L (A7)+,D0 }
{MOVE.L (A6),A0  }
{MOVE.B D0,(A0)  }

: %lit 2f3c w, , ;
{MOVE.L #xxxx,-(A7)}
{ where xxxx is compiled from the stack 
  into the next four bytes }
  
: %wlit 3f3c w, w, ;
{MOVE #xxxx,-(A7)}
{ and compile top of stack into next word }

: %< 4280bf8f , 6c025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BGE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %> 4280bf8f , 6f025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %= 4280bf8f , 66025380 , 2f00 w, ;
{CLR.L  D0}
{CMPM.L (A7)+,(A7)+}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,-(A7) }

: %0= 42804a97 , 66025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %0< 42804a97 , 6a025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BPL  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %0> 42804a97 , 6f025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.L D0,(A7) }

: %and 201fc197 , ;
{MOVE.L (A7)+,D0 }
{AND.L  D0,(A7)  }
  
: %or 201f8197 , ;
{MOVE.L (A7)+,D0 }
{OR.L D0,(A7)  }

: %if 4a9f6700 , here 0 w, ;
{TST.L  (A7)+  }
{BEQ  xxxx}
{ xxxx is a 16 bit displacement that is 
  resolved by %THEN   }

: %then here over - swap w! ;
: %else 6000 w, here 0 w, swap %then ;
{BRA  xxxx}
{ resolves preceding %IF and leaves new
  empty unconditional branch to be filled
  by %THEN     }

: %do 2d2f0004 , 2d1f588f , here ;
{MOVE.L 4(A7),-(A6)}
{MOVE.L (A7)+,-(A6)}
{ADDQ.L #4,A7  }
{ leaves HERE on the stack for back branch
  by %LOOP or %+LOOP      }

: %loop 5296204e , b1886e00 , 
                 here - w, ddfc w, 8 , ;
{ADDQ.L #1,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }
{ the last instruction cleans up the return
  stack. Branch resolved in this word     }

: %+loop 201fd196 , 204e w, b1886e00 , 
                 here - w, ddfc w, 8 , ;
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }

decimal

( Eratosthenes Sieve Benchmark,
             inline code) ( 060285 jl )
 8192 constant size  
 create flags  size allot
: primes   flags  size 01 fill 
  >code 0 %lit size %lit 0 %lit
    %do  flags %lit %i+ %c@
       %if 3 %lit %i+ %i+ %dup %i+ 
             size %lit %<
         %if size %lit flags %lit %+ 
           %over %i+ flags %lit %+
           %do 0 %lit %ic! %dup %+loop
         %then %drop 1 %lit %+
       %then
    %loop >forth . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop
   1 sysbeep ;

( Eratosthenes Sieve Benchmark,
                standard version)
 8192 constant size       
 create flags  size allot
: primes flags size 01 fill 
  0  size 0  
    do  flags  i+ c@
      if  3 i+ i+ dup i+  size <  
         if  size flags +  over i+  flags +
             do  0 ic!  dup  +loop
         then  drop 1+  
       then
    loop  . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop  
   1 sysbeep ;
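The comparison macros in the listing all produce a Forth flag the same way: clear D0, then let a conditional branch skip SUBQ.L #1,D0 when the condition is false, so true is -1 and false is 0. Below is a Python sketch of %< and %0= under that reading; the function names are hypothetical, and values are treated as 32-bit signed numbers, as CMPM.L and TST.L would see them.

```python
# Flag-producing sequence of %< and %0=, modeled on 32-bit signed values.

def s32(x):
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def pct_less(a, b):
    """a b %< : CMPM.L computes a - b; BGE skips the SUBQ when a >= b."""
    d0 = 0                           # CLR.L D0
    if not s32(a) >= s32(b):         # branch not taken
        d0 -= 1                      # SUBQ.L #1,D0
    return d0                        # MOVE.L D0,-(A7)

def pct_zero_eq(a):
    """a %0= : TST.L (A7); BNE skips the SUBQ; the flag replaces (A7)."""
    d0 = 0
    if s32(a) == 0:
        d0 -= 1
    return d0

print(pct_less(1, 2), pct_less(2, 1), pct_less(0xFFFFFFFF, 0))   # -1 0 -1
print(pct_zero_eq(0), pct_zero_eq(5))                           # -1 0
```

Because the comparison is signed, the bit pattern 0xFFFFFFFF is treated as -1 and is therefore less than 0.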
 

All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved.