TweetFollow Us on Twitter

Inline Code
Volume Number:1
Issue Number:9
Column Tag:Forth Forum

"Inline Code for MacForth"

By Jörg Langowski, Chemical Engineer, Fed. Rep. of Germany, MacTutor Editorial Board

Speeding up Forth with Inline Code

When you use your computer for applications that require a lot of data shuffling and calculations, work with large arrays and matrices and so on, you tend to become a little paranoid about speed. Although Forth code is very compact through its threaded structure, and word execution (i.e. subroutine calling) is reasonably well optimized in MacForth (see MacTutor V1 No2), I have always felt uncomfortable with the overhead that goes into the execution of a simple word like DROP, whose 'active part' consists of one 16-bit word of machine code.

Just as a reminder: when the Forth em executes the token for DROP in a definition, it calls a subroutine that looks like this:

DROP  ADDQ.L#4,A7
 JMP  (A4)

So it is a simple 4-byte increment of the stack pointer that does the DROP job. But, then the next token has to be fetched and executed by jumping to the NEXT routine, whose address is contained in A4, the base pointer. This makes for a several hundred precent overhead, as compared to the increment itself. This overhead is not so dramatic with other words, but it is still there: and all in all the Sieve benchmark needs 21 seconds to run in MacForth, compare this to 9 seconds in compiled C (Consulair).

How can we speed up the code? After all, we have complete control over what goes into the dictionary and could put the machine code that we need right in there, no need for time-expensive subroutine calling. This is what the Forth 2.0 assembler enables you to do. However, if you create a piece of code in Forth assembler, it tends to look much more cryptic than 'normal' assembler, which after all is readable with adequate documentation.

It would be much nicer if we had a means to create the assembly code that corresponds to a DROP by writing a similar word, such as %DROP: something like a macro. No need to worry about which registers to use, and you could use 'almost normal' Forth code for writing your routine.

It shouldn't be that difficult to persuade the Forth system to execute machine code that is embedded in a definition. Every Forth word starts with at least one executable piece of machine code, trap calls for Forth-defined words such as colon definitions and 'real' 68000 code for machine code definitions. However, this gives you either machine code or Forth, not both. Our goal is to define words that allow switching between 68000 and Forth code within one definition. Similar words do exist in the Forth 2.0 assembler, but it lacks a set of macros that allow you to write inline Forth code instead of assembly code. Furthermore, you cannot define control structures that easily.

Assume we have Forth code that looks like this:

 ...
 <token 1>
 <token 2>
 <token X>
 <machine instruction 1>
 <machine instruction 2>
 ...

etc. This sequence of instruction will get executed just fine if <token X> is a word that transfers execution to the word just following. We'll call this word >CODE and define it as follows:

: >CODE 
    here 2+ make.token w, [compile] [ ;
    immediate

This word, which is executed during compilation, takes the next free address in the dictionary, adds 2 (this is where execution of the machine code is to start) and compiles this address as a token into the dictionary. Since a token just tells the Forth interpreter 'jump to the address that I refer to', machine code execution will start at the address following >CODE.

This is what happens at execution time. At compilation time, the words following >CODE in the input stream are executed, not compiled (this is what the [COMPILE] [ does). Therefore, if the words following >CODE are macros that stuff assembly code into the dictionary, you have your inline code right there.

We'll get to those macros in a minute. First, what remains is the problem how to get out of the machine code. You might recall that all machine-level Forth definitions finish with a

 JMP  (A4)

and the NEXT routine, pointed to by A4, gets the next token from the Forth code. The pointer to the next token is in register A3. Unfortunately, after we executed >CODE, A3 remained unchanged and still points to the word following the >CODE token. Which is 68000 code and certainly nothing that the interpreter will swallow. Therefore we have to reset A3 before we jump back into the Forth interpreter. This is what the word >FORTH does:

: >FORTH 47fa0004 , 4ed4 w, [compile] ] ;

 LEA  4(PC),A3
 JMP  (A4)

Remember, when >FORTH appears in the input stream, we are still in execution mode, from the preceding >CODE (unless we mixed things up). So >FORTH gets executed when used in a definition; it assembles code that loads A3 with the address following the JMP, then executes the JMP. Then the mode is switched back to regular Forth compilation again.

Between >CODE and >FORTH we can now place our macros that generate inline machine code corresponding to Forth primitives. The code for any of the primitives is found very easily by disassembling from the original Forth system. Of course, you may define your own code, use different registers than the MacForth definitions do or optimize the code. For instance, the built-in multiplication routine is a prime candidate for removing overhead. The routine *, which calls the multiplication primitive, M*, always does a 32- by 32-bit multiply and then drops the upper 32 bits of the double precision product. Some sloppiness on the part of Creative Solutions, I presume. Of course, a direct 16- by 16-bit multiply would be much faster.

I have written the macros in hex code, so that they'll work without the assembler, in case you are using Forth 1.1. The machine code is given as a comment in the program text.

Literals

The %LIT and %WLIT macros serve as a means to put constants and addresses on the stack. They compile a long move, resp. word move instruction with the number on the stack at compilation time compiled as the data word(s) following the instruction. So the way to put the address of a variable on the stack in inline code is just to write: <variable> %LIT.

Control Structures

The goal was to speed up the Sieve benchmark (as an example). Of course, the code would be far from optimal if we still had to use the Forth control structures; they should be coded inline, too. This means we have to keep track of addresses that we want to branch to.

The program below provides two examples, %IF...%THEN...%ELSE and %DO...%LOOP. The other control structures are not included, since they weren't necessary for this particular example. But after reading through, you should be able to write your own code for that.

%IF compiles a branch which is taken when the number on top of stack is zero. This branch has a zero displacement when first compiled. At the same time, the dictionary address is pushed on the compilation time stack (HERE). When %THEN is encountered, the branch displacement is calculated and put into the correct address. Same holds for %ELSE, only that another unconditional branch is compiled that is taken at the end of the %IF part. This branch is resolved at the %THEN.

The code compiled by %DO takes the initial and final values from the stack and puts them on the return stack. During compilation, HERE is put on the stack as a reference for the backward branch taken by %LOOP. %LOOP compiles code that increments the loop counter by one and tests it against the limit; if it is still below the limit, the backward branch is taken (calculated at compilation time). %+LOOP behaves just like %LOOP, only that the increment is the number on top of stack. Note that there is one difference between %+LOOP and the usual Forth +LOOP: while the latter works with positive and negative loop increments, ours works only with positive. I did this in the interest of speed.

The Sieve Benchmark

With all these macros available we can now recode the Sieve of Erastothenes prime number benchmark into inline machine code. The changes that have to be made to the Forth code are only minor ones. At the point where the inline code is supposed to start, we insert >CODE; all Forth words thereafter are inline macros. They are distinguished from the regular Forth words by the preceding percent sign. When the inline part ends, we write >FORTH to jump back into interpreter mode.

The resulting code works (!!) and executes in 9.7 seconds, as compared to 21 seconds for the Forth code.

Inline compiler definitions ( 060585 jl )

(c) June 1985 MacTutor by J. Langowski

This code is meant as an example for speeding up time-critical Forth code through the insertion of inline machine code. The words defined here are by no means a complete Forth compiler. No attempt was made to use the same words as standard Forth and do context switching; I felt that this would have been a) more complicated and b) actually confusing, because you tend to lose track of when you are in inline mode and when in interpreted Forth mode. Therefore, all inline words are compiled into the standard Forth vocabulary and have the names of the corresponding Forth words preceded by a '%'. The only control structures are %IF...%ELSE...%THEN and %DO...%LOOP/%+LOOP, where the %+LOOP works only for positive increments. You are encouraged to build other control structures, using the same principles.

( inline assembly macros)  ( 060285 jl )
hex
: >code here 2+ make.token w, [compile] [ ;  
immediate

: >forth 47fa0004 , 4ed4 w, [compile] ] ;
{LEA  4(PC),A3 }
{JMP  (A4)} 
: %swap 202f0004 , 2f570004 , 2e80 w, ;
{MOVE.L 4(A7),D0 }
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }
 
: %drop 588f w, ; { ADDQ.L  #4,A7  }
: %dup 2f17 w, ;  { MOVE.L  (A7),-(A7) }
: %over 2f2f0004 , ; { MOVE.L 4(A7),-(A7) }

: %+! 205f201f , d190 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A0)  }

: %rot 202f0008 , 2f6f0004 , 0008 w,
       2f570004 , 2e80 w, ;
{MOVE.L 8(A7),D0 }
{MOVE.L 4(A7),8(A7)}
{MOVE.L (A7),4(A7) }
{MOVE.L D0,(A7)  }

: %+ 201fd197 , ;  
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A7)  }
  
: %- 201f9197 , ;
{MOVE.L (A7),D0  }
{SUB.L  D0,(A7)  }

: %i 2f16 w, ;     { MOVE.L   (A6),-(A7) }
: %j 2f2e0008 , ;  { MOVE.L  8(A6),-(A7) }
: %k 2f2e0010 , ;  { MOVE.L 16(A6),-(A7) }
{ %k is a word that does not exist in 
  MacForth, but is very useful to extract 
  a loop index one level further down    }

: %i+ 2017d096 , 2e80 w, ;
{MOVE.L (A7),D0  }
{ADD.L  (A6),D0  }
{MOVE.L D0,(A7)  }

: %c@ 42802057 , 10102e80 , ;
{CLR.L  D0}
{MOVE.L (A7),A0  }
{MOVE.B (A0),D0  }
{MOVE.L D0,(A7)  }
  
: %w@ 20574257 , 3f500002 , ;
{MOVE.L (A7),A0  }
{CLR.W  (A7)}
{MOVE (A0),2(A7) }

: %@ 20572e90 , ;
{MOVE.L (A7),A0  }
{MOVE.L (A0),(A7)}
  
: %c! 205f201f , 1080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE.B D0,(A0)  }

: %w! 205f201f , 3080 w, ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,D0 }
{MOVE D0,(A0)  }
   
: %! 205f209f , ;
{MOVE.L (A7)+,A0 }
{MOVE.L (A7)+,(A0) }

: %>r 2d1f w, ;  { MOVE.L  (A7)+,-(A6)  }  
: %r> 2f1e w, ;  { MOVE.L  (A6)+,-(A7)  }

: %ic!  201f2056 , 1080 w, ;
{MOVE.L (A7)+,D0 }
{MOVE.L (A6),A0  }
{MOVE.B D0,(A0)  }

: %lit 2f3c w, , ;
{MOVE.L #xxxx,-(A7)}
{ where xxxx is compiled from the stack 
  into the next four bytes }
  
: %wlit 3f3c w, w, ;
{MOVE #xxxx,-(A7)}
{ and compile top of stack into next word }

: %< 4280bf8f , 6c025380 , 2f00 w, ;
{CLR.L  DO}
{CMPM.L (A7)+,(A7)+}
{BGE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %> 4280bf8f , 6f025380 , 2f00 w, ;
{CLR.L  DO}
{CMPM.L (A7)+,(A7)+}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %= 4280bf8f , 66025380 , 2f00 w, ;
{CLR.L  DO}
{CMPM.L (A7)+,(A7)+}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %0= 42804a97 , 66025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BNE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %0< 42804a97 , 6a025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BPL  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %0> 42804a97 , 6f025380 , 2e80 w, ;
{CLR.L  D0}
{TST.L  (A7)}
{BLE  M1}
{SUBQ.L #1,D0  }
{ M1  MOVE.LD0,-(A7) }

: %and 201fc197 , ;
{MOVE.L (A7)+,D0 }
{AND.L  D0,(A7)  }
  
: %or 201f8197 , ;
{MOVE.L (A7)+,D0 }
{OR.L D0,(A7)  }

: %if 4a9f6700 , here 0 w, ;
{TST.L  (A7)+  }
{BEQ  xxxx}
{ xxxx is a 16 bit displacement that is 
  resolved by %THEN   }

: %then here over - swap w! ;
: %else 6000 w, here 0 w, swap %then ;
{BRA  xxxx}
{ resolves preceding %IF and leaves new
  empty unconditional branch to be filled
  by %THEN     }

: %do 2d2f0004 , 2d1f588f , here ;
{MOVE.L 4(A7),-(A6)}
{MOVE.L (A7)+,-(A6)}
{ADDQ.L #4,A7  }
{ leaves HERE on the stack for back branch
  by %LOOP or %+LOOP      }

: %loop 5296204e , b1886e00 , 
                 here - w, ddfc w, 8 , ;
{ADDQ.L #1,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }
{ the last instruction cleans up the return
  stack. Branch resolved in this word     }

: %+loop 201fd196 , 204e w, b1886e00 , 
                 here - w, ddfc w, 8 , ;
{MOVE.L (A7)+,D0 }
{ADD.L  D0,(A6)  }
{MOVE.L A6,A0  }
{CMPM.L (A0)+,(A0)+}
{BGT  xxxx}
{ADDA.L #8,A6  }

decimal

( Eratosthenes Sieve Benchmark,
             inline code) ( 060285 jl )
 8192 constant size  
 create flags  size allot
: primes   flags  size 01 fill 
  >code 0 %lit size %lit 0 %lit
    %do  flags %lit %i+ %c@
       %if 3 %lit %i+ %i+ %dup %i+ 
             size %lit %<
         %if size %lit flags %lit %+ 
           %over %i+ flags %lit %+
           %do 0 %lit %ic! %dup %+loop
         %then %drop 1 %lit %+
       %then
    %loop >forth . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop
   1 sysbeep ;

( Eratosthenes Sieve Benchmark,
                standard version)
 8192 constant size       
 create flags  size allot
: primes flags size 01 fill 
  0  size 0  
    do  flags  i+ c@
      if  3 i+ i+ dup i+  size <  
         if  size flags +  over i+  flags +
             do  0 ic!  dup  +loop
         then  drop 1+  
       then
    loop  . ." primes  "  ;
 : 10times    
   1 sysbeep 10 0 do  primes cr loop  
   1 sysbeep ;
 

Community Search:
MacTech Search:

Software Updates via MacUpdate

jAlbum 19.2 - Create custom photo galler...
With jAlbum, you can create gorgeous custom photo galleries for the Web without writing a line of code! Beginner-friendly, with pro results - Simply drag and drop photos into groups, choose a design... Read more
BlueStacks 4.140.13 - Run Android applic...
BlueStacks App Player lets you run your Android apps fast and fullscreen on your Mac. Feature comparison chart Version 4.140.13: Highlights/Bug Fixes: Feel free to use BlueStacks as your go to... Read more
Adobe Premiere Pro CC 2020 14.0.1 - Digi...
Premiere Pro CC 2020 is available as part of Adobe Creative Cloud for as little as $52.99/month. The price on display is a price for annual by-monthly plan for Adobe Premiere Pro only Adobe Premiere... Read more
VirtualBox 6.1.2 - x86 virtualization so...
VirtualBox is a family of powerful x86 virtualization products for enterprise as well as home use. Not only is VirtualBox an extremely feature rich, high performance product for enterprise customers... Read more
RoboForm 8.6.8 - Password manager; syncs...
RoboForm is a password manager that offers one-click login, mobile syncing, easy form filling, and reliable security. Password Manager. RoboForm remembers your passwords so you don't have to! Just... Read more
Postbox 7.0.11 - Powerful and flexible e...
Postbox is a new email application that helps you organize your work life and get stuff done. It has all the elegance and simplicity of Apple Mail, but with more power and flexibility to manage even... Read more
calibre 4.9.0 - Complete e-book library...
Calibre is a complete e-book library manager. Organize your collection, convert your books to multiple formats, and sync with all of your devices. Let Calibre be your multi-tasking digital librarian... Read more
Notability 4.2 - Note-taking and annotat...
Notability is a powerful note-taker to annotate documents, sketch ideas, record lectures, take notes and more. It combines, typing, handwriting, audio recording, and photos so you can create notes... Read more
FoldersSynchronizer 5.0.1 - Synchronize...
FoldersSynchronizer is a popular and useful utility that synchronizes and backs-up files, folders, disks and boot disks. On each session you can apply special options like Timers, Multiple Folders,... Read more
Sketch 62 - Design app for UX/UI for iOS...
Sketch is an innovative and fresh look at vector drawing. Its intentionally minimalist design is based upon a drawing space of unlimited size and layers, free of palettes, panels, menus, windows, and... Read more

Latest Forum Discussions

See All

The Alliance update to Out There: Omega...
Out There is an old go-to recommendation for a lot of mobile stalwarts, but I could never really get into it. This sci-fi survival game that blended elements of interactive fiction and roguelike mechanics just felt a little off-balance and a little... | Read more »
Animal Fury Destination is an action-adv...
Animal Fury Destination is an action-adventure game from independent, Colombian developer Ignicion Games. It's a 3D action game where you'll play as various different characters as you embark on a quest to stop an evil crow sorcerer. [Read more] | Read more »
Shadowgun War Games Closed Beta Impressi...
Shadowgun: War Games is an upcoming free-to-play multiplayer shooter that’s essentially just an Overwatch knock-off. There are hero characters with special abilities, and you compete in 5-v-5 game modes where the goal is to use superior team... | Read more »
Slingsters is a physics-based puzzler fo...
Slingsters is a physics-based puzzle game where the aim is to collect various different monsters by flinging them from one side of a level to the other and into a box. It's also the first game from Nappy Cat and is available now for iOS and... | Read more »
Spiritwish's latest update sees the...
A sizeable update has hit Nexon's MMORPG Spiritwish today. It brings a new game mode, characters and there will also be a special event to celebrate the update with a firework display. [Read more] | Read more »
Maze Machina, a turn-based puzzler from...
The latest game from Arnold Rauers also known as Tiny Touch Tales is now available. You may be familiar with one of his many excellent titles such as Card Crawl, Enyo and Card Thief. His latest endeavour is called Maze Machina and you can grab it... | Read more »
Mario Kart Tour's Ice Tour races to...
Can you believe Mario Kart Tour is already on its 9th tour? The game only launched back in September, and since then it's become increasingly tricky to keep on top of the amount of new content Nintendo is pumping out. [Read more] | Read more »
Apple Arcade: Ranked - Top 50 [Updated 1...
In case you missed it, I am on a quest to rank every Apple Arcade game there is. [Read more] | Read more »
Marvel Future Fight's latest update...
Marvel Future Fight's latest update has added an all-new team of heroes to recruit and do battle with. The 'Warriors of the Sky' include Blue Dragon, War Tiger, Sun Bird, and Shadow Shell. As is the norm, each character comes with their own unique... | Read more »
Klee: Spacetime Cleaners is a fast-paced...
Klee: Spacetime Cleaners is a fast-paced auto-shooter that sports a cute retro aesthetic thathad racked up an impressive 100,000 pre-registers prior to its release. It's available now for both iOS and Android. [Read more] | Read more »

Price Scanner via MacPrices.net

Just in! Apple iMacs on sale for $100-$150 of...
B&H Photo has new 2019 21″ and 27″ 5K iMacs on stock today and on sale for up to $150 off Apple’s MSRP, with prices starting at only $999. These are the same iMacs sold by Apple in their retail... Read more
Save $100 on the 13″ 1.4GHz MacBook Pro at th...
Apple resellers have 13″ 1.4GHz MacBook Pros on sale today for $100 off Apple’s MSRP, and some are including free overnight delivery: (1) Amazon has new 2019 13″ 1.4GHz MacBook Pros on sale for $100... Read more
AT&T offers free 64GB Apple iPhone XS wit...
Open a new line of service with AT&T, and they will include a free 64GB iPhone XS. Credit for the phone is applied monthly over a 30 month lease. The fine print: “Limited Time Requires new line... Read more
New Verizon deal: Apple iPhone XR for $300 of...
Switch to Verizon and sign up with one of their Unlimited plans, and Verizon will take $300 off the price of an Apple iPhone XR (regularly $749), plus get a free $200 prepaid Mastercard. This is an... Read more
Amazon’s popular AirPods sale is back with mo...
Amazon has new 2019 Apple AirPods on sale today ranging up to $40 off MSRP, starting at $129, as part of their popular Apple AirPods sale. Shipping is free: – AirPods Pro: $234.98 $15 off MSRP –... Read more
Apple’s top of the line 10.5″ 256GB WiFi + Ce...
B&H Photo has the top of the line 10.5″ 256GB WiFi + Cellular iPad Air on sale for $599 shipped. That’s $180 off Apple’s MSRP for this model and the cheapest price available. Overnight shipping... Read more
Apple’s refurbished iPad Pros are the cheapes...
Apple has Certified Refurbished 11″ iPad Pros available on their online store for up to $220 off the cost of new models. Prices start at $679. Each iPad comes with a standard Apple one-year warranty... Read more
Just in: Take $100 off the price of the 3.0GH...
Apple resellers are offering new 2018 6-Core Mac minis for $100 off Apple’s MSRP today, only $999. B&H Photo has 6-Core Mac minis on sale for $100 off Apple’s standard MSRP. Overnight shipping is... Read more
Apple has 4-core and 6-core 2018 Mac minis av...
Apple has Certified Refurbished 2018 Mac minis available on their online store for $120-$170 off the cost of new models. Each mini comes with a new outer case plus a standard Apple one-year warranty... Read more
Amazon offers $200 discount on 13″ MacBook Ai...
Amazon has new 2019 13″ MacBook Airs with 256GB SSDs on sale for $200 off Apple’s MSRP, now only $1099, each including free shipping. Be sure to select Amazon as the seller during checkout, rather... Read more

Jobs Board

*Apple* Mobility Pro - Best Buy (United Stat...
**744429BR** **Job Title:** Apple Mobility Pro **Job Category:** Store Associates **Store NUmber or Department:** 000574-Garner-Store **Job Description:** At Best Read more
Geek Squad *Apple* Consultation Professiona...
**757963BR** **Job Title:** Geek Squad Apple Consultation Professional **Job Category:** Store Associates **Store NUmber or Department:** 000433-Henrietta-Store Read more
*Apple* Computing Professional - Best Buy (U...
**754611BR** **Job Title:** Apple Computing Professional **Job Category:** Store Associates **Store NUmber or Department:** 000142-Milpitas-Store **Job Read more
Best Buy *Apple* Computing Master - Best Bu...
**745058BR** **Job Title:** Best Buy Apple Computing Master **Job Category:** Store Associates **Store NUmber or Department:** 001080-Lake Charles-Store **Job Read more
Geek Squad *Apple* Consultation Professiona...
**756640BR** **Job Title:** Geek Squad Apple Consultation Professional **Job Category:** Store Associates **Store NUmber or Department:** 000484-Manchester-Store Read more
All contents are Copyright 1984-2011 by Xplain Corporation. All rights reserved. Theme designed by Icreon.