NSort
Volume Number: | | 9
|
Issue Number: | | 5
|
Column Tag: | | Visual programming
|
NSort
An application of parallelism to increase algorithm speed
By Mark Kauffman, Chico, California
The Future of Programming
I'll never forget my first Christmas after I started back to school as a Computer Science major. I was visiting my in-laws. They had the Orkin Man over to spray for termites. He and I started talking and I told him that I was working on my Bachelors degree in Computer Science.
"Oh, really," he said, "I got my degree in Computer Science in 1962. I used to program for IBM using punched cards."
I was shocked that anyone one with a Computer Science degree would become the "Orkin Man" and swore that it would never happen to me. Things change fast in the technical world. You have to keep your eyes open and stay on your feet if you don't want to be left in the dust. If you are a programmer, make sure that you don't become the "Orkin Man" by staying on top of the latest advances in programming.
Look into the future and imagine what kind of computers we are going to be programming in the year 2000. We are approaching the physical limits of how small and fast we can build a single processor. How can we build faster machines when we reach the speed limits, imposed by physics, of our current hardware? One method of building a computer system that is faster than a system with an individual processor is to build a system with multiple processors and run them in parallel. Parallel processing systems are not wishful thinking. In their book Highly Parallel Computing, Almasi and Gottlieb list over thirty existing different parallel processing architectures (2). Are you ready to start programming these parallel systems? How many languages do you know that can take advantage of a system with more than one processor? We have languages that do just that, available on the Macintosh. One of those languages is Occam, a textual non-object-oriented language that requires the programmer to specify which functions may operate concurrently. Lisp and C have been modified to design parallel processing algorithms using the Paralation Model (Snively 72). Another language, based on the dataflow model of computing and available on the Macintosh, is TGS Systems' Prograph.
Dataflow Architectures & Dataflow Languages
Most of the articles, and even the advertising, I've read about Prograph emphasize that it is a graphical object-oriented language. I was surprised to find that Prograph is also a dataflow language. It is based on a parallel-processor dataflow architecture. Prograph, as a dataflow language, differs from languages like C and Pascal in that it supports concurrent instruction execution at the single program instruction level. Pascal, C, LISP, FORTRAN, BASIC, and other common computer languages are based on the von Neumann architecture: sequential instruction execution modifying the data in memory. Only one instruction can happen at a time.
In the dataflow model of computing, data flows into and out of instructions through input/output terminals. Each instruction may have one or more input and output terminals. An instruction executes whenever all of the data required at its input terminal(s) are available. After execution, the instruction places the result(s) on its output terminal(s). Every instruction with an available processor and data at its inputs will execute concurrently in a dataflow system. Parallelism is implicit in the design of the program. No extra record keeping is required by the programmer. Prograph programs are graphic illustrations of algorithms using the dataflow computing model. Though not yet running on a parallel-processor platform, Prograph is positioned to take advantage of such a system. For the rest of this article let's look at a common computing problem that can be solved much more quickly with an algorithm designed to make use of a parallel-processor architecture, sorting.
Analyzing Algorithm Execution Time
First, let's examine how to determine how fast an algorithm is. Then we will be able to compare NSort, a parallel-processor based sorting algorithm, with the sort algorithms in common use today. To determine how fast an algorithm is, we first find the critical steps in the algorithm. These are steps which are repeated a number of times depending on the number of data to be dealt with, N. Then we count how many times the critical steps must be performed for the algorithm to complete. For example, if our sort algorithm had to compare each number to be sorted with every other number to be sorted, the comparisons would be the critical steps. The number of times the algorithm would have to make these comparisons to complete the sort would be N * N. We would say that our sort ran in time N squared. Increasing the number of data to be sorted from 10 to 1000, a factor of 100, would increase the sort time by a factor of 10000. You've heard of quicksort? What's great about quicksort is that on the average it will sort a large list in time N * Log(base 2)N. With quicksort then, increasing the number of data from 8 to 1024, a factor of 128, would, on the average, increase the sort time by a factor of under 2000. All other sorts that are based on sequential processing also sort in some time which is a multiple of N. In his text, Programming With Data Structures 2, Robert L. Kruse proves that "Any algorithm that sorts a list by comparing keys must, in it's worst case on a list of length N, do at least. . . N*lg(N) + O(N) comparisons of keys." In the next section I will describe an algorithm, NSort, that will consistently sort in time N.
The NSort Algorithm
We humans have the capability to make more than one comparison at a time, so I felt that a reasonable way to develop a sort algorithm that uses parallelism would be to observe human parallelism at work. To develop NSort, I wrote a list of numbers on a dry-erase board and sorted them as quickly as I could. As I sorted, I watched what I did. Here is the list and the procedure I took:
Unsorted List
(7 2 3 9 5 4 6)
Take the first element and make a new list. Now you have two lists: the sorted list, and the unsorted list:
Unsorted List Sorted List
(2 3 9 5 4 6) (7)
Take the first element from the unsorted list again. Where does it belong in the sorted list? Put it there.
Unsorted List Sorted List
(3 9 5 4 6) (2 7)
Keep taking the first element from the unsorted list and placing it where it belongs in the sorted list, until the unsorted list is empty.
Unsorted List Sorted List
(9 5 4 6) (2 3 7)
(5 4 6) (2 3 7 9)
(4 6) (2 3 5 7 9)
(6) (2 3 4 5 7 9)
() (2 3 4 5 6 7 9)
We've sorted the list in the same number of steps as there are elements to be sorted, time N. The critical step, that you were able to perform as a parallel operation, was "Where does it (the number detached from the unsorted list) belong in the sorted list?" A computer system with processors running in parallel could do the same thing. You would pass the processors a sorted list of numbers, one list element to each processor. Also pass each processor the single number from the unsorted list to compare with the sorted list. Ask each processor if its list element is less than the number. The collection of processors would then return a list of booleans. Where the booleans changed from TRUE to FALSE is where the number being compared to the elements in the list should be inserted in the sorted list. Let me use the numbers from the above example to demonstrate how this works:
Unsorted List Sorted List
(6) (2 3 4 5 7 9)
You want the processor array to find out where the six belongs in the sorted list. Pass the sorted list to the processor array. Ask the processors in the array if their list element is less than six. The processor array returns the following list:
(TRUE TRUE TRUE TRUE FALSE FALSE)
Now to figure out which element to insert the six in front of, pass this boolean array to the parallel-processor array. Ask, "Who has a FALSE?" The processor array returns the list of processor numbers that are holding FALSE:
(5 6)
From the list of processor numbers returned, only look at the value of the lowest numbered processor. In this example the processor array finds that the six needs to be inserted before the fifth element on the list.
There is one special case to consider. What happens if the number from the unsorted list is bigger than all of the numbers in the sorted list? Say we had used a ten instead of a six in this example. Then the boolean array created by the compare operation would have looked like this:
(TRUE TRUE TRUE TRUE TRUE TRUE)
When you ask the processor array who is holding a false, an empty list is returned. Then you know that the ten has to be inserted after the last item on the sorted list.
Implementation of NSort
Following are the seven prographs (TGS Systems' term for algorithms written in Prograph) of the NSort implementation. For those unfamiliar with Prograph syntax, there is one function per window. Function names are given in the title bars of the windows. As we look at each function, I'll explain its purpose and operation.
First, let's examine the main NSort routine.
This function asks the user to enter a list then tests the input to see that it is a list. If the input is a list it sorts the list and shows the results. Otherwise, the function terminates without showing any results.
The called functions, ask and show are Prograph primitives. They are built into the language. Ask brings up a dialog box requesting input and show brings up a dialog box that displays output. Both primitives' dialog boxes have OK buttons for the user to press. The called function test is not a primitive but is a function supplied by TGS with their Algorithm examples. It checks the data on its input terminal to see that it is a list and that all of the elements of the list are of the same type. Notice the X in a box with the bar above it next to test. This is the Prograph control "Terminate on Failure." If test fails, this control terminates the operation of the function NSort. When test succeeds, nSort sorts the list on its input terminal then passes the result to show.
Now, let's look at nSort. Notice that it doesn't actually do all of the sorting but sets up the sort and calls sort-em to complete the sort.
The nSort method consists of two cases. Case 1:2 (read 1 of 2) performs a Prograph match operation on the input list. The match operation is the line above the set of parenthesis and box with an X inside. This match operation is checking for an empty list, (). If the list is not empty, the match fails and the next case is called. An empty list is not sorted.
Case 2:2 performs the same first step we took when sorting. It takes the left most element off the unsorted list and creates a one element 'sorted' list. These two lists are passed to sort-em which takes items from the unsorted list and inserts them in the right place in the sorted list until the unsorted list is empty. The looping arrows show that both lists are being feed from the output terminals of sort-em back into its input terminals. The looping arrows also represent repeated operations on data by a method.
The first thing sort-em does is another match on the input list. If the match succeeds, the list is empty and the looping around sort-em terminates. When the unsorted list is not empty, sort-em detaches the left most item from the list, and passes the item and the sorted list to find insert location. After determining where to insert the item, it passes the item and the sorted list to the method insert into sorted list to be, you guessed it, inserted into the sorted list.
Now, let's review the operation of find insert location.
This method, find insert location, figures out where to insert the item in the sorted list. It returns a 0 if the item is to be inserted to the far right of the sorted list. Otherwise it returns the location where the item is to be inserted.
This operation is the key method to sorting in time N. If Prograph were running on a parallel processor system and there were enough processors to perform the list operations simultaneously, NSort would complete in time N.
The only operations left to consider are the two cases of insert into sorted list.
If the insert location is zero, case 1:2 places the item to insert on the far right of the sorted list. If the insert location is not zero, case 1:2 switches to case 2:2. Case 2:2 splits the sorted list in two at the place where the item is to be inserted, converts the item to insert into a list, and then joins all three lists to create the new sorted list. This completes the implementation of NSort with Prograph.
The Challenge
Our challenge as programmers is to learn what processes will benifit from parallelism and how to implement them. Searching and finding the shortest path look like good candidates. Neural network design, another hot topic in computer science today, is another. Using a graphical, parallel language like Prograph should simplifly network design. You would define a single class neuron. You could draw the network topology using Prograph's existing graphical capabilities.
There is a challenge for you hardware gurus too. Build (inexpensive please) NuBus boards with parallel processing capability. It would be nice if they would just plug in and work with a graphical object-oriented language. I've seen that Occam is available on the Macintosh with parallel processing hardware. (Occam is a non-object-oriented parallel-processing textual language that requires the programmer to explicitly state which functions can operate in parallel.) What would it take to convert that hardware for use with Prograph?
If you implement one of these ideas or if you think of others, I'd like to hear about it. Send me e-mail at mkaufman@hairball.ecst.csuchico.edu or write to me at: 254 E. 7th Ave. Chico, CA 95926. Better yet, write an article for MacTech.
References
Almasi, George and Gottlieb, Allan. Highly Parallel Computing. Redwood City: Benjamin/Cummings Publishing Company, Inc., 1989.
Snively, Paul. "The Paralation Model." MacTutor. Nov./Dec. 1992: 72-79.