OS/2 e-Zine! - The REXX Files

The REXX Files - by Dr. Dirk Terrell

ast month we looked at some simple sorting methods. Now let's look at one that is much more efficient and pretty robust: the heapsort. Heapsort is much more efficient than the bubble sort or the insertion sort, which you will recall from last month, are N-squared algorithms meaning that they require of order NxN comparisons to sort N items. Heapsort is an Nlog2(N) algorithm where log2(N) means the base-2 logarithm.

(For those that cringe at the faintest whiff of anything mathematical, don't let that fancy-sounding talk bother you. A logarithm is simply an exponent, the number written as a superscript on another number. The number on which the exponent is placed is called the base, so base-2 means that we are putting exponents on the number 2. Since 8 can be written as "2 to the 3rd power", meaning 2x2x2, the base-2 logarithm of 8 is 3.)

To illustrate how much of a difference there is between N-squared and Nlog2(N) algorithms, I've tabulated the number of comparisons required for each case in the following table:

N	NxN	Nlog2(N) (rounded)
10	100	14
100	10,000	665
1,000	1,000,000	9,966
1,000,000	1,000,000,000,000	13,815,511

As you can see, for large numbers of items, simple N-squared methods are just too inefficient to do the job. Sorting 100 items by an N-squared method will require, on average, about the same computational effort as sorting 1,000 items by an Nlog2(N) method. And the disparity grows rapidly as the number of items gets larger.

Heapsort is perhaps not as well known as Quicksort, which is another Nlog2(N) algorithm that in most cases is the fastest known algorithm. However, Heapsort tends to be more robust. There are (admittedly rare) cases where Quicksort can take as long as an N-squared method, but Heapsort doesn't suffer quite as much in its worst-case scenario for input data. Another advantage of Heapsort is that it is a so-called in-place algorithm meaning that it doesn't require any extra memory other than that required to hold the list of numbers, making it quite handy when sorting lists with large numbers of items. I tend to prefer Heapsort in general because of its robust nature and because it is simpler to code than Quicksort.

The name Heapsort comes from "heap" which is a set of N numbers (let's call the set "a") defined in a specific way mathematically. Since HTML is not at present very suitable for mathematical notation, I'll do it as an image:

Don't click that "next article" button just yet. It's actually not as bad as it looks. Let's look at a heap graphically, and you'll see what that terse mathematical definition is saying. Suppose we have 15 numbers in our set. If we arrange them as a binary tree, starting with one element branching to two others, those two each branching to two more, and so on, a heap is pretty easy to visualize. I'll illustrate it as a table with the first element in row one branching to two elements in row two, which branch into four elements in row 3 and so on:

a1
a2				a3
a4		a5		a6		a7
a8	a9	a10	a11	a12	a13	a14	a15

Now, what that formula above is saying is that if you start with a number somewhere in the table and move down the table, following the branching, the numbers will be getting smaller. That is, numbers on top of others are always greater than or equal to numbers below them. Note that there is not any ordering among the numbers in a row on the table.

So, you can begin to see how sorting is done using this algorithm. If you arrange your numbers into a heap, then you know that the topmost number is the largest. You remove it from the top and move the larger of the branches below it to the top position. Then you move the larger of the two below that one up, and so on until you reach the bottom. Then you repeat the whole process until you have only one number left, which will of course be the smallest.

The trick to Heapsort is getting started, i.e. how to take our jumbled list of numbers and convert it into a heap. It's really pretty easy. Take half of your numbers and let these be the bottom of the heap. Now take numbers from the remaining half and assign them to the next row of the heap. Compare the number to the two below it. If it is smaller, then swap it with the larger of the two. Once the second row is complete, move up one more row and do it all over again. This continues until you have gone through all of the numbers. Once you have gone through all of the numbers, they will be in the form of a heap.

Below is code to implement the Heapsort method. It uses a REXX procedure that I called DownHeap and that procedure is the one that does the work of shifting items around in the heap so that it is in the proper form. It is noticeably more complex than the code we wrote last month, but a little experimenting will show that it is significantly faster. I did a test with 500 numbers and the results were as follows:

Bubble sort took 5.65 seconds.
Insertion sort took 3.29 seconds.
Modified insertion sort took 1.86 seconds.
Heapsort took 0.33 seconds

Source code from heapsort.cmd:

 /* ------------------------------------------------------------------ */
 /* function: heap sort routine                                        */
 /*                                                                    */
 /* call:     HeapSort                                                 */
 /*                                                                    */
 /* returns:  Elapsed time for sort in seconds                         */
 /*                                                                    */
 /* notes:    You must save the elements to sort in the stem "STEM."   */
 /*           stem.0 must contain the number of elements in the stem.  */
 /*                                                                    */
 /* reference: Sedgewick, "Algorithms", 2nd ed., Chapter 11            */
 /*                                                                    */
 Heapsort: PROCEDURE expose stem.
   start=Time("R")

   M = stem.0
   N = M
 
   do k=M % 2 to 1 by -1
     call DownHeap k N
   end /* do */
 
   do while N>1
     t = stem.1
     stem.1 = stem.n
     stem.n = t
     n = n-1
     call DownHeap 1 N
   end /* do */
   end=time("R")
   elapsed=end-start
 RETURN elapsed
 
 /* subroutine of HeapSort                                             */
 DownHeap: PROCEDURE expose stem.
   parse Arg k N
 
   v = stem.k
 
   do while k <= N%2
     j = k+k
     if j < n then
     do
       i = j+1
       if stem.j < stem.i then
         j=j+1
     end  /* do */
 
     if v >= stem.j then
       signal label
 
     stem.k = stem.j
     k = j
   end /* do */
 
 Label:
   stem.k = v
 RETURN

The source code file this month includes a "driver" routine that generates a set of random numbers and then calls the Heapsort routine to sort them.

Dr. Dirk Terrell is an astronomer at the University of Florida specializing in interacting binary stars. His hobbies include cave diving, martial arts, painting and writing OS/2 software such as HTML Wizard.

[Our Sponsor: Indelible Blue - OS/2 software and hardware solutions to customers worldwide.]