HOME    »    PROGRAMS/ACTIVITIES    »    Annual Thematic Program
Lab 2: Gaussian Elimination

Big news. Duane Dibbert of Portland Group and Kumsup Lee at the IMA have figured out the problem with our installation. It was caused by two different versions of the OS (Irix) on different machines here. The fix is described in Step 1 below.

During the lab, you can contact me by email (schreibe), phone (4-3829) or coming to my office (510 Vincent).

Step 1: Login to i7.ima.umn.edu

We are going to do our HPF work in private subdirectories of /share/tmp. PGHPF works correctly when you compile there. To do this, type:

cd /share/tmp
mkdir (your userid here.  For me it is schreibe)
cd (your userid)

You are now in the correct directory (verify by typing pwd to see what directory you are in.) HPF will work properly there.

Step 2: You are going to write an HPF code, using a data parallel algorithm, to compute the LU decomposition of a square nonsingular matrix.

There is a sequential algorithm (written in Fortran 95). You should copy it from


You should compile and test it with the pghpf compiler. With input n = 4, it should print this:

 Running on            1 processors.
   10.00   1.00   1.00   1.00
    0.10   9.90   0.90   0.90
    0.10   0.09   9.82   0.82
    0.10   0.09   0.08   9.75

For computer scientists: The performance of this code depends on the size of the matrix in quite interesting, complicated ways. Try to run it for n = 300, 301, 302, ..... Do the times behave in a predictable way? Any theory as to what is going on? (Think about cache.)

Now comes the fun part. Make this code data parallel. Your resulting code should have exactly one DO statement. The two singly nested DO loops should be replaced by data parallel operations, and the innermost two loops of the triply nested DO loop can also be replaced. Why can't the outer loop be?

You'll want to know about the following Fortran intrinsic functions:


which makes an n+1 dimensional array out of NC copies of an n dimensional array X. Those copies constitute dimension DIM of the result. For example, if X is the one-dimensional array [1 2 3 4 5], then


has the value

1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5



has the value

1  1  1
2  2  2
3  3  3
4  4  4
5  5  5

The second useful intrinsic is

which returns the location of the largest element of X, assuming X is one dimensional. Actually, it will return an integer array of shape (1). Use it as follows:

integer locbig(1)

locbig = maxloc(abs(a(k:n,k)))
imax = k - 1 + locbig(1)
Now imax is the row index of the largest element of the kth column, ignoring elements 1 through k-1.

If you are stumped, look at the examples in /share/tmp/schreibe/lu_spread.hpf and /share/tmp/schreibe/lu_forall.hpf.

To get the best performance, what data distribution should you use? How does the choice of data distribution affect the communication required in the algorithm? Can you use the profiling capability of pghpf to find the expensive parts of the algorithm? (See the online users guide.)

You can also try to parallelize the code using the !hpf$ independent directive. Does this work? Faster or slower? (An example solution is in the file /share/tmp/schreibe/lu_independent.hpf)

Assignment Checklist

  • Copy the file.
  • Compile it, test it, time it.
  • Understand and rewrite the program using data parallel array operations.
  • Rewrite and time it.
  • Do the same using independent DO loops.

Online documentation:

You can get online documentation that is set up at the parallel computer center at BU
Other web sites that may be useful to you are: the Portland Group
pghpf bugs list and
the HPF home page


1996-1997 Mathematics in High Performance Computing

Connect With Us: