Lab 2: Gaussian Elimination

Big news. Duane Dibbert of Portland Group and Kumsup Lee at the IMA have figured out the problem with our installation. It was caused by two different versions of the OS (Irix) on different machines here. The fix is described in Step 1 below.

During the lab, you can contact me by email (schreibe), by phone (4-3829), or by coming to my office (510 Vincent).

Step 1: We are going to do our HPF work in private subdirectories of /share/tmp; PGHPF works correctly when you compile there. To set this up, type:

cd /share/tmp
mkdir (your userid here; for me it is schreibe)
cd (your userid here)


You are now in the correct directory (verify by typing pwd to see which directory you are in). HPF will work properly there.

Step 2: You are going to write an HPF code, using a data parallel algorithm, to compute the LU decomposition of a square nonsingular matrix.

A sequential version of the algorithm (written in Fortran 95) is provided. You should copy it from

 /share/tmp/schreibe/lu.hpf

You should compile and test it with the pghpf compiler. With input n = 4, it should print this:

 Running on            1 processors.
10.00   1.00   1.00   1.00
0.10   9.90   0.90   0.90
0.10   0.09   9.82   0.82
0.10   0.09   0.08   9.75
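For reference, the printed numbers are consistent with the test matrix being 9I plus a matrix of all ones, with the LU factors overwriting the matrix in place. That is an inference from the output, not something stated in the lab; here is a rough Python rendering of the sequential right-looking elimination that reproduces the same numbers to two decimals (no pivoting is shown, since this matrix is diagonally dominant and no row swaps occur):

```python
import numpy as np

# Assumed test matrix, inferred from the printed output: 9*I + ones.
n = 4
a = 9.0 * np.eye(n) + np.ones((n, n))

# Sequential right-looking LU; the factors overwrite a.
for k in range(n - 1):
    for i in range(k + 1, n):
        a[i, k] = a[i, k] / a[k, k]                # multipliers (column of L)
    for i in range(k + 1, n):
        for j in range(k + 1, n):
            a[i, j] = a[i, j] - a[i, k] * a[k, j]  # elimination update

for row in a:
    print("  ".join(f"{x:5.2f}" for x in row))
```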


For computer scientists: the performance of this code depends on the size of the matrix in quite interesting and complicated ways. Try running it for n = 300, 301, 302, .... Do the times behave in a predictable way? Any theory as to what is going on? (Think about the cache.)
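In the lab you would time the compiled Fortran program itself (for example with the shell's time command), but the shape of the experiment can be sketched in Python/NumPy; the lu_inplace routine below is an analogue, not the lab code:

```python
import time
import numpy as np

def lu_inplace(a):
    """Right-looking LU without pivoting; the factors overwrite a."""
    n = a.shape[0]
    for k in range(n - 1):
        a[k + 1:, k] /= a[k, k]
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])

times = {}
for n in (300, 301, 302):          # extend the range to see the pattern
    a = 9.0 * np.eye(n) + np.ones((n, n))
    t0 = time.perf_counter()
    lu_inplace(a)
    times[n] = time.perf_counter() - t0
    print(n, times[n])
```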

Now comes the fun part. Make this code data parallel. Your resulting code should have exactly one DO statement. The two singly nested DO loops should be replaced by data parallel operations, and the innermost two loops of the triply nested DO loop can also be replaced. Why can't the outer loop be?
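For orientation, the shape of that rewrite can be sketched in NumPy rather than HPF (whole-array slice operations stand in for the data parallel array operations; this is an illustrative analogue, not one of the provided solution files):

```python
import numpy as np

def lu_data_parallel(a):
    """LU with exactly one explicit loop: the loop over elimination
    steps k. Each step uses the pivot row and column produced by the
    step before it, so this outer loop stays sequential."""
    n = a.shape[0]
    for k in range(n - 1):
        a[k + 1:, k] /= a[k, k]                     # whole-column operation
        a[k + 1:, k + 1:] -= np.outer(a[k + 1:, k], a[k, k + 1:])  # rank-1 update
    return a

a = 9.0 * np.eye(4) + np.ones((4, 4))
lu_data_parallel(a)
```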

You'll want to know about the following Fortran intrinsic functions:

SPREAD(X, DIM, NCOPIES)


which makes an (n+1)-dimensional array out of NCOPIES copies of an n-dimensional array X. Those copies constitute dimension DIM of the result. For example, if X is the one-dimensional array [1 2 3 4 5], then

SPREAD(X, DIM = 1, NCOPIES = 4)


has the value

1 2 3 4 5
1 2 3 4 5
1 2 3 4 5
1 2 3 4 5


and

SPREAD(X, DIM = 2, NCOPIES = 3)


has the value

1  1  1
2  2  2
3  3  3
4  4  4
5  5  5
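If NumPy is a more familiar reference point, SPREAD corresponds roughly to inserting a new axis and repeating along it (a translation for orientation only, not part of the lab):

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])

# SPREAD(X, DIM = 1, NCOPIES = 4): the copies become dimension 1
a = np.repeat(x[np.newaxis, :], 4, axis=0)   # shape (4, 5), each row equals x

# SPREAD(X, DIM = 2, NCOPIES = 3): the copies become dimension 2
b = np.repeat(x[:, np.newaxis], 3, axis=1)   # shape (5, 3), each column equals x
```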


The second useful intrinsic is

MAXLOC(X)

which returns the location of the largest element of X. When X is one dimensional, the result is an integer array of shape (1), holding the index of the (first) largest element. Use it as follows:

integer locbig(1)

locbig = maxloc(abs(a(k:n,k)))
imax = k - 1 + locbig(1)

Now imax is the row index of the largest element of the kth column, ignoring elements 1 through k-1.
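In NumPy terms (0-based indexing, unlike Fortran's 1-based) the same pivot search looks like the following; the matrix and the value of k here are made up for illustration:

```python
import numpy as np

a = np.array([[ 10.,  1.,  1.,  1.],
              [  1., 10.,  1.,  1.],
              [  1.,  1., 10.,  1.],
              [-40.,  1.,  1., 10.]])
k = 0   # 0-based elimination step (Fortran's k = 1)

# MAXLOC(ABS(A(K:N, K))) gives the 1-based position within the slice;
# np.argmax gives the 0-based one, so the offset is k rather than k - 1.
imax = k + np.argmax(np.abs(a[k:, k]))
# imax is the row index of the largest |entry| in column k, rows k..n-1
```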

If you are stumped, look at the examples in /share/tmp/schreibe/lu_spread.hpf and /share/tmp/schreibe/lu_forall.hpf.

To get the best performance, what data distribution should you use? How does the choice of data distribution affect the communication required in the algorithm? Can you use the profiling capability of pghpf to find the expensive parts of the algorithm? (See the online users guide.)

You can also try to parallelize the code using the !hpf$ independent directive. Does this work? Is it faster or slower? (An example solution is in the file /share/tmp/schreibe/lu_independent.hpf.)

Assignment checklist:

• Copy the file.
• Compile it, test it, and time it.
• Understand the program, then rewrite it using data parallel array operations.
• Test and time the rewritten version.
• Do the same using INDEPENDENT DO loops.

Online documentation:

Online documentation is set up at the parallel computer center at BU.
Other web sites that may be useful to you are: the Portland Group
pghpf bugs list and