Main navigation | Main content

HOME » PROGRAMS/ACTIVITIES » Annual Thematic Program

PROGRAMS/ACTIVITIES

Annual Thematic Program »Postdoctoral Fellowships »Hot Topics and Special »Public Lectures »New Directions »PI Programs »Industrial Programs »Seminars »Be an Organizer »Annual »Hot Topics »PI Summer »PI Conference »Applying to Participate »

Lab 2: Gaussian Elimination

Big news. Duane Dibbert of Portland Group and Kumsup Lee at the IMA have figured out the problem with our installation. It was caused by two different versions of the OS (Irix) on different machines here. The fix is described in Step 1 below.

During the lab, you can contact me by email (schreibe), phone (4-3829) or coming to my office (510 Vincent).

Step 1: Login to i7.ima.umn.edu

We are going to do our HPF work in private subdirectories of /share/tmp. PGHPF works correctly when you compile there. To do this, type:

cd /share/tmp mkdir (your userid here. For me it is schreibe) cd (your userid)

You are now in the correct directory (verify by typing pwd to see what directory you are in.) HPF will work properly there.

Step 2: You are going to write an HPF code, using a data parallel algorithm, to compute the LU decomposition of a square nonsingular matrix.

There is a sequential algorithm (written in Fortran 95). You should copy it from

` /share/tmp/schreibe/lu.hpf `

You should compile and test it with the pghpf compiler. With input n = 4, it should print this:

Running on 1 processors. 10.00 1.00 1.00 1.00 0.10 9.90 0.90 0.90 0.10 0.09 9.82 0.82 0.10 0.09 0.08 9.75

For computer scientists: The performance of this code depends on the size of the matrix in quite interesting, complicated ways. Try to run it for n = 300, 301, 302, ..... Do the times behave in a predictable way? Any theory as to what is going on? (Think about cache.)

Now comes the fun part. Make this code data parallel. Your resulting code should have

You'll want to know about the following Fortran intrinsic functions:

SPREAD(X, DIM, NC)

which makes an n+1 dimensional array out of NC copies of an n dimensional array X. Those copies constitute dimension DIM of the result. For example, if X is the one-dimensional array [1 2 3 4 5], then

SPREAD(X, DIM = 1, NCOPIES = 4)

has the value

1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5

and

SPREAD(X, DIM = 2, NCOPIES = 3)

has the value

1 1 1 2 2 2 3 3 3 4 4 4 5 5 5

The second useful intrinsic is

MAXLOC(X)which returns the location of the largest element of X, assuming X is one dimensional. Actually, it will return an integer array of shape (1). Use it as follows:

integer locbig(1) locbig = maxloc(abs(a(k:n,k))) imax = k - 1 + locbig(1)Now imax is the row index of the largest element of the kth column, ignoring elements 1 through k-1.

If you are stumped, look at the examples in /share/tmp/schreibe/lu_spread.hpf and /share/tmp/schreibe/lu_forall.hpf.

To get the best performance, what data distribution should you use? How does the choice of data distribution affect the communication required in the algorithm? Can you use the profiling capability of pghpf to find the expensive parts of the algorithm? (See the online users guide.)

You can also try to parallelize the code using the

```
!hpf$
independent
```

directive. Does this work? Faster or slower?
(An example solution is in the file /share/tmp/schreibe/lu_independent.hpf)
- Copy the file.
- Compile it, test it, time it.
- Understand and rewrite the program using data parallel array operations.
- Rewrite and time it.
- Do the same using independent DO loops.

Online
documentation:

You can get online documentation that is set up at the parallel
computer center at
BU

Other web sites that may be useful to you are:
the Portland Group

pghpf bugs list
and

the HPF home
page