The main goal of this assignment is to convince you of the ease of use of the data parallel programming model. For those problems that lend themselves to this model, programming a parallel machine can be easier than programming serial machines. Of course, data parallel languages can execute on serial machines, but parallel computers have promoted their use.
Step 1: Let's get used to the SGI power challenge. At the IMA this is a four processor, shared memory machine. Login to i7.ima.umn.edu. Try to compile and run a "hello world" HPF program:
print *, 'Running on ', number_of_processors(), ' processors.'
The compiler is
and I use the following in my .cshrc
alias hpf pghpf -Mfree -Mextend -Minline -Mnosequence
which allows me to use freeform input (no special columns), long lines, and causes the compiler to do some inlining. You can also use the -O2 or (if feeling very brave) the -O3 flags to get a more aggressively optimized executable program. I also put
setenv MANPATH /usr/pgi/man
in my environment; this allows me to use
to get the man page.
Step 2: We will be writing this code using High Peformance Fortran. You can get online documentation that is set up at the parallel computer center at BU but you dont need to do the stuff with your .cshrc file that they recommend. Other web sites that may be useful to you are: the Portland Group
pghpf bugs list and
the HPF home page Next step is grabbing the code:
cp ~schreibe/HPF/resistors.hpf .
You'll obtain an HPF program that computes the effective resistance and voltages of a grid of resistors. Compile with the pghpf command. Read the online users guide to find out how to compie and run on one or more processors. See Chapter 3 specifically. The IMA machine has four processors, so there is no point in trying more than that. As far as I can tell, you need to set the number of processors before you compile, by
setenv PGHPF_OPTS '-np 4'
The problem is to compute the voltages and the effective resistance of a 2n+3 by 2n+2 grid of 1 ohm resistors if a battery is connected to the two center points. This is a discrete version of finding the lines of force using iron filings for a magnet. We force a net current on one amp, calculate the voltage at every point in the grid (assuming the edges of the grid are connected to ground) and find the effective resistance as the voltage drop across the terminals of the battery. The equations are just conservation of charge -- Kirchoff's laws.
The method of solution that we will use here is successive overrelaxation (SOR) with red-black ordering. This is certainly not the fastest way to solve the problem, but it does illustrate many important programming ideas.
The nodes are divided in half into red nodes and black nodes. During the first pass, the red nodes obtain the voltages as a weighted average of their original voltage, the input (if any) and the four surrounding black nodes. During the second pass, the black nodes obtain voltages from the four surrounding red nodes. In the infinite limit, the process converges to the correct answer.
Compile and run the program. Then use the Fortran intrinsic SYSTEM_CLOCK to time the program (insert the calls to system clock and print the execution time.) Do you get any speedup with additional processors? How is runtime influenced by the grid size? First assignment. Implement the three dimensional version. What is the effective resistance in N dimensions? (Any guesses?) Try changing the distribution. Try rewriting the program to use FORALL instead of EOSHIFT for the finite difference stencil computation. Does it work? Faster or slower?
this gets harder. In our program we use the where(red)
statement. Though you might have imagined that the compiler
is smart enough to work in time proportional to the active part
(i.e. the "true" part) of the statement, unfortunately
it is not. In fact, the time spent is proportional to the entire
size and even more time for conditioning.
We would like you to improve the program by putting all the red nodes into one array called red and all the black nodes in the same size array black.
By setting the edge of the voltage array at every iteration to be equal to the neighboring values, you would implement a no-current (i.e. a Neumann) boundary condition. You could try this. Or, you could try periodic boundary conditions. For these, use the CSHIFT intrinsic instead of EOSHIFT.