Talk abstract:
Performing Out-of-Core FFTs on Parallel Disk Systems
Thomas H. Cormen, Dartmouth University
The Fast Fourier Transform (FFT) plays a key role in many
areas of computational science and engineering. The 1-dimensional
discrete FFT takes as input a vector of N data points,
often representing a function's behavior at evenly spaced intervals
in time or space, and produces as output another vector of N
coefficients in the Fourier series expansion of the input function.
Previous ``out-of-core'' FFT implementations have been restricted
to multidimensional FFTs in which each 1-dimensional FFT fits
in memory or to parallel disk systems in which all parallel
disk accesses must be fully striped (as in RAID 3).
In this talk, I will discuss how to perform 1-dimensional
FFTs when the data reside on a parallel disk system with independent
disk accesses. Like fully striped accesses, data are striped
across multiple disks and each parallel disk access reads or
writes one logical block per disk. With independent accesses,
however, the locations of the blocks accessed may differ among
the disks.
I will present both analytical and experimental results for
performing out-of-core FFTs in two ways: using traditional virtual
memory with demand paging, and using a provably asymptotically
optimal algorithm for the Parallel Disk Model (PDM) of Vitter
and Shriver.
When using virtual memory with demand paging, FFT performance
is extremely poor once the problem size exceeds the memory size.
In particular, almost every data access during the initial bit-reversal
permutation induces a page fault. Even using a parallel disk
system for virtual memory swap space does not appreciably increase
performance, because access latency remains high.
The FFT algorithm for the PDM is a variant of the one proposed
by Vitter and Shriver. It uses the BMMC permutation algorithm
of Cormen, Sundquist, and Wisniewski as a key subroutine to
perform the initial bit-reversal permutation and to permute
data between stages of ``mini-FFTs'' that each fit in memory.
The portable implementation emulates the PDM via the ViC* application
programmer interface. I will give performance results for a
uniprocessor with a large memory and multiple disks.
This is joint work with David Nicol
Back to Workshop Schedule
|