The Fast Fourier Transform (FFT) plays a key role in many areas of computational science and engineering. The 1-dimensional discrete FFT takes as input a vector of N data points, often representing a function's behavior at evenly spaced intervals in time or space, and produces as output another vector of N coefficients in the Fourier series expansion of the input function.
Previous ``out-of-core'' FFT implementations have been restricted to multidimensional FFTs in which each 1-dimensional FFT fits in memory or to parallel disk systems in which all parallel disk accesses must be fully striped (as in RAID 3).
In this talk, I will discuss how to perform 1-dimensional FFTs when the data reside on a parallel disk system with independent disk accesses. Like fully striped accesses, data are striped across multiple disks and each parallel disk access reads or writes one logical block per disk. With independent accesses, however, the locations of the blocks accessed may differ among the disks.
I will present both analytical and experimental results for performing out-of-core FFTs in two ways: using traditional virtual memory with demand paging, and using a provably asymptotically optimal algorithm for the Parallel Disk Model (PDM) of Vitter and Shriver.
When using virtual memory with demand paging, FFT performance is extremely poor once the problem size exceeds the memory size. In particular, almost every data access during the initial bit-reversal permutation induces a page fault. Even using a parallel disk system for virtual memory swap space does not appreciably increase performance, because access latency remains high.
The FFT algorithm for the PDM is a variant of the one proposed by Vitter and Shriver. It uses the BMMC permutation algorithm of Cormen, Sundquist, and Wisniewski as a key subroutine to perform the initial bit-reversal permutation and to permute data between stages of ``mini-FFTs'' that each fit in memory. The portable implementation emulates the PDM via the ViC* application programmer interface. I will give performance results for a uniprocessor with a large memory and multiple disks.
This is joint work with David Nicol
Back to Workshop Schedule