FMM on multicore processors
method (FMM) for multicore processors using POSIX threads.
Short-range interactions are straightforward to parallelize. We invoke
multiple threads per compute core to alleviate partition and load
imbalances. For the calculation of long-range interactions, we assign
the multipole subtrees below a certain level to compute threads with
affinity settings that conform to the interaction lists of the tree
nodes and exploit the memory hierarchy.
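
The subtree-to-thread assignment described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the FMM-Yukawa implementation: the names run_subtrees, worker, and worker_arg are hypothetical, the per-subtree work is replaced by a placeholder cost array, and pinning uses pthread_setaffinity_np, a glibc extension that assumes a Linux-like platform. It further assumes that subtrees are stored so that consecutive indices are spatial neighbours, so giving each thread a contiguous block keeps overlapping interaction lists on the same core.

```c
#define _GNU_SOURCE /* exposes pthread_setaffinity_np (glibc extension; Linux assumed) */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical per-thread descriptor; cost[] stands in for real subtree data. */
typedef struct {
    const double *cost;   /* placeholder work value, one per subtree */
    int n_subtrees, thread_id, n_threads;
    int core;             /* core to pin to, or -1 for no pinning */
    double partial;       /* result accumulated by this thread */
    int started;          /* nonzero if a thread was actually created */
} worker_arg;

static void *worker(void *p)
{
    worker_arg *a = p;
    if (a->core >= 0) {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(a->core, &set);
        /* Best effort: a failed pin (e.g. core not permitted) is ignored. */
        (void)pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }
    /* Contiguous block of subtrees for this thread (assumed to be neighbours). */
    long long n = a->n_subtrees;
    int lo = (int)(n * a->thread_id / a->n_threads);
    int hi = (int)(n * (a->thread_id + 1) / a->n_threads);
    for (int i = lo; i < hi; i++)
        a->partial += a->cost[i];   /* stand-in for the long-range pass */
    return NULL;
}

/* Process all subtrees with n_threads threads; thread t is pinned to core
 * t mod ncores, so n_threads > ncores puts several threads on each core. */
double run_subtrees(const double *cost, int n_subtrees, int n_threads)
{
    assert(cost != NULL && n_subtrees >= 0 && n_threads > 0);
    long ncores = sysconf(_SC_NPROCESSORS_ONLN);
    if (ncores < 1)
        ncores = 1;
    pthread_t *tid = malloc((size_t)n_threads * sizeof *tid);
    worker_arg *arg = malloc((size_t)n_threads * sizeof *arg);
    assert(tid != NULL && arg != NULL);

    for (int t = 0; t < n_threads; t++) {
        arg[t] = (worker_arg){cost, n_subtrees, t, n_threads,
                              (int)(t % ncores), 0.0, 0};
        arg[t].started = (pthread_create(&tid[t], NULL, worker, &arg[t]) == 0);
        if (!arg[t].started) {      /* fall back to running inline, unpinned */
            arg[t].core = -1;
            worker(&arg[t]);
        }
    }
    double total = 0.0;
    for (int t = 0; t < n_threads; t++) {
        if (arg[t].started)
            pthread_join(tid[t], NULL);
        total += arg[t].partial;
    }
    free(arg);
    free(tid);
    return total;
}
```

Calling run_subtrees with more threads than cores places several threads on each core, mirroring the oversubscription used for the short-range phase. Build with the compiler's pthread option (for example -pthread with GCC or Clang).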
On a Sun SunFire X4600 with 8 AMD Opteron 885 processors (16 cores)
running at a 2.6 GHz clock rate with 64 GB of memory, we observe a
speedup of better than 15x over the sequential version of FMM-Yukawa
for two sample benchmark problems of 10 to 100 million charges,
uniformly distributed inside a unit box or on the surface of a unit
sphere, at a required accuracy of six digits.
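
As a rough reading of the reported figure, the parallel efficiency follows directly from the speedup S > 15 on p = 16 cores. The bound on the serial fraction s additionally assumes Amdahl's law, a model the abstract itself does not invoke, so it is an illustrative estimate only.

```latex
E = \frac{S}{p} > \frac{15}{16} \approx 0.94,
\qquad
S = \frac{1}{s + \frac{1-s}{p}}
\;\Longrightarrow\;
s = \frac{p - S}{S\,(p-1)} < \frac{16 - 15}{15 \cdot 15} = \frac{1}{225} \approx 0.0044 .
```

Under that model, less than about half a percent of the sequential run time would remain unparallelized.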