This talk discusses in detail the lattice Boltzmann (LB) method, which is closely related to the lattice gas (LG) method. The LG method is Boolean in nature: it uses only bits to indicate the presence or absence of a particle moving in a particular direction at a particular speed. The absence of floating-point operations gives the LG method unconditional numerical stability, but restricts it to specialized hardware that can perform Boolean logical operations efficiently. The Boolean character also results in a noisy signal that must be averaged over space and/or time to obtain reliable estimates. The LB method, on the other hand, tracks the distribution functions (or time averages) of the particles. As a result, floating-point numbers must be used and the method is no longer Boolean. While this makes the LB method susceptible to instabilities caused by accumulated round-off error, it allows the method to run on a variety of existing computer platforms. The signal-to-noise ratio of the LB method is also significantly higher than that of the LG method. Both methods are highly parallel. In fact, the LB method optimizes extremely well on current computer platforms, as will be demonstrated in this talk. The current version of the LB code developed at FRL runs at about 1.7 Gflops (2D code) and 2.0 Gflops (3D code) on a 32-processor Cray T3D. The 3D code shows super-linear speedup (linear being the theoretical maximum) and runs at 33 Gflops on a 512-processor T3D.
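The super-linear claim for the 3D code can be checked from the two quoted rates. The short Python sketch below is only an illustration: the function name `parallel_efficiency` is hypothetical, and the inputs are the rounded figures quoted above ("about" 2.0 Gflops on 32 processors, 33 Gflops on 512), so the result is approximate.

```python
def parallel_efficiency(rate_small, procs_small, rate_large, procs_large):
    """Measured speedup between two runs divided by the ideal (linear) speedup.

    A value above 1.0 indicates super-linear scaling between the two runs.
    """
    measured_speedup = rate_large / rate_small
    ideal_speedup = procs_large / procs_small
    return measured_speedup / ideal_speedup


# Rounded figures quoted in the abstract for the 3D code (Gflops, processors).
eff_3d = parallel_efficiency(2.0, 32, 33.0, 512)

# Per-processor rates in Mflops for the same two runs.
per_proc_32 = 2.0e3 / 32
per_proc_512 = 33.0e3 / 512

print(f"efficiency relative to linear scaling: {eff_3d:.3f}")
print(f"Mflops per processor: {per_proc_32:.1f} (32 PEs), {per_proc_512:.1f} (512 PEs)")
```

With these rounded inputs the ratio comes out slightly above 1, i.e. the 512-processor run delivers a little more than 16 times the 32-processor rate.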
The objective of this talk is to demonstrate the potential of the LB method as a viable tool for time-accurate simulations of incompressible flows. To that end, two issues will be examined in detail: accuracy and speed. The spatial and temporal accuracy of the LB method will be established through suitable benchmark studies. It is important to note that the method is formally second-order accurate in both space and time, an accuracy that exceeds that of many commercial codes today. The speed of the code will be demonstrated through actual production runs on the parallel Cray T3D. As an example of the performance of the LB code developed at FRL, consider the following. Each processor on the T3D has a read bandwidth from memory to cache that is limited to 320 Mbytes/sec. Since the T3D is a 64-bit machine (8 bytes/word), this corresponds to 40 million words per second, or 40 Mflops if each floating-point operation requires one operand read from memory. Codes without cache reuse are hardware limited to this speed. Our code runs at 60 Mflops on this processor through cache reuse and other optimizations.
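The bandwidth argument above is simple arithmetic, which the following Python sketch spells out. The function name `bandwidth_bound_mflops` and the parameter `flops_per_word` are hypothetical; the default of one floating-point operation per word read from memory is an assumption that mirrors the conversion used in the text, not a measured property of the code.

```python
def bandwidth_bound_mflops(bandwidth_mbytes_per_s, bytes_per_word=8, flops_per_word=1.0):
    """Upper bound on Mflops for a code that streams every operand from memory.

    flops_per_word is an assumed ratio: 1.0 means one floating-point
    operation per word read, as implied by the text's conversion.
    """
    mwords_per_s = bandwidth_mbytes_per_s / bytes_per_word
    return mwords_per_s * flops_per_word


# Figures quoted in the abstract: 320 Mbytes/sec read bandwidth, 64-bit words.
bound = bandwidth_bound_mflops(320.0)

# Quoted per-processor rate of the LB code, relative to the no-reuse bound.
ratio = 60.0 / bound

print(f"bandwidth-limited rate: {bound:.0f} Mflops")
print(f"quoted 60 Mflops is {ratio:.1f}x that bound")
```

Under this assumption, the quoted 60 Mflops is 1.5 times the no-reuse limit, which is only possible if operands are reused from cache rather than reread from memory.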
This is joint work with Gary S. Strumolo.