Thinking parallel: sparse iterative solvers with CUDA
Tuesday, January 11, 2011 - 8:30am - 9:30am
Iterative sparse linear solvers are a critical component of any scientific computing platform. Effective preconditioning is the main challenge in building iterative sparse solvers for massively parallel systems. As computing systems become increasingly power-constrained, the memory hierarchies of massively parallel systems will become deeper. Parallel algorithms with all-to-all communication patterns, which assume uniform memory access times, will be inefficient on these systems. In this talk, I will outline the challenges of developing good parallel preconditioners and demonstrate that domain decomposition methods have communication patterns that match emerging parallel platforms. I will present recent work on restricted additive Schwarz (RAS) preconditioners, developed as part of the open source 'cusp' library of sparse parallel algorithms. On 2D Poisson problems, a RAS preconditioner is consistently faster than diagonal preconditioning in time-to-solution. Detailed analysis demonstrates that the communication pattern of RAS matches the on-chip bandwidths of a Fermi GPU. Line smoothing, which requires solving a large number of small tridiagonal linear systems in local memory, is another preconditioning approach with similar communication patterns. I will conclude with a roadmap for developing a range of preconditioners, smoothers, and linear solvers on massively parallel hardware based on the domain decomposition and line smoothing approaches.
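
As a rough illustration of the two building blocks named in the abstract, the serial Python sketch below applies a RAS preconditioner whose local subdomain solves are small tridiagonal systems. It is not taken from cusp and does not reflect its API. The function names (`thomas_solve`, `ras_apply`, `poisson_matvec`), the equal-sized subdomain split, and the 1D Poisson model problem are all assumptions chosen for brevity; the results quoted in the abstract are for 2D problems on a GPU.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the Thomas algorithm.

    lower[0] and upper[-1] are ignored. No pivoting is done, so this is
    only safe for well-behaved matrices such as the Poisson stencil below.
    """
    n = len(diag)
    c = [0.0] * n  # modified super-diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def poisson_matvec(x):
    """Multiply by the 1D Poisson matrix tridiag(-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y


def ras_apply(r, num_domains, overlap):
    """Apply a RAS preconditioner for tridiag(-1, 2, -1) to residual r.

    Each subdomain solves on its owned block extended by `overlap` points
    per side, then writes back only the owned block (the "restricted"
    part). The loop iterations are independent of one another.
    """
    n = len(r)
    z = [0.0] * n
    size = (n + num_domains - 1) // num_domains  # owned block size (ceiling)
    for start in range(0, n, size):
        stop = min(start + size, n)
        lo = max(start - overlap, 0)   # extended subdomain [lo, hi)
        hi = min(stop + overlap, n)
        m = hi - lo
        local = thomas_solve([-1.0] * m, [2.0] * m, [-1.0] * m, r[lo:hi])
        for i in range(start, stop):   # keep owned values only
            z[i] = local[i - lo]
    return z


if __name__ == "__main__":
    r = [1.0, -2.0, 3.0, 0.5, 4.0, -1.0, 2.0, 1.5]
    z = ras_apply(r, num_domains=2, overlap=1)
    residual = [ri - ai for ri, ai in zip(r, poisson_matvec(z))]
    print(residual)  # nonzero entries appear only next to the interface
```

Each subdomain reads only its own slice of the residual plus a thin overlap and writes only the entries it owns, so the subdomain loop needs no all-to-all communication. In a GPU setting one might assign each subdomain to a thread block and keep the local solve in on-chip memory, though that mapping is an assumption here rather than a description of the cusp implementation. Note also that RAS is generally nonsymmetric even for a symmetric matrix, so it is usually paired with a Krylov method that tolerates nonsymmetric preconditioners.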