Several of the top-ranked supercomputers are based on a hybrid architecture consisting of a large number of CPUs and GPUs. High performance has been obtained for problems with special structure, such as FFT-based image processing or N-body-based particle calculations. However, for the class of problems described by partial differential equations (PDEs) discretized by finite difference methods (or other mesh-based methods such as finite elements), obtaining even reasonably good performance on a CPU/GPU cluster is still a challenge. In this paper, we propose and test a hybrid algorithm that matches the architecture of the cluster. Scalability is achieved through a domain decomposition method, and GPU performance is realized by using a smoothed-aggregation-based algebraic multigrid method. Incomplete factorization, which performs well on CPUs but poorly on GPUs, is completely avoided in the approach. Numerical experiments are carried out using up to 32 CPU/GPUs to solve PDE problems discretized by finite difference methods with up to 32 million unknowns.
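To make the domain decomposition idea concrete, the following is a minimal serial sketch and not the implementation used in this work. It assumes a 1D Poisson problem discretized by second-order finite differences, a one-level overlapping additive Schwarz preconditioner, and conjugate gradients as the outer iteration. The subdomain solve here is a direct tridiagonal (Thomas) solve, used purely as a stand-in for the smoothed aggregation multigrid solver that would run on each GPU. All function names, the problem size, the number of subdomains, and the overlap width are illustrative assumptions.

```python
# Illustrative sketch only (assumed setup): additive Schwarz preconditioned CG
# for the 1D finite difference Laplacian tridiag(-1, 2, -1).

def apply_A(u):
    """Matrix-free product with tridiag(-1, 2, -1), zero Dirichlet boundaries."""
    n = len(u)
    out = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        out.append(2.0 * u[i] - left - right)
    return out

def solve_local(r):
    """Thomas algorithm for tridiag(-1, 2, -1) x = r on one subdomain.
    Stand-in for the per-subdomain solver (assumption: the paper uses AMG)."""
    m = len(r)
    c = [0.0] * m  # modified super-diagonal
    d = [0.0] * m  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def make_subdomains(n, parts, overlap):
    """Index ranges [lo, hi) of overlapping subdomains covering 0..n-1."""
    size = n // parts
    ranges = []
    for p in range(parts):
        lo = max(0, p * size - overlap)
        hi = n if p == parts - 1 else min(n, (p + 1) * size + overlap)
        ranges.append((lo, hi))
    return ranges

def schwarz(r, ranges):
    """Additive Schwarz: z = sum_i R_i^T A_i^{-1} R_i r.
    Each local solve is independent, so each could run on its own CPU/GPU."""
    z = [0.0] * len(r)
    for lo, hi in ranges:
        local = solve_local(r[lo:hi])
        for k, v in enumerate(local):
            z[lo + k] += v
    return z

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcg(b, ranges, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradients; returns (solution, iterations)."""
    x = [0.0] * len(b)
    r = list(b)
    z = schwarz(r, ranges)
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5
    for it in range(1, max_iter + 1):
        Ap = apply_A(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        z = schwarz(r, ranges)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter

# Usage: manufacture a right-hand side from a known solution, then solve.
n = 64
ranges = make_subdomains(n, parts=4, overlap=2)
x_true = [float((i % 7) - 3) for i in range(n)]
b = apply_A(x_true)
x, iters = pcg(b, ranges)
```

In a cluster setting, the loop over subdomains in `schwarz` is the part that would be distributed, with one subdomain per CPU/GPU pair and communication limited to the overlap regions. The choice of conjugate gradients assumes a symmetric positive definite operator; a nonsymmetric problem or a restricted Schwarz variant would call for a different Krylov method such as GMRES.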
LUO Li, YANG Chao, ZHAO Yu-bo, CAI Xiao-chuan. A Scalable Hybrid Algorithm for Solving Partial Differential Equations on a Cluster of CPU/GPU[J]. Journal of Integration Technology, 2012, 1(1): 84-88.