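Luo Li, Yang Chao, Zhao Yubo, Cai Xiaochuan. A Scalable Hybrid Algorithm for Solving Partial Differential Equations on a Cluster of CPU/GPU [J]. Journal of Integration Technology, 2012, 1(1): 84-88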
A Scalable Hybrid Algorithm for Solving Partial Differential Equations on a Cluster of CPU/GPU
  
Keywords: PDEs; CPU/GPU cluster; domain decomposition; algebraic multigrid; scalable algorithm
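Authors and affiliations:
Luo Li, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055
Yang Chao, Institute of Software, Chinese Academy of Sciences, Beijing 100080
Zhao Yubo, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055
Cai Xiaochuan, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055; University of Colorado Boulder, Boulder, CO 80309, USA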
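Chinese Abstract:
The top-ranked supercomputers in the world today are all based on hybrid architectures that combine large numbers of CPUs and GPUs. They can achieve very high performance on certain special problems, such as FFT-based image processing or N-body particle calculations. However, for partial differential equation problems discretized by finite differences (or mesh-based finite elements), obtaining good performance on a CPU/GPU cluster remains a challenge. This paper proposes and tests a hybrid algorithm designed for this type of cluster architecture. The scalability of the algorithm is achieved through a domain decomposition method, while the GPU performance is obtained with a smoothed aggregation based algebraic multigrid method, which avoids incomplete factorization algorithms that perform poorly on GPUs. The numerical experiments in this paper use 29 CPU/GPUs to solve problems that, after finite difference discretization, reach three thousand…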
English Abstract:
Several of the top-ranked supercomputers are based on a hybrid architecture consisting of a large number of CPUs and GPUs. High performance has been obtained for problems with special structures, such as FFT-based image processing or N-body particle calculations. However, for problems described by partial differential equations (PDEs) discretized by finite difference methods (or other mesh-based methods such as finite elements), obtaining even reasonably good performance on a CPU/GPU cluster is still a challenge. In this paper, we propose and test a hybrid algorithm that matches the architecture of the cluster. The scalability of the approach is achieved by a domain decomposition method, and the GPU performance is realized by a smoothed aggregation based algebraic multigrid method. Incomplete factorization, which performs well on CPUs but poorly on GPUs, is completely avoided in the approach. Numerical experiments are carried out using up to 32 CPU/GPUs to solve PDE problems discretized by the finite difference method (FDM) with up to 32 million unknowns.
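The domain decomposition idea in the abstract can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a 1D Poisson model problem, a one-level additive Schwarz preconditioner with overlapping subdomains, and a preconditioned conjugate gradient outer iteration. Each subdomain block is solved with a dense direct inverse, which here only stands in for the smoothed aggregation algebraic multigrid solve that the paper runs on the GPU. All function names and parameters (`poisson_1d`, `make_subdomains`, `additive_schwarz`, `pcg`, `parts`, `overlap`) are hypothetical.

```python
import numpy as np

def poisson_1d(n):
    """Finite difference matrix for -u'' = f on (0, 1), zero Dirichlet boundaries."""
    h = 1.0 / (n + 1)
    return (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def make_subdomains(n, parts, overlap):
    """Split indices 0..n-1 into contiguous blocks, each widened by `overlap` points."""
    bounds = np.linspace(0, n, parts + 1).astype(int)
    return [np.arange(max(lo - overlap, 0), min(hi + overlap, n))
            for lo, hi in zip(bounds[:-1], bounds[1:])]

def additive_schwarz(A, subdomains):
    """Build z = sum_i R_i^T A_i^{-1} R_i r, where A_i is the block of A on subdomain i."""
    # Assumption: a dense inverse replaces the per-subdomain AMG solve of the paper.
    local_inverses = [np.linalg.inv(A[np.ix_(idx, idx)]) for idx in subdomains]

    def apply(r):
        z = np.zeros_like(r)
        for idx, A_inv in zip(subdomains, local_inverses):
            z[idx] += A_inv @ r[idx]  # independent local solves, summed on overlaps
        return z

    return apply

def pcg(A, b, precond, tol=1e-8, maxiter=500):
    """Preconditioned conjugate gradients; returns the solution and iteration count."""
    x = np.zeros_like(b)
    r = b.copy()
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter

# Model problem: -u'' = pi^2 sin(pi x), whose exact solution is u = sin(pi x).
n = 256
grid = np.arange(1, n + 1) / (n + 1)
A = poisson_1d(n)
b = np.pi**2 * np.sin(np.pi * grid)
precond = additive_schwarz(A, make_subdomains(n, parts=4, overlap=4))
x, iterations = pcg(A, b, precond)
print("iterations:", iterations)
print("max error vs exact solution:", np.abs(x - np.sin(np.pi * grid)).max())
```

In this sketch the local solves inside `apply` are independent of one another, which is the property that lets each subdomain be assigned to its own CPU/GPU pair in a cluster setting.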