跨数据中心大数据分布式计算研究与展望

Research and Prospects on Distributed Big Data Computing Across Data Centers

  • Abstract: As data volumes grow exponentially, centralized data analysis is increasingly unable to meet the demands of big data analysis and modeling when data are stored in a distributed manner across multiple data centers. The traditional MapReduce distributed computing model, limited by insufficient flexibility and constrained efficiency, has become a core bottleneck for new computing architectures. This paper systematically summarizes recent research on Non-MapReduce distributed computing paradigms, focusing on the technical characteristics of the RSP distributed data representation model, the approximate computing method based on sampling RSP data blocks, and the LOGO computing framework and algorithm library built on the RSP data model. Finally, in light of the trend in computing power networks toward "efficient collaboration and cross-domain scheduling", the paper outlines the application prospects of the Non-MapReduce paradigm in multi-scenario big data computing and discusses the technical challenges it faces in areas such as computing power interconnection and heterogeneous resource adaptation, providing a reference for optimizing new distributed computing architectures.
