Research and Prospect of Cross-Data Center Big Data Distributed Computing

孙旭东; 王圳炯; 何玉林; 尹剑飞; 黄哲学

doi:10.12146/j.issn.2095-3135.20260326001

Abstract: As data volumes grow exponentially, centralized data analysis can no longer meet the demands of big data analysis and modeling when data are stored across multiple data centers. The traditional MapReduce distributed computing model suffers from limited flexibility and efficiency, and has become a core bottleneck in the evolution of new computing architectures. This paper systematically reviews recent research on non-MapReduce distributed computing paradigms, focusing on the distributed data representation model based on random sample partition, the approximate computing method based on sampling random sample partition data blocks, and the technical characteristics of the LOGO computing framework and algorithm library built on this data model. Finally, in light of the trend in computing power networks toward “efficient collaboration and cross-domain scheduling”, the paper looks ahead to applications of the non-MapReduce paradigm in multi-scenario big data computing and discusses the technical challenges it faces in areas such as computing power interconnection and heterogeneous resource adaptation, providing a reference for optimizing new distributed computing architectures.
