Abstract:
With the exponential growth of data volume, centralized data analysis is increasingly unable to meet the requirements of big data analysis and modeling when data are stored in a distributed fashion across multiple data centers. The traditional MapReduce distributed computing model, limited in flexibility and efficiency, has become a core bottleneck for new computing architectures. This paper systematically summarizes recent research advances in Non-MapReduce distributed computing paradigms, focusing on the technical characteristics of the random sample partition (RSP) distributed data representation model, the approximate computing method based on sampling RSP data blocks, and the LOGO computing framework and algorithm library built on the RSP data model. Finally, in light of the development trend of computing power networks featuring "efficient collaboration and cross-domain scheduling", this paper outlines the application prospects of the Non-MapReduce paradigm in multi-scenario big data computing and discusses technical challenges in areas such as computing power interconnection and heterogeneous resource adaptation, providing a reference for the optimization of new distributed computing architectures.