A New Distributed Machine Learning Library Based on the LOGO Computing Framework
Author:
Affiliation:

1. Shenzhen University, Shenzhen; 2. Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen

Funding:

Key Basic Research Foundation of Shenzhen (JCYJ20220818100205012), Natural Science Foundation of Guangdong Province (2023A1515011667), Basic Research Foundation of Shenzhen (JCYJ20210324093609026)

Ethical statement:

Abstract:

LOGO is a new distributed computing framework that uses a non-MapReduce computing paradigm. Under the LOGO framework, big data distributed computing is completed in two steps. The LO operation runs a serial algorithm on a number of nodes or virtual machines to process random sample data blocks independently, generating local results. The GO operation uploads all local results to the master node and integrates them to obtain an approximate result for the whole big data set. The LOGO computing framework eliminates data communication between nodes during the iterations of an algorithm, greatly improving computing efficiency, reducing memory requirements, and enhancing data scalability. This article proposes RSP-LOGOML, a new distributed machine learning algorithm library built on the LOGO computing framework. A distributed computation is divided into two parts: a serial algorithm executed by the LO operation and an ensemble algorithm executed by the GO operation. The LO operation can directly execute existing serial machine learning algorithms without rewriting them for MapReduce. The GO operation executes different kinds of ensemble algorithms depending on the ensemble task. This article first introduces the principle of LOGO distributed computing, followed by the structure of the algorithm library, the method for packaging existing serial algorithms, and the ensemble strategy. Finally, the implementation in Spark, app development, and the results of performance tests on various algorithms are demonstrated.
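To make the two-step LO/GO paradigm concrete, the following is a minimal single-process sketch in Python. It is an illustrative assumption, not the RSP-LOGOML API: the function names lo_operation and go_operation, the use of scikit-learn decision trees as the packaged serial algorithm, and majority voting as the ensemble strategy are all stand-ins. In the actual framework the LO step runs in parallel on distributed random sample partition (RSP) data blocks, and only the local results are sent to the master node for the GO step.

```python
# Minimal sketch of the LO/GO paradigm described above (assumptions:
# scikit-learn trees stand in for the packaged serial algorithm; majority
# voting stands in for the GO ensemble strategy; everything runs in one
# process rather than on distributed nodes).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def lo_operation(blocks):
    """LO: run an existing serial algorithm independently on each
    random sample data block, producing one local model per block."""
    return [DecisionTreeClassifier().fit(X_block, y_block)
            for X_block, y_block in blocks]

def go_operation(local_models, X):
    """GO: integrate the local results on the master node; here, a
    simple majority vote over the local models' predictions."""
    votes = np.stack([m.predict(X) for m in local_models])  # (n_models, n_samples)
    # Majority class per sample (assumes non-negative integer labels).
    return np.apply_along_axis(
        lambda col: np.bincount(col.astype(int)).argmax(), 0, votes)

# Toy usage: emulate RSP blocks by randomly partitioning a synthetic set.
rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
idx = rng.permutation(len(X))
blocks = [(X[part], y[part]) for part in np.array_split(idx, 10)]

models = lo_operation(blocks)
pred = go_operation(models, X)
print("ensemble training-set accuracy:", (pred == y).mean())
```

Because each block is processed independently, the sketch mirrors the property the abstract emphasizes: no communication is needed between the per-block computations, and only the compact local models cross the LO/GO boundary.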

History
  • Received: February 24, 2024
  • Revised: February 24, 2024
  • Accepted:
  • Online: May 20, 2024
  • Published: